New Languages for NLP: Building Linguistic Diversity in the Digital Humanities


Princeton University, Princeton, NJ



Do you wish you could do large-scale text analysis on the languages you study? Is the lack of good linguistic data and tools a barrier to your research?  

Learn how to create the data and language models you need for digital humanities analysis at "New Languages for NLP: Building Linguistic Diversity in the Digital Humanities," a National Endowment for the Humanities Institute for Advanced Topics in the Digital Humanities.

Held at the Center for Digital Humanities at Princeton, this Institute is a collaboration with Haverford College, the Library of Congress Labs, and DARIAH, the European Digital Research Infrastructure for the Arts and Humanities.  

Participants will work over the course of a year, from June 2021 to May 2022, and will meet for three intensive workshops where they will learn how to annotate linguistic data and train statistical language models using cutting-edge natural language processing (NLP) tools. They will also learn best practices in project and research data management and join discussions with leaders in the fields of multilingual NLP and digital humanities (DH). Throughout, they will advance their own research projects by creating, employing, and interrogating text-analysis tools and methods, while increasing much-needed linguistic diversity in the field of NLP.

Funding Information

Project Directors

Natalia Ermolaev and Andrew Janco