NEH Invites Proposals that Respond to Historical and Multilingual OCR Report

February 7, 2019

The Office of Digital Humanities (ODH) is excited to announce the publication of an important new report titled "A Research Agenda for Historical and Multilingual Optical Character Recognition." The report, funded by The Andrew W. Mellon Foundation and authored by David Smith and Ryan Cordell of Northeastern University, outlines nine recommendations for improving historical and multilingual OCR. The full report is available online at https://ocr.northeastern.edu/report/.

The idea for this report came about several years ago when staff in ODH noticed that a large number of ODH-funded projects working with textual materials were stymied or slowed by poor-quality OCR. This observation led to discussions with grantees and with staff at both the Mellon Foundation and the Library of Congress. Mellon staff were already exploring ways to improve the OCR of digitized texts in Arabic and other connected scripts, and the Library of Congress was seeking greater accuracy in the OCR of its large digitized collection of historical newspapers. We all agreed that a report was needed to assess the state of the art in OCR and to identify key research tasks that might help advance the quality of OCR for a variety of textual materials.

The report is the culmination of about two years of research, surveys, conversations, and in-depth interviews with scholars who work on OCR and rely on OCR'd texts to do their work, with computer and information scientists working toward improving OCR, with librarians who manage digital collections, and with funders who support projects that use and refine OCR methods. The recommendations in the report range from developing methods for improving statistical analysis of OCR output to exploiting existing digital editions for training and test data to convening OCR institutes in critical research areas.

We in ODH invite scholars to consider tackling one or more of these recommendations through our standing grant programs: Digital Humanities Advancement Grants and Institutes for Advanced Topics in the Digital Humanities. In addition, OCR-related projects might be a great fit for the Research and Development program in the Division of Preservation and Access. These programs are listed below along with their next deadlines. If you have questions about whether your proposed project fits our grant programs, please contact us at odh@neh.gov.

Grants and Deadlines

Digital Humanities Advancement Grants: June 19, 2019
Institutes for Advanced Topics in the Digital Humanities: March 26, 2019
Research and Development: May 15, 2019

Northeastern University press release: https://ocr.northeastern.edu/