Reimagining Searching in Chronicling America

July 17, 2020

Capture of Civil War maps from ChronAm identified by the Newspaper Navigator algorithms.

Courtesy of Innovator-in-Residence Ben Lee

Over the past 15 years, Chronicling America (ChronAm) has made more than 16 million pages accessible to visitors from around the world. Grants made through the National Digital Newspaper Program (NDNP) enable the selection of digitized titles in ChronAm as part of a collaboration between NEH’s Division of Preservation and Access and the Library of Congress (LC). Institutions from 48 states and territories have contributed newspapers to this digital database, which now spans from 1690 to 1963. From the long runs of the newspapers of record in large metropolitan cities to the often-short-lived ethnic publications, ChronAm provides access to accounts of the most important events of the nation as well as the daily lives of its people and communities. Throughout this growing collection, you can find everything from landmark events, such as the passage of the Nineteenth Amendment or the centennial of American independence, to an advertisement for ice or the names of the oldest living couple in the US in 1876.

On the left: the New York Tribune, August 19th, 1920, on the passage of the 19th Amendment giving women the right to vote. On the right: The Daily Phoenix, July 4th, 1876, commemorating the centennial of US independence.

Making all of this information available to a public as broad as America itself, however, poses some important challenges. First, it can be hard to know where to start if you are not a researcher, graduate student, or genealogist looking for something specific, but rather a citizen scholar, local history buff, or grade-school student wanting to explore and see what catches your eye. To help new and even experienced visitors, the staff at LC has prepared recommended topics with brief introductions and selections of articles that touch on those topics. You can also browse by state, date, or newspaper name, or enter a search term and see what comes up. All of these are great options if you are looking for articles, but what if you are interested in advertisements, maps, photos, or illustrations? The visual elements of newspapers may seem secondary to their stories, yet they hold an incredible amount of information that is difficult to describe and even harder to search for among the billions of words in ChronAm.

Beyond Words

In 2017, LC Labs launched a pilot project called Beyond Words that focused precisely on these kinds of images from ChronAm and asked volunteers to identify and describe selections related to World War I. Tong Wang developed the application and, over the course of two years, participants annotated more than 40,000 photos, illustrations, maps, cartoons, and comics. Beyond simply contributing transcriptions, the thousands of volunteers enriched our understanding of US historical newspapers by providing their perspectives on these materials. These contributions power a search feature in ChronAm that helps others find more images. The public domain data created by volunteers have also been repurposed in other creative projects that reimagine how to search and read newspapers in their digital form.

The soon-to-be-launched Newspaper Navigator, developed by Innovator-in-Residence Ben Lee, builds on the success of Beyond Words to provide another point of entry to ChronAm via image searches. In this project, Ben applies machine learning techniques to the transcriptions made by the volunteers to automatically identify images. The crowdsourced annotations group the images into seven classes or categories: photograph, illustration, map, comic or cartoon, editorial cartoon, headline, and advertisement.

Because the contributors had inspected and described the images using these categories, the program could use this curated information to make educated guesses about where unidentified images appear across the millions of pages in ChronAm, and what they might contain, without requiring individual users to locate them. In the second phase of the project, Newspaper Navigator will provide a search feature in which users can either type keywords as they did before or select an image as their search term and retrieve images from ChronAm as the results. Just as exciting as the technology behind the tool is the fact that the users themselves contributed directly to enhancing access for the public.

Once it is launched, users will be able to trace the evolution of graphic content more easily as they follow fashion trends over six decades, retrace military campaigns through Europe and North America, or rediscover the greatest players of America’s favorite pastime.

Newspaper Navigator launches later this summer, but in the meantime you can learn more about it in this white paper or start exploring the 100 million images and all of the data used in the project.