“How can you use open data to explore history?” NEH recently hosted a contest asking this question. The Chronicling America Historic American Newspapers Data Challenge invited members of the public to create web-based projects using the historic newspaper data in Chronicling America. Visitors view around 40 million web pages in Chronicling America each year, and some also use the open API to explore the historic newspapers as big data. This series will feature guest posts from the winning data challenge projects, which explore important humanities themes through visualizations, maps, tools, and data mashups.
First up: Historical Agricultural News, a site built by Amy Giroux and Marcy Galbreath of the University of Central Florida and Nathan Giroux, a software engineer in the military simulation industry.
Based at the University of Central Florida in Orlando, Historical Agricultural News placed second nationally in the 2016 NEH Data Challenge, which asked for novel ways of using the Chronicling America database. The site uses its own algorithm to locate articles in Chronicling America about agricultural organizations and production. Treating newspapers as the social media of their day, Historical Agricultural News helps illustrate the role that farming organizations such as the land-grant colleges, the Farm Bureau, and Boys and Girls Clubs played in spreading the ideas of modern farming in 19th- and early 20th-century America, as documented in the newspaper articles of the period.
The NEH Data Challenge recognizes the constraints of big data: the overwhelming amount of information made possible by contemporary digital technologies. Historical Agricultural News responds to the opportunities and challenges of big data by selecting articles through an organization-specific checklist plus key terms, dates, and location. Its results can be searched on screen, downloaded as a comma-delimited (CSV) file that can be edited in Excel, or rendered as a heat map visualization. Humanities researchers can use the data gathered this way as a gateway to understanding how farming communities moved through shifting ideas of progress, development, and modernity, and through changing demographic and social factors such as immigration, politics, and farming practices. Historical Agricultural News will also interest anyone researching community and family histories connected to agricultural news.
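To make the selection step concrete, here is a minimal Python sketch of filtering articles by organization, key term, date range, and state, then exporting the matches as CSV and tallying them per state as input for a heat map. The `Article` record, its fields, and the function names are hypothetical; the post does not describe the project's actual schema or code.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    """Hypothetical article record; the project's real schema is not published here."""
    newspaper: str
    state: str
    published: date
    text: str


def matches(article, organizations, key_terms, start, end, state=None):
    """True if the article names an organization, mentions a key term (when any
    are given), falls within [start, end], and matches the state (when given)."""
    lowered = article.text.lower()
    if not any(org.lower() in lowered for org in organizations):
        return False
    if key_terms and not any(term.lower() in lowered for term in key_terms):
        return False
    if not start <= article.published <= end:
        return False
    return state is None or article.state == state


def to_csv(articles):
    """Serialize articles as comma-delimited text that Excel can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["newspaper", "state", "date", "text"])
    for a in articles:
        writer.writerow([a.newspaper, a.state, a.published.isoformat(), a.text])
    return buffer.getvalue()


def counts_by_state(articles):
    """Tally matches per state, the raw numbers a heat map would shade."""
    return Counter(a.state for a in articles)
```

Plain substring matching keeps the sketch short; a real tool would also need to handle word boundaries and OCR misspellings.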
The algorithm at the heart of this search tool returns results at the article level rather than the page level. It pre-processes the Chronicling America bulk OCR data files and selects the page-level files that contain key agricultural organization terms; it then finds the beginning and end of each agricultural article on those pages and pulls the article text into the Historical Agricultural News database. This selection process uses the category of organizations to confine the initial search to agriculture-related entities, and the interface offers additional key-term categories for crops, livestock, and dairy to narrow the search further. With this interface, the search engine weeds out unrelated results (such as advertisements) that might otherwise obstruct a research query.
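The post does not spell out how article boundaries are detected, so the following is only a hypothetical heuristic, not the project's algorithm. It assumes that a short line in capital letters marks a headline, splits one page of OCR text into headline-and-body chunks at those lines, and keeps the chunks that name an organization. `page_mentions_any` stands in for the page-level pre-filter. Real OCR output is far noisier, so treat the headline rule as a placeholder.

```python
import re

# Assumption: a 6-60 character line made only of capitals, digits, and simple
# punctuation is a headline, i.e. the start of a new article.
HEADLINE = re.compile(r"^[A-Z0-9 ,.'&-]{6,60}$")


def page_mentions_any(page_text, organizations):
    """Cheap pre-filter: skip pages that never mention any organization term."""
    lowered = page_text.lower()
    return any(org.lower() in lowered for org in organizations)


def split_articles(page_text):
    """Split one page of OCR text into (headline, body) chunks."""
    articles = []
    headline, body = None, []
    for raw in page_text.splitlines():
        line = raw.strip()
        if HEADLINE.match(line) and any(c.isalpha() for c in line):
            # A new headline closes the article collected so far.
            if headline is not None or body:
                articles.append((headline, " ".join(body)))
            headline, body = line, []
        elif line:
            body.append(line)
    if headline is not None or body:
        articles.append((headline, " ".join(body)))
    return articles


def agricultural_articles(page_text, organizations):
    """Keep only the chunks whose headline or body names an organization term."""
    wanted = [org.lower() for org in organizations]
    hits = []
    for headline, body in split_articles(page_text):
        haystack = f"{headline or ''} {body}".lower()
        if any(org in haystack for org in wanted):
            hits.append((headline, body))
    return hits
```

On a page holding a Farm Bureau notice followed by a shoe-sale advertisement, `agricultural_articles(page, ["Farm Bureau"])` would return only the notice, which is the kind of advertisement filtering the paragraph above describes.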
Chronicling America is an open access, searchable database of historic U.S. newspapers produced by a long-term partnership between NEH and the Library of Congress. It includes millions of pages of digitized newspapers and descriptive information contributed by states and territories across the country. The Library of Congress provides the data through a well-documented API that enables exploration of the collection in a variety of ways beyond the site’s popular web interface. This openness allows users both to view individual pages and to download large data sets that can reveal trends over time and space.
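For readers who want to try the API, this sketch builds a page-level search request against the legacy chroniclingamerica.loc.gov JSON search endpoint and summarizes a decoded response. The parameter names (`andtext`, `date1`, `date2`, `dateFilterType`, `searchType`, `format`) and response fields (`items`, `title`, `date`) reflect that endpoint's documentation as we understand it; the Library of Congress has been moving Chronicling America to the loc.gov API, so check the current documentation before relying on them. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Legacy Chronicling America search endpoint (assumption: verify against current LoC docs).
SEARCH_URL = "https://chroniclingamerica.loc.gov/search/pages/results/"


def build_search_url(words, year_from, year_to, state=None, rows=20, page=1):
    """Build a page-level full-text search URL that requests JSON results."""
    params = {
        "andtext": words,  # every word must appear on the page
        "date1": year_from,
        "date2": year_to,
        "dateFilterType": "yearRange",
        "searchType": "advanced",
        "rows": rows,
        "page": page,
        "format": "json",
    }
    if state:
        params["state"] = state
    return SEARCH_URL + "?" + urlencode(params)


def fetch_results(url):
    """Download one page of JSON results (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


def summarize(results):
    """Return (title, date) pairs for each page hit in a decoded response."""
    return [(item.get("title"), item.get("date")) for item in results.get("items", [])]
```

A typical call would be `summarize(fetch_results(build_search_url("farm bureau", 1910, 1922, state="Florida")))`; paging through a larger result set means incrementing `page`, and this page-level search is exactly the granularity that Historical Agricultural News refines down to individual articles.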