The National Endowment for the Humanities (NEH) and seven global partners today awarded approximately $4.8 million to international research teams investigating how computational techniques may be applied to “big data,” the massive multi-source datasets made possible by modern technology.
Fourteen teams representing Canada, the Netherlands, the United Kingdom, and the United States were named the winners of the second Digging Into Data Challenge, a competition to promote innovative humanities and social science research using large-scale data analysis. Each team represents a collaboration among scholars, scientists, and librarians from leading universities worldwide.
Arising out of the question “what would you do with a million books?” the Digging Into Data Challenge grant competition was created in 2009 by NEH and three other international research agencies. This year, an expanded group of funders will support fourteen projects that apply “cyberscholarship” to a wide variety of topics, such as: tracking the spread and severity of the flu pandemic of 1918 as reported in the newspapers of the day; using medical imaging of mummies to see if ancient Egyptians died of hardening of the arteries; tracing the evolution of Western musical style over 600 years through analysis of a vast repository of music from 1300 to 1900; and mining 19th- and 20th-century census data to determine how migration affected individuals’ economic opportunity and social mobility in Europe and North America.
The sponsoring research funders include the Arts & Humanities Research Council (United Kingdom), the Economic & Social Research Council (United Kingdom), the Institute of Museum and Library Services (United States), the Joint Information Systems Committee (United Kingdom), the National Endowment for the Humanities (United States), the National Science Foundation (United States), the Netherlands Organisation for Scientific Research (Netherlands), and the Social Sciences and Humanities Research Council of Canada (Canada).
Of the approximately $4.8 million (U.S.) provided by these eight agencies, the National Endowment for the Humanities’ contribution of $633,217 supports American researchers from five of the fourteen teams.
Detailed descriptions of the fourteen winning projects can be found below.
Additional information about the competition can be found at www.diggingintodata.org.
# # #
Digging into Data Challenge
Round Two (2011) Winners
Cascades, Islands, or Streams? Time, Topic, and Scholarly Activities in Humanities and Social Science Research
(Principal Investigators: Cassidy R. Sugimoto, Ying Ding, Staša Milojević, Indiana University, Bloomington, NSF; Mike Thelwall, University of Wolverhampton, AHRC/ESRC/JISC; Vincent Larivière, Université de Montréal, SSHRC.)
This project will examine topic life cycles across heterogeneous corpora, including not only scholarly and scientific literature, but also social networks, blogs, and other materials. While the growth of large-scale datasets has enabled examination within scientific datasets, there is little research that looks across datasets. The team will analyze the importance of various scholarly activities for creating, sustaining, and propelling new knowledge; compare and triangulate the results of topic analysis methods; and develop transparent and accessible tools. This work should identify which scholarly activities are indicative of emerging areas and identify datasets that should no longer be marginalized, but built into understandings and measurements of scholarship.
(Principal Investigators: Robert C. Stacey, University of Washington, IMLS; Arno Knobbe, Leiden University, NWO; Sarah Rees Jones, University of York, AHRC/ESRC/JISC; Michael Gervers, University of Toronto, SSHRC. Additional participating institutions: University of Brighton, Columbia University.)
This project will develop new ways of exploring the full-text content of digital historical records. The project will demonstrate its approach using medieval charters, which survive in abundance from the 12th to the 16th centuries and are one of the richest sources for studying the lives of people in the past.
Digging into Connected Repositories (DiggiCORE)
(Principal Investigators: Andreas Juffinger, The European Library Office, NWO; Zdenek Zdrahal, The Open University, AHRC/ESRC/JISC.)
This project will analyze a vast set of Open Access research publications using Natural Language Processing and social network analysis methods to identify patterns in the behavior of research communities, recognize trends in research disciplines, gain new insights into the citation behaviors of researchers, and discover features that distinguish high-impact papers. This will enable the development of better methods for exploratory search and browsing in digital collections or new ways of evaluating research or a researcher’s impact.
Digging by Debating
(Principal Investigators: Colin Allen and Katy Börner, Indiana University, Bloomington, NEH; Andrew Ravenscroft, University of East London, Chris Reed, University of Dundee, and David Bourget, University of London, AHRC/ESRC/JISC.)
A project to develop and implement a multi-scale workbench, called "InterDebates", with the goal of digging into data provided by hundreds of thousands, eventually millions, of digitized books, bibliographic databases of journal articles, and comprehensive reference works written by experts. The team’s hypotheses are: that detailed and identifiable arguments drive many aspects of research in the sciences and the humanities; that argumentative structures can be extracted from large datasets using a mixture of automated and social computing techniques; and that the availability of such analyses will enable innovative interdisciplinary research and may also play a role in supporting better-informed critical debates among students and the general public.
Digging into Human Rights Violations: Anaphora Resolution and Emergent Witnesses
(Principal Investigators: Ben Miller, Georgia State University, NSF; Lu Xiao, University of Western Ontario, SSHRC. Additional participating institutions: University of North Florida.)
This project will develop an automated reader for large text archives of human rights abuses that will reconstruct stories from fragments scattered across a collection, and an interface for navigating those stories. By improving on anaphora resolution techniques in Natural Language Processing for the connection of pronouns to specific nouns, this system will help researchers and courts reveal witnesses and patterns contained in their own collections.
Digging into Metadata: Enhancing Social Science and Humanities Research
(Principal Investigators: Mick Khoo, Drexel University, IMLS; Diana Massam, University of Manchester, AHRC/ESRC/JISC. Additional participating institutions: University of Glamorgan.)
The project will automatically generate new forms of metadata tags from existing metadata records and associated resources that will support discovery across multiple repositories. The project will utilize four repositories that vary in size, domain, metadata creation method and workflow, and quality. PERTAINS, a tool developed by one of the partner schools, will be used to analyze the metadata records in each repository and then to generate Dewey Decimal Classification-based tags. Clustering algorithms will be used to generate an index of similarity and match between resources in different repositories. After conducting a search, the user will retrieve a list of resources from the different collections that have been tagged in similar ways. Visualization techniques will be used to display the results in ways that enhance the research process.
Electronic Locator of Vertical Interval Successions (ELVIS): The First Large Data-Driven Research Project on Musical Style
(Principal Investigators: Michael Scott Cuthbert, Massachusetts Institute of Technology, NEH; Frauke Jürgensen, University of Aberdeen, AHRC/ESRC/JISC; Julie E. Cumming, McGill University, SSHRC. Additional participating institutions: Yale University.)
A project to study changes in Western musical style from 1300 to 1900, using the digitized collections of several large music repositories. The team notes that in order to understand style change in Western polyphonic music we need to be able to describe acceptable vertical sonorities (chords) and melodic motions in each period, and how they change over time. The project aims to do this for European polyphony from 1300 to 1900, using advanced music information retrieval techniques to study highly contrasting kinds of music that are nevertheless unified by common concepts of tonality, consonance vs. dissonance, and voice leading.
An Epidemiology of Information: Data Mining the 1918 Influenza Pandemic
(Principal Investigators: Edward T. Ewing, Bernice L. Hausman, Bruce Pencek, and Narendran Ramakrishnan, Virginia Polytechnic Institute & State University, NEH; Gunther Eysenbach, University of Toronto, SSHRC.)
This project seeks to combine the power of data mining techniques with the interpretive analytics of the humanities and social sciences to understand how newspapers shaped public opinion and represented authoritative knowledge during the deadly 1918 influenza pandemic. The project makes use of the more than 100 newspaper titles for 1918 available from Chronicling America at the United States Library of Congress and the Peel’s Prairie Provinces collection at the University of Alberta Library. The application of algorithmic techniques enables the domain expert to systematically explore a broad repository of data and identify qualitative features of the pandemic at the small scale as well as the genealogy of information flow at the large scale. This research can provide methods for understanding the spread of information and the flow of disease in other societies facing the threat of pandemics.
Imagery Lenses for Visualizing Text Corpora
(Principal Investigators: Katharine Coles, University of Utah, NEH; Min Chen, University of Oxford, AHRC/ESRC/JISC.)
A project to explore new visualization techniques for use in large-scale linguistic and literary corpora, using the collections of the British National Corpus and various smaller archives of poetry. The team will investigate whether advanced visualization techniques can provide an interface that enables humanities researchers to use their domain knowledge dynamically while drawing on the computational capability of computers. In particular, can data visualization help users make new observations and generate new hypotheses? The aim of this project is to answer this methodological research question and to create a set of new visualization tools for future scholarly research.
IMPACT Radiological Mummy Database
(Principal Investigators: Randall Thompson, Saint Luke’s Mid America Heart Institute, NEH; Andrew Nelson, University of Western Ontario, SSHRC. Additional participating institutions: Al Azhar Medical School, Cairo; Quinnipiac University; Canadian Museum of Civilization; University of Southern California; University of California, San Diego; Mount Sinai School of Medicine; South Coast Radiological Medical Group; Newport Diagnostic Center; University of California, Irvine; Wisconsin Heart Hospital.)
This project is designed to provide mummy and medical researchers with a large-scale comparative database of medical imaging of mummified human remains. This departure from a case-study model for mummy studies will drive the field towards a large-scale comparative and epidemiological paradigm. The Canadian team will be investigating the evisceration and excerebration components of the Egyptian mummification tradition, and the US teams will apply the database to a greatly expanded study of atherosclerosis in ancient Egyptian mummies, as part of the IMPACT Ancient Health Research Group, and to the refinement of a novel system of diagnosis by consensus for mummified remains.
Integrated Social History Environment for Research (ISHER)-Digging into Social Unrest
(Principal Investigators: Dan Roth, University of Illinois, Urbana-Champaign, NSF; Antal van den Bosch, Tilburg University, NWO; Sophia Ananiadou, The University of Manchester, AHRC/ESRC/JISC. Additional participating institutions: International Institute of Social History.)
This project will develop an integrated environment using sophisticated text mining tools to facilitate knowledge discovery in social history research. It will provide social historians and social scientists with the means to detect and associate events, trends, people, organizations, and other entities of specific interest to social historians.
Integrating Data Mining and Data Management Technologies for Scholarly Inquiry
(Principal Investigators: Ray R. Larson, University of California, Berkeley and Richard Marciano, University of North Carolina at Chapel Hill, IMLS; Paul B. Watry, University of Liverpool, AHRC/ESRC/JISC. Additional participating institutions: Internet Archive, JSTOR.)
This project will integrate large-scale collections, including JSTOR and the book collections of the Internet Archive, stored and managed in a distributed preservation environment. It will also incorporate text mining and Natural Language Processing software capable of generating dynamic links to related resources discussing the same persons, places, and events. This 17-month project will go beyond basic analysis by providing a prototype system that offers expert-system support to scholars in their work.
Mining Microdata: Economic Opportunity and Spatial Mobility in Britain, Canada and the United States, 1850-1911
(Principal Investigators: Evan Roberts, University of Minnesota, NSF; Kevin Schürer, University of Leicester, AHRC/ESRC/JISC; Kris E. Inwood, University of Guelph, SSHRC. Additional participating institutions: University of Alberta, Université de Montréal, University of Essex.)
This project will make use of novel data-mining technology to exploit one of the largest population databases in the world, a vast collection of harmonized 19th- and early 20th-century census microdata from Britain, Canada, and the United States originally digitized for genealogical research. The goal is to shed light on the impact of economic opportunity and spatial mobility on social structure in Europe and North America.
(Principal Investigators: Ewan Klein, University of Edinburgh, AHRC/ESRC/JISC; Colin M. Coates, York University, SSHRC. Additional participating institutions: University of St Andrews.)
This project will examine the economic and environmental consequences of commodity trading during the nineteenth century. The project team will be using information extraction techniques to study large corpora of digitized documents from the nineteenth century. This innovative digital resource will allow historians to discover novel patterns and to explore new hypotheses, both through structured query and through a variety of visualization tools.
# # #
About the National Endowment for the Humanities
Created in 1965 as an independent federal agency, the National Endowment for the Humanities supports research and learning in history, literature, philosophy, and other areas of the humanities by funding selected, peer-reviewed proposals from around the nation. Additional information about the National Endowment for the Humanities and its grant programs is available at: www.neh.gov.
The Arts & Humanities Research Council (AHRC): Each year the AHRC provides approximately £112 million from the Government to support research and postgraduate study in the arts and humanities, from languages and law, archaeology and English literature to design and creative and performing arts. In any one year, the AHRC makes approximately 700 research awards and around 1,300 postgraduate awards. Awards are made after a rigorous peer review process, to ensure that only applications of the highest quality are funded. The quality and range of research supported by this investment of public funds not only provides social and cultural benefits but also contributes to the economic success of the UK.
The Economic and Social Research Council (ESRC) is the UK’s largest organization for funding research on economic and social issues. It supports independent, high-quality research which has an impact on business, the public sector and the third sector. The ESRC’s total budget for 2011/12 is £203 million. At any one time the ESRC supports over 4,000 researchers and postgraduate students in academic institutions and independent research institutes. More at www.esrc.ac.uk.
The Institute of Museum and Library Services (IMLS) is the primary source of federal support for the nation’s 123,000 libraries and 17,500 museums. The Institute’s mission is to create strong libraries and museums that connect people to information and ideas. The Institute works at the national level and in coordination with state and local organizations to sustain heritage, culture, and knowledge; enhance learning and innovation; and support professional development. To learn more about the Institute, please visit www.imls.gov.
The Joint Information Systems Committee (JISC) is a joint committee of the U.K. further and higher education funding bodies and is responsible for supporting the innovative use of information and communication technology (ICT) to support learning, teaching, and research. It is best known for providing a U.K. national infrastructure network, a range of support, content, and advisory services, and a portfolio of high-quality resources. Information about JISC, its services, and programs can be found at www.jisc.ac.uk.
The National Science Foundation (NSF) is an independent federal agency that supports fundamental research and education across all fields of science and engineering. In fiscal year (FY) 2009, its budget was $9.5 billion, which included $3.0 billion provided through the American Recovery and Reinvestment Act. NSF funds reach all 50 states through grants to over 1,900 universities and institutions. Each year, NSF receives about 44,400 competitive requests for funding, and makes over 11,500 new funding awards. NSF also awards over $400 million in professional and service contracts yearly. More information about NSF is available on the Internet at www.nsf.gov/.
The Netherlands Organisation for Scientific Research (NWO) funds thousands of top researchers at universities and institutes and steers the course of Dutch science by means of subsidies and research programs.
The Social Sciences and Humanities Research Council (SSHRC) is an independent federal government agency that funds university-based research and graduate training through national peer-review competitions. SSHRC also partners with public and private sector organizations to focus research and aid the development of better policies and practices in key areas of Canada’s social, cultural and economic life. More information about SSHRC is available on the Internet at www.sshrc-crsh.gc.ca/.
NEH Office of Communications, (202) 606-8446.