JSTOR’s Arabic-Language Digitization Project

June 28, 2017
American University of Beirut exterior

American University of Beirut. Al-Abhath, JSTOR’s pilot journal for the digitization project, is a quarterly publication of the university.

Image courtesy of Wikimedia Commons.

As any student will tell you, it is hard to imagine a world without online access to scholarly material. A few strokes at the keyboard are usually enough to conjure up a wealth of material on your computer screen, all the better if you have access to a digital library subscription. Forget which journal article mentioned that text you meant to look up later? A quick keyword search usually takes care of the problem. We have become so accustomed to the gains of digitization that we likely take for granted how remarkable it is. But ask anyone who was a student only a few decades ago, and you’ll probably receive a variation of the following: there was a time when scholarly information wasn’t so easily and widely accessible.

In many ways, and for a variety of complex reasons, this glimpse of the past is still the reality throughout most of the Middle East and North Africa. To date, there is no significant digital Arabic-language database. There have been efforts to digitize Arabic-language journals in the past, but maximal online discovery of their contents remains elusive. This is partly because of limitations in software for optical character recognition (OCR) when it comes to languages that use a non-Roman alphabet. Responsible for facilitating full-text searching and crawling by search engines, OCR is crucial to any digitization effort. But complex political and social realities throughout much of the region pose yet another serious problem for both the preservation of and access to Arabic-language scholarship: limited financial and operational support for the preservation and distribution of journals, as well as general social unrest, make for a particularly challenging environment for digitization efforts to be launched and sustained. This is especially troubling as these conditions could translate into the permanent loss of critical works by influential Arab intellectuals that remain available only in print.

black and white photo of Cairo

Antonio Beato, photographer. Late 19th c. photograph of Cairo. According to a JSTOR report, the foundations of Arab nationhood are preserved in journals like Al-

Image courtesy of Library of Congress.

JSTOR, the leading global not-for-profit digital library that includes the complete archives of more than 2,300 scholarly journals spanning 70 academic disciplines in the humanities, social sciences, and sciences, seeks to learn how to tackle these challenges. With support from the National Endowment for the Humanities’ Division of Preservation and Access, JSTOR will undertake an investigative project aimed at addressing the special challenges facing the preservation of Arabic-language texts. Among the first scholarly-oriented services to begin scanning journals with the use of OCR software, JSTOR has crucial experience digitizing and making accessible content written in a non-Roman alphabet: in 2008, in partnership with the National Library of Israel and the University of Haifa Library, JSTOR executed a major digitization project involving 45 Hebrew-language scholarly journals in the humanities and social sciences, which included building functionality to enable right-to-left navigation. JSTOR is also no stranger to digitally preserving at-risk materials. It is home to thousands of rare and unique materials digitized in partnership with national archives and museums in Africa, including 15th-century manuscripts from Timbuktu.

A 2017 Humanities Collections and Reference Resources grant will support JSTOR’s planning for the digitization of Arabic-language scholarly journals, including a small test run of issues of the journal Al-Abhath (“Research”). A quarterly publication of the American University of Beirut since 1948, Al-Abhath captures crucial periods in the history of the modern Middle East. According to a JSTOR report: “The foundations of Arab nationhood are preserved in these journals,” which can “give researchers perspectives that could not be acquired from any other source.” Using Al-Abhath as its pilot journal, JSTOR will assess the costs and benefits involved in large-scale digitization of Arabic publications. It plans eventually to make other significant at-risk journals more accessible as well.

photograph of a man on white background

John Kiplinger, JSTOR Director of Production and a librarian, has spent decades digitally transforming print collections for use on the web.

Image courtesy of ITHAKA

“JSTOR is honored to have been selected as an NEH grant recipient,” says John Kiplinger, JSTOR’s Director of Production. “We are eager to bring together our expertise in collection development and large scale digitization, with the expertise of scientists and academics working on technical challenges specific to Arabic, such as OCR, in hopes we will be able to accelerate efforts to bring Arabic language journals online, particularly those that may be in the most danger of being lost from a preservation standpoint.”

Those involved in the project anticipate that future library and publisher digitization efforts will directly benefit from their findings, which will be published in a freely available white paper. The larger hope, however, is more ambitious: making a corpus of Arabic-language scholarly material more accessible will not only provide contemporary scholars in the Arab world and elsewhere with crucial sources about the history of the modern Middle East and North Africa, but will also help guard against the permanent loss of humanistic knowledge in areas of the world where such materials remain vulnerable.

Funding information

Ithaka Harbors, Inc. received NEH support through Humanities Collections and Reference Resources, PW-253861-17.