Finding the Words

New technology and new thinking are helping Native American communities revitalize their languages

HUMANITIES, Fall 2019, Volume 40, Number 4

In a windowless lab at the University of California, Berkeley’s Moffitt Library, Stephanie Battle is looking at her computer, shuffling through magnified images of physical and chemical decay taking place on the surface of wax cylinders. On a workbench next to her sits a piece of sleek equipment: the groundbreaking microscopic camera known as IRENE, which has been used to digitize roughly 3,000 wax-cylinder recordings of Native American songs and narratives from the early twentieth century. Mounted on it are two bright green plastic disks.

“I’m just showing some of my egregious examples,” says Battle, the effort’s digital-imaging specialist since 2017. The cylinder she’s looking at right now contains mold damage. The traditional, mechanical playback method using a stylus expresses the damage as crunching noises, and simply wiping off the surface will not clear up the sound.

“Even if you could clean off the cruddy parts,” explains Carl Haber, Berkeley physicist and the creator of IRENE, “you can’t make the stuff come back—stuff that’s been eaten.”

The wax cylinder looks like it’s got an uneven, whitish patch over regular, vertical lines of black and gray. “It’s almost like nebula in space,” Haber remarks.

“Beautiful, but also sad,” Battle adds.

Thomas Edison’s phonograph became commercially available by the late 1880s, answering an ethno-linguistic need that had only recently come to the fore. UC Berkeley’s collection of 2,713 wax-cylinder recordings was put together largely by the luminary anthropologist Alfred L. Kroeber, whose appointment by Phoebe A. Hearst as an ethnological and archaeological investigator in California in 1901 led to the formation of a department and a museum of anthropology within the University of California. Kroeber spent his first year almost entirely in the field, collecting linguistic and ethnological specimens. In a report to the department’s advisory committee, he argued for “an immediate systematic survey, however rapid, of the entire state of California, more especially for language.” In the four decades to follow, Kroeber and other anthropologists and linguists (including Thomas T. Waterman, Pliny Earle Goddard, and Edward Sapir) interviewed Native Americans and recorded their songs and stories, leaving behind photographs, sound recordings, manuscripts, and other documents.

Photo caption

Upon his arrival at the University of California, Berkeley, in 1901, pioneering anthropologist Alfred L. Kroeber spent nearly a year in the field recording interviews, songs, and stories of Native Americans. 

—UC Berkeley Bancroft Library / A. L. Kroeber family photographs

The wax cylinders, most of which were made between 1900 and 1940, have been kept at the Hearst Museum of Anthropology. Many have scratches, cracks, mold, dust particles, or cotton-wool fiber stuck to the surface. In 1975, with the help of an NEH grant, the museum duplicated cylinder recordings onto tape using the traditional playback method, leaving, unfortunately, a notable amount of surface noise. What’s worse, the original wax recordings could not be listened to again. Because of the malleable and delicate nature of soft wax, each playback using a mechanical stylus risked further damage to the cylinder. This is where IRENE came into the picture.

In a now well-known account, Haber was stuck in Oakland, California, traffic when he heard on the radio Mickey Hart of the Grateful Dead lamenting the condition and fate of early sound recordings—this was in 2000.

Haber and his colleagues at Lawrence Berkeley National Laboratory had since the late 1980s been working on a precision-tracking technology, which was used in the discovery of the top quark at Fermilab and the Higgs particle at the Large Hadron Collider at CERN.

“I was thinking like, There must be other things we can do that would be cool,” Haber recalls. After hearing Hart on the radio, he had an idea. “Imagine if we took a record instead of a silicon device and scanned it and measured it.”

He bought a textbook on acoustical engineering, got some phonograph recordings, and started talking about it around the lab. Haber’s colleague Vitaliy Fadeyev, a postdoc at Lawrence at the time, wrote a program to scan phonographs. It took them a couple of hours to get one second of sound, but both saw the potential of the method. After more than a decade’s sharpening, Haber tells me, the current version of IRENE “can measure a second in five seconds, ten seconds, or twenty seconds, not an hour. That’s what made it practical.”

The digitization project on the Hearst collection is officially titled “Linguistic and Ethnographic Sound Recordings from Early Twentieth-Century California: Optical Scanning, Digitization & Access,” but has come to be called, more colloquially, Project IRENE, which stands for “Image, Reconstruct, Erase Noise, Etc.” IRENE also refers to the technology and the hardware that implements it. In a nutshell, IRENE is an optical scanning technology that generates a three-dimensional image of the surface of a cylinder (or of other analog recordings, such as shellac or metal disks) and converts the data to sound.

How does IRENE work? First, wax cylinders are mounted onto a mandrel. As they rotate, a chromatic confocal microscope moves along and takes measurements of the surface at a large number of data points. The individual measurements are combined to create a three-dimensional map of the surface. On the combined image, which shows the cylinder as a rectangle, as if it had been cut along its length and flattened out, the horizontal axis represents the length and the vertical axis corresponds to the groove. “It’s a topographic image,” Haber explains. “Every pixel, instead of being the amount of reflected light, which is what we normally have in a photograph, is a depth.”
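The "unrolling" Haber describes can be pictured with a minimal Python sketch. It is an illustration only, not IRENE's actual software: the function name `build_depth_map`, the measurement format (position along the cylinder, rotation angle, surface height), and the step sizes are all assumptions made for this example.

```python
def build_depth_map(measurements, axial_step, angle_step):
    """Place (axial_pos, angle_deg, height) readings on a 2D grid.

    Rows follow the rotation angle (the direction the groove runs);
    columns follow position along the cylinder's length.
    All units and step sizes here are hypothetical.
    """
    n_cols = 1 + round(max(m[0] for m in measurements) / axial_step)
    n_rows = round(360 / angle_step)
    grid = [[None] * n_cols for _ in range(n_rows)]
    for axial_pos, angle_deg, height in measurements:
        row = round(angle_deg / angle_step) % n_rows
        col = round(axial_pos / axial_step)
        grid[row][col] = height  # each "pixel" stores a height, not brightness
    return grid


# Toy scan: 3 positions along the length, 4 angles per rotation.
measurements = [(x * 0.5, a * 90.0, -(x + a)) for x in range(3) for a in range(4)]
depth_map = build_depth_map(measurements, axial_step=0.5, angle_step=90.0)
for row in depth_map:
    print(row)
```

Each printed row is one angular position of the cylinder, and each entry is a surface height, which is what makes the result a topographic image rather than a photograph.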

This rendering of the surface is, in Haber’s words, “very verbose.” It contains unneeded data, namely information on the plain, ungrooved surface of the wax cylinder. What matters is the depth of the grooves. The next step, then, is to have an algorithm read the data and, by averaging the heights, find roughly where the grooves are. Then IRENE zooms in and determines “in the best way that we know how, the actual depth at each of these slices along the time direction,” says Haber. This generates a list of depths versus time which, when formatted with the proper headers, is interpreted by the computer as an audio file.
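The three steps in this passage (average the heights to locate grooves, read the depth at each time slice, add audio headers) can be sketched in Python with only the standard library. This is a simplified illustration under stated assumptions, not the project's own code: the function names are hypothetical, grooves are treated as straight vertical lines in the flattened image (a real groove spirals around the cylinder), and the sample rate is an arbitrary placeholder.

```python
import io
import math
import struct
import wave


def find_groove_columns(depth_map):
    """Average heights down each column; grooves show up as local minima."""
    n_rows, n_cols = len(depth_map), len(depth_map[0])
    col_mean = [sum(row[c] for row in depth_map) / n_rows for c in range(n_cols)]
    return [c for c in range(1, n_cols - 1)
            if col_mean[c] < col_mean[c - 1] and col_mean[c] <= col_mean[c + 1]]


def groove_depth_series(depth_map, groove_cols, half_width=1):
    """Read the deepest point near each groove at every time slice (row)."""
    series = []
    for c in groove_cols:  # assumption: one column per turn of the groove
        for row in depth_map:
            series.append(min(row[max(0, c - half_width):c + half_width + 1]))
    return series


def depths_to_wav_bytes(series, sample_rate=8000):
    """Center and scale the depths, then wrap them in a WAV header."""
    mean = sum(series) / len(series)
    centered = [d - mean for d in series]
    peak = max(abs(x) for x in centered) or 1.0
    frames = b"".join(struct.pack("<h", int(32767 * x / peak)) for x in centered)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return buf.getvalue()


# Synthetic flattened image: flat surface at 0, two grooves whose depth wobbles.
n_rows, n_cols = 40, 9
depth_map = [[0.0] * n_cols for _ in range(n_rows)]
for r in range(n_rows):
    for c in (2, 6):
        depth_map[r][c] = -10.0 + math.sin(2 * math.pi * r / 20)
        depth_map[r][c - 1] = depth_map[r][c + 1] = -5.0

grooves = find_groove_columns(depth_map)
series = groove_depth_series(depth_map, grooves)
wav_bytes = depths_to_wav_bytes(series)
print(grooves, len(series), len(wav_bytes))
```

The flat surface between grooves never reaches the output; only the depth-versus-time list does, which is the sense in which the raw image is "verbose."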

As a former collections manager at art and anthropology museums, Battle says there is nothing sadder than knowing there is content on outmoded video and audio collections but that the process of retrieving that content could damage the recordings. “The idea that you could do this without touching the object,” says Battle, “[the idea of] providing that audio and that content again for people is like a tool that is a dream for someone in the museum. And giving that back to people is an incredible gift.”

“Then if we want to,” Haber adds, “we can apply other algorithms that, for example, look for scratches or dust particles and then can subtract them out of the image if we want to, like a Photoshop kind of thing.” The algorithm finds spots where the heights appear to be anomalous or show abrupt changes and removes or restores the data using manually adjusted parameters.
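One common way to do this kind of cleanup is a median-based despiking filter, sketched below in Python. The article does not say which algorithm IRENE uses, so this is a stand-in technique chosen for illustration; the function name `repair_spikes` and the `window` and `threshold` values are hypothetical, playing the role of the manually adjusted parameters.

```python
from statistics import median


def repair_spikes(depths, window=5, threshold=3.0):
    """Flag samples far from their local median and replace them with it.

    `window` and `threshold` are hand-tuned, hypothetical parameters.
    """
    half = window // 2
    repaired = list(depths)
    flagged = []
    for i, value in enumerate(depths):
        lo = max(0, i - half)
        hi = min(len(depths), i + half + 1)
        local = median(depths[lo:hi])
        if abs(value - local) > threshold:  # abrupt, anomalous height
            repaired[i] = local             # restore from the neighbors
            flagged.append(i)
    return repaired, flagged


# A groove about 10 units deep with one dust particle sitting in it.
depths = [-10.0, -10.2, -9.9, -2.0, -10.1, -10.3, -10.0]
repaired, flagged = repair_spikes(depths)
print(flagged)
print(repaired)
```

A filter like this can smooth over a speck of dust, but it cannot recover a signal where the wax itself has been eaten away, which matches Haber's earlier caveat about mold damage.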

This digital cleanup minimizes the effect of the damage, reducing some of the noise. “So, if you are listening for nuances in language, it’s less distracting,” Haber says. Andrew Garrett, professor of linguistics at UC Berkeley and codirector of the Hearst collection digitization project, confirms the significant improvement in audio quality. “In many cases,” he says, “the line between acceptable and unacceptable falls between the previous version and the new version.”

Though trained as a historical linguist and having worked primarily with ancient languages such as Hittite, Garrett says it’s hard not to become interested in indigenous languages, given the abundant resources on them at Berkeley. His work in Americanist linguistics, meaning any kind of linguistic work on languages of the Western Hemisphere, consists mostly of documentation and philological-historical research on Yurok, Karuk, and other indigenous languages. As the wax-cylinder recordings at the Hearst Museum are digitized, Garrett hopes that the project will help Californians become more aware of the sheer diversity of indigenous languages in California: Two centuries ago, between 80 and 90 languages were spoken within the boundaries of what is now the state of California, which makes the Golden State the most linguistically diverse area in North America. Today, the majority of these languages do not have any first-language speakers left, although in many cases there are active language revitalization programs, with the younger generations learning their ancestral languages.

Photo caption

Mountain Chief, a Piegan Native American, listens to a recording in 1916 with ethnologist Frances Densmore.

—Library of Congress

Language loss is not unique to California. UNESCO estimated in 2010 that at least 43 percent of the world’s 6,000 languages were endangered to various degrees. The Oxford Research Encyclopedia of Linguistics, published in 2017, estimates the number of languages being spoken and signed globally to be around 7,000. While the diversity is high, the encyclopedia reminds us, monolingualism and the emergence of global trade languages are combining to homogenize world languages. It also notes that more than half of the world’s population speaks one of only thirteen languages.

Concerns over “disappearing” indigenous cultures and languages were first voiced more than a century ago, after the Civil War, and grew louder in the last decade of the nineteenth century. In 1890, Harvard ethnologist Jesse Walter Fewkes, first known user of Edison’s phonograph for ethnographic fieldwork, recorded songs and stories among the Passamaquoddy tribe in Maine and cautioned in an essay in Science that indigenous cultural groups were “fated to disappear in the next decade.” From anthropologist Franz Boas (Kroeber’s doctoral adviser at Columbia) to linguist Edward Sapir, there was a pervasive belief among Fewkes’s contemporaries that indigenous cultures were on the brink of extinction.

As noted by Brian Hochman in Savage Preservation, these intellectuals reacted to the perceived threat of extinction by documenting and preserving what was left, an effort that came to be called “salvage ethnography.” For the twenty-first-century reader who takes for granted that “salvage” means to “save,” implying an effort to restore, it can take some effort to realize that documentation work of the early twentieth century was not necessarily intended to lead to revitalization. Instead, that generation of scholars and writers was more concerned with language data itself than how that data might be used by Native communities in the future.

As Hochman points out, Kroeber’s generation took an evolutionist view of culture, in which different cultures competed for survival. Today’s linguists, like scholars in other disciplines, believe instead that cultures are not fixed things but “flexible webs of affiliation and difference that respond, resist, and adapt in the context of broader social pressures,” to quote Hochman.

Indeed, there are cases in which languages have survived and evolved despite external pressure, and many efforts to sustain them have benefited enormously from the collecting done by earlier generations of scholars. Besides Kroeber and other UC Berkeley-affiliated scholars, anthropologists who made sizable recording collections of Native American sounds include John P. Harrington, Helen Heffron Roberts, Charles Lummis, and Frances Densmore.

Linguistics does not lead automatically to language revitalization. In Garrett’s words, “the revitalization movement didn’t come out of linguistics, it came out of the Native community.” A lot of Native people chose to work with linguists in the ’50s and ’60s, and Garrett believes that they did so because they wanted to save Native knowledge for themselves and later generations, at a time when the concept of revitalizing languages had not yet formed.

The late ’60s and ’70s saw a great deal of activism and cultural revival movements on Indian autonomy and tribal rights. “Those cultural movements kind of created the preconditions for language revival,” which came in the ’80s and the ’90s, says Garrett. It was in these last couple of decades of the twentieth century that linguists became aware of the language revival movement and saw that it would be meaningful for them to participate in it. Under the leadership of Leanne Hinton, UC Berkeley’s Survey of California and Other Indian Languages began reaching out to Native communities.

In general, this period also saw a shift in scholarly point of view, which began emphasizing the resilience of culture and language and celebrating the more active role being played by Native communities. And it has led to new thinking on what constitutes ethical linguistic work within language revitalization. In an area where distrust of academia is potentially high, for historical and sociopolitical reasons, the first step is to have conversations with the people of the community. “A lot of linguists,” Garrett says, “are very interested in trying to figure out, or be guided by the people that they are working with, as to what kinds of projects would be meaningful” for them.

Some Native communities may want their members to speak the language at home, some may want to hear the language used at certain ceremonial or public events, and some might want the language to be “visible in the form of street signs and labels on buildings,” says Garrett. Depending on the different conditions and goals, revitalization may take the form of linguist-facilitated language workshops, school classes, or programs designed to create an environment for elder speakers and younger learners to spend time together and use the language.

There are many arguments for linguistic diversity. For Garrett, the most compelling begins with the concept of “language as a repository or vehicle for expression of cultural diversity.” He says, “Your stories are in your language, your aspects of what people refer to as worldview are expressed in your language. Your other cultural practices and cultural institutions are really closely linked to the language. And so when the language is lost, all of that stuff has to be translated, if possible, into the majority language. And it’s not usually possible.”

With the help of the newly digitized recordings, Garrett, who also directs UC Berkeley’s Yurok Language Project, is compiling a Yurok-language book of Yurok narratives. He explains that in a case like Yurok, a language of northwest California that now has a small number of quite fluent second-language speakers and many tribal members with a basic knowledge, the main value of the cylinder recordings lies not so much in everyday language usage as in traditional storytelling. “There are all these stories that people know that I think typically they might know in English,” says Garrett, “or they might have heard English versions of them but they have not heard Yurok versions of them. Now there are Yurok versions of those stories that are sort of audible.” In addition to the narrative book, other progress in Yurok language revitalization includes a preliminary dictionary published in 2005 (and constantly being updated) and a grammar book updated in 2014.

Due to the sensitive nature of the wax-cylinder recordings at the Hearst Museum, access to the digitized files is limited to indigenous community members and approved researchers. Garrett says that approximately 95 percent of requests for access to cylinder recordings come from community members.

IRENE is, in effect, a tool of digital repatriation, one that supports efforts at local and national levels to encourage active use of documentary and archival resources on indigenous languages. The biennial Breath of Life language restoration workshop, for example, pairs researchers from Native communities with linguists who put their expertise to use by teaching community members how to understand and use archival materials held at libraries and museums. Established by Leanne Hinton, the program is held in even-numbered years in California, while its national counterpart, the National Breath of Life Archival Institute for Indigenous Languages, takes place in odd-numbered years in Washington, D.C.

When ethnographers and anthropologists turned to audio technology to make authentic copies of “disappearing” cultures, they envisioned these records as permanent copies that would “live to tell the tale” after cultural groups faded into history. What they did not foresee, however, is that their media formats would themselves deteriorate. It seems ironic, observes Hochman, that some of the “disappearing cultures” from the turn of the century, including Passamaquoddy, “haven’t merely survived over the years” but have instead “found innovative ways to salvage the media that were once used to salvage them.”

Stories of ethnographic collections, their histories, and the communities they touch are a reminder of the dynamic relation between cultures and technologies, which, in Hochman’s words, are “never as permanent, never as absolute, as they might seem at any given moment in time.” Oftentimes, it is the resilience and regeneration of cultures that create a renewed interest in researching, using, and preserving the media.

Haber, who won a MacArthur Fellowship after his work on IRENE, has thought a good bit about the give and take between culture and technology. Particle physics, he says, seeks to recreate conditions of energy and temperature that existed only a short time—one thousandth of a second—after the Big Bang. Physicists care about this because “we are trying to account for why the universe looks the way it does by going back to this more formative period. We are basically using these techniques to go back in time.” And now the techniques used in IRENE, “which are spin-offs of that time travel, are taking us back in time maybe a hundred years.”

Photo caption


IRENE, a microscopic camera created by physicist Carl Haber, digitized nearly three thousand early twentieth-century wax-cylinder recordings.

—UC Berkeley Library

Looking to the future, Battle says she hopes that with more projects IRENE will become faster and more accessible to other institutions. After completing the digitization of the Hearst collection, the team modified the device to scan a collection of 1940s SoundScriber discs of author George Rippey Stewart’s dictation of his novels. Through this project, IRENE has “exposed dozens of students who are essentially STEM people to scientific and technical challenges in the humanities,” says Haber. “I don’t know if they’ll become humanities preservation scientists . . . but they’ll certainly know about the ways in which scientific knowledge can benefit humanities collections.”