Pop Up Archive Filled a Need for Audio Archiving, and Apple Noticed

A service for transcribing and cataloging audio files hits the big time.

HUMANITIES, Fall 2017, Volume 38, Number 4

When radio producers Davia Nelson and Nikki Silva, aka the Kitchen Sisters, got their start, they had their eye on making great programming. They didn’t worry about what to do with all the audio files they would create in the process.

That changed over the decades as cassettes, reel-to-reel recordings, and digital audio tapes piled up. “We’ve been through all the mediums,” Silva says. “We had no idea it was going to grow into this huge collection of material”—a rich “accidental archive,” as she and others call it, but for all practical purposes an invisible and inaccessible one.

To have any sort of useful life after their debut, broadcasts and podcasts need to be preserved, described with good metadata, and made searchable so that researchers and the public can explore them. “With audio, there’s no way you’re going to find anything if it’s not transcribed,” Silva says. “When we’re working fast and we have no money, often it doesn’t happen.”

Enter Anne Wootton and Bailey Smith. In 2012, as graduate students at the University of California–Berkeley School of Information, they joined forces with the Kitchen Sisters to build a software platform and a set of tools that became Pop Up Archive. The idea was to create “a simple organizing system for audio to make it searchable, discoverable, and reusable,” Wootton says via e-mail.

NEH’s Office of Digital Humanities gave the Kitchen Sisters and Pop Up Archive a start-up grant in 2013, then gave the archive and Public Radio Exchange (PRX) a subsequent implementation grant in 2014. As the project evolved, it received boosts of funding from the Knight Foundation and the Institute of Museum and Library Services (IMLS). Then, last month, Pop Up Archive found a different and unexpected kind of success as Apple swooped in and acquired the company.

Wootton and Smith can’t yet discuss what their future as part of the Apple empire might look like. Some observers have speculated that the Silicon Valley giant plans to use Pop Up Archive to make podcasts easier to search. For now, Pop Up Archive has gone dark, and its clients must look elsewhere for audio archiving and transcribing.

Whatever lies ahead, during its five-year existence Pop Up Archive helped accelerate a collective and rapidly evolving effort to get a handle on audio archives, accidental or otherwise. That Pop Up Archive won such intense support in so short a time speaks to the pressing need felt not just by radio producers and broadcasters but by galleries, libraries, archives, and museums, too.

From the beginning, Pop Up Archive wanted to play a role in building humanities infrastructure. “We realized very early on that even if we built an intuitive web-based interface for organizing and adding metadata to audio, even the best-intentioned would-be archivists would never have the time to listen through and tag all that audio themselves,” Wootton says. This led to experiments using speech-to-text software “with the goal of making archiving—and thus searchability and discoverability—part of the production process from the start.”

Perry Collins, now head of copyright and scholarly communications at Ball State University, supervised the Pop Up Archive grants when she was a program officer with NEH’s Office of Digital Humanities. “I was really struck by the potential of the project and also how open they were to the many possible futures of a project like this,” she says. It was, Collins continues, “part of a wave of projects coming out of NEH” that were concerned with existing collections: what had already been digitized and how to make it more discoverable.

How exactly to organize so much material and make it available? Trevor Owens, now head of digital content management at the Library of Congress, worked with Pop Up Archive in his previous job at IMLS. “There’s a lot of tools and systems that have been around,” Owens says. “My understanding is they were able to put together how those would work as a service.”

For instance, Pop Up Archive’s work with the Kitchen Sisters made use of Omeka, an open-source platform designed for libraries, museums, and archives that want to showcase collections and exhibits online. Wootton mentions the work of Mark Boas and his project Hyperaud.io as an inspiration, along with popcorn.js (“Mozilla’s HTML5 video and media library for the open web”) and the Digital Public Library of America in its early days.

Pop Up Archive has had a stellar lineup of collaborators and partners. In addition to the Kitchen Sisters and PRX, Wootton and Smith have worked with the Studs Terkel Archive, the BBC, and the public broadcasting powerhouse WGBH, among others. In early 2016, WGBH received an Institute of Museum and Library Services grant to digitize 68,000 items from a hundred public broadcasting stations across the United States. The items were part of the American Archive of Public Broadcasting (AAPB), a major collection of historical publicly funded radio and TV programs run by WGBH and the Library of Congress.

Pop Up Archive did speech-to-text transcriptions of those 68,000 items, “the notion being that the more data they had to train their tool, the better the tool gets,” says Karen Cariani, senior director of the Media Library and Archives at the WGBH Educational Foundation and the project director of the AAPB. “We had very little metadata about many of those items.”

The archive’s software completed the transcriptions in a matter of months, she says, though with some mixed results in terms of accuracy—unsurprising, given the variance in recording quality, regional accents, and ambient noise among the different sound files.

Still, even an imperfect transcript beats not having a transcript at all. To correct and refine the results, AAPB has turned to crowdsourcing with a game called Fix It, currently in beta testing, that invites players to flag and fix errors in the machine transcriptions.

Because humans are able to pick up on and distinguish among regional dialects, street sounds, and other variables found in audio files, they tend to transcribe more accurately than machines. But machines are faster. “With these volumes of digital materials, the cost of human labor is just enormous,” Cariani says.

How to train machines not just to transcribe audio files accurately but to distinguish among sounds is a challenge that preoccupies Tanya Clement. She’s an associate professor in the School of Information at the University of Texas–Austin and the leader of the HiPSTAS (High Performance Sound Technologies for Access and Scholarship) project, another recipient of NEH grants.

Clement and her colleague Steve McLaughlin have focused on how to teach machines to undertake the “cultural analysis” of sound files—distinguishing bursts of applause from intervals of music, for example—and to recognize individual speakers’ voices. That includes working with Kaldi, an open-source speech-recognition software tool that wasn’t developed by Pop Up Archive but became integral to its speech-to-text experiments.

“Part of any machine learning is teaching the machine,” Clement says. Pop Up Archive got the best results with radio programs, “because those are well produced, often single-speaker collections that are easier for a machine to handle.” Older recordings and those with multiple speakers tend to thwart algorithms.

Clement adds, “I do think that machine learning programs are invariably difficult, especially when you’re working with humanities data, because there are so many variables, and nothing is clean.”

Audio files can be “heavy,” Clement says, meaning big and hard to process on individual computers. Computer power helps. So do streamlined workflows, one of Pop Up Archive’s signature contributions, as Clement and others note. “The added benefit of Pop Up Archive was the really intelligent work they did behind the scenes to process the files, facilitate the workflow—which is not a small task,” Clement says.

Machine-enabled does not mean blindly automatic. “I think people have this sense that with machine-learning algorithms, you just switch them on and they produce outputs,” Clement adds. “But it’s a lot of human labor—processing files, evaluating, tweaking parameters so that your model works well with the situation you have at hand.”

Brett Bobley, director of NEH’s Office of Digital Humanities, emphasizes that the founders of Pop Up Archive put in a lot of time working on the human element. Working with the Kitchen Sisters gave them entrée to a network of radio producers around the country. “They held training seminars while at the same time developing a framework for searching, transcribing, and ultimately saving” audio files.

That investment in training people and establishing technical standards will outlast Pop Up Archive, which no longer exists, at least not in the public form it once did. But its open-source Kaldi models remain freely available on GitHub, and those models have been forked—reproduced—by the UT–Austin team and by WGBH, according to Wootton. “These models enable any organization with certain basic technical resources and capabilities to run speech-to-text software at scale across audio collections,” she says. An optimistic assessment, perhaps, but many are hoping that it proves true.

Trevor Owens of the Library of Congress hopes that Wootton and Smith haven’t given up the public broadcasting and archive worlds for good. “We have a lot of really interesting problems,” he says. Their style of entrepreneurship, he points out, is “not something we see that often in this space.”