August 15, 2016
What does the French Enlightenment writer Voltaire have in common with Sigmundur Davíð Gunnlaugsson, until recently the prime minister of Iceland? Though a world apart in time and occupation, the philosophe and the politician have had their social networks probed and illuminated by different iterations of the same software tools.
Created to help advance humanities research, those tools are enjoying a second, journalistic life beyond the academy. In an era besotted with big data, they’re staking out a middle ground between algorithmic analysis and close reading.
Voltaire plays a starring role in Mapping the Republic of Letters, a digital-humanities project that uses letters to trace social and geographical connections among writers and thinkers active in European intellectual circles in the 16th, 17th, and 18th centuries. Gunnlaugsson is one of the so-called Power Players in the investigative journalistic juggernaut known as the Panama Papers, billed as the largest data leak in history.
Both sets of investigations—one academic, one journalistic—rely on the ability to make narrative sense out of large amounts of raw material. For Mapping the Republic of Letters, that includes the Electronic Enlightenment, a University of Oxford collection of more than 53,000 letters to and from Voltaire, Rousseau, and many other Enlightenment figures of great and lesser prominence. The Panama Papers project, coordinated by the International Consortium of Investigative Journalists, or ICIJ, has recruited journalists from many countries and news outlets to dig into 11.5 million financial and legal records from a major offshore law firm, Mossack Fonseca.
The firm’s clients include high-profile and well-connected figures like the Icelandic PM; Mauricio Macri, the president of Argentina; other world leaders in Europe, the Middle East, and South America; and childhood friends and associates of Russian president Vladimir Putin. An anonymous source leaked the documents to a German newspaper, and ICIJ has led the ongoing collective effort to trace paths through that info dump and find within it stories of hidden assets and sometimes questionable business dealings.
Even the most seasoned investigative reporters would need years to sort those millions of records by hand. With the help of a data-visualization toolkit provided by a company called Linkurious, the Panama Papers journalists have been able to follow the data trail from point to point. (Linkurious’s slogan is “Connect the Dots,” a nod both to the graph-node visualizations it enables and to the investigative trails it helps identify.) That has made it possible for them to ferret out the holdings and interrelationships of Mossack Fonseca’s clients in months, not years.
The graph visualizations made possible by Linkurious, and used so effectively in the Panama Papers investigation, partly owe their existence to the curiosity of humanities scholars. A precursor of Linkurious, called Knot, came out of two grants, made in 2009 and 2013, by NEH’s Office of Digital Humanities to support and develop Mapping the Republic of Letters. Both Knot and Linkurious use linked nodes to represent connections visually.
Led by Dan Edelstein, a professor of French at Stanford University, Paula Findlen, a professor of history there, and other scholars, Mapping the Republic of Letters draws on the talents of humanists and computer scientists from Stanford, the University of Oklahoma, the University of Oxford, and other international partners. The group’s aim: to “revolutionize the practice of interpretive research in the humanities” by developing “highly interactive tools for excavating and dissecting details about people, places, times, and relationships in large data sets.” That includes the intricate social webs of correspondence, education, and travel that linked people across Europe “from the age of Erasmus to the age of Franklin,” as the website puts it.
The right tools for flexible, network-level analysis didn’t really exist when Edelstein et al. began their work. “At the time, most of these social-network visualization tools were designed for processing large amounts of data with algorithms,” Edelstein says. But algorithms aren’t the most useful way to grapple with the kinds of information included in Mapping the Republic of Letters.
Historical data sets—collections of letters, for instance—tend toward the idiosyncratic and imperfect. An archive might be “incomplete in unpredictable ways,” says Nicole Coleman, digital research architect at the Stanford Libraries. For instance, Voltaire burned many of the letters he received, while his correspondents tended to hold on to missives they received from him.
Such behaviors lead to an incomplete and imbalanced archive, not good fodder for algorithmic analysis. “All of those algorithmic models are based on a kind of assumption that the data you’re looking at is complete and meaningful,” Coleman says. “We needed to come up with a visual representation of uncertainty.”
Coleman has been closely involved in the ongoing tech development and scholarly work behind Mapping the Republic of Letters. She’s the research director of the Humanities + Design lab at Stanford’s Center for Spatial and Textual Analysis, whose mission is “to inject humanities thinking into the design of tools,” she says. “Not only for the sake of humanities research but for the sake of dealing with complexity in a data-driven world. Humanists are particularly good at dealing with uncertainty and complexity.”
Eighteenth-century archives can be a study in uncertainty. Voltaire’s treatment of his letters is not an archivist’s dream. Out of about 19,000 letters, only 10 percent or so preserve complete location information for both sides of the exchange, according to Mapping the Republic of Letters. By comparing not just the extant letters but the locations and connections of the people who corresponded with the French philosopher, the researchers were able to draw conclusions about his network, using less obvious connections to compensate for the obvious gaps.
They found, for instance, that Voltaire wasn’t quite as cosmopolitan a letter-writer as one might assume, given his international prominence. About 70 percent of his correspondents were also French. And while he never wrote directly to Benjamin Franklin, as far as we know, he and the American had a number of correspondents in common.
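The "correspondents in common" finding can be pictured as a simple set intersection over a letter network. What follows is a minimal sketch in Python, not the project's actual method or data: the letter records, the placeholder names ("Person A" and so on), and the function names are all invented for illustration.

```python
from collections import defaultdict

# Made-up letter records as (sender, recipient) pairs.
# These are placeholders, not entries from any real archive.
letters = [
    ("Voltaire", "Person A"),
    ("Person B", "Voltaire"),
    ("Franklin", "Person A"),
    ("Franklin", "Person C"),
    ("Person B", "Franklin"),
    ("Voltaire", "Person D"),
]


def correspondents(records):
    """Map each person to the set of people they exchanged letters with."""
    network = defaultdict(set)
    for sender, recipient in records:
        # Treat a letter as a tie in both directions, since an archive
        # may preserve only one side of an exchange.
        network[sender].add(recipient)
        network[recipient].add(sender)
    return network


def shared_correspondents(network, a, b):
    """Return the people who exchanged letters with both a and b."""
    return sorted(network.get(a, set()) & network.get(b, set()))


network = correspondents(letters)
shared = shared_correspondents(network, "Voltaire", "Franklin")
directly_linked = "Franklin" in network["Voltaire"]

print("Shared correspondents:", shared)
print("Direct exchange on record:", directly_linked)
```

In this toy network the two men never write to each other, yet they are joined through two intermediaries. A count like that still needs a historian's judgment about which letters are missing from the record.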
More than Numbers
Historians, like journalists, tend to ask questions that can’t be answered with numerical analysis alone. “When you have a complex network and you’re trying to understand it, it’s not just about the metrics,” Edelstein says. “It’s about working your way through, looking for patterns, looking for trends that might not be apparent from a mathematical perspective.”
Also like journalists, humanities scholars bring an unquantifiable sense of context to their research. As Edelstein puts it, they need tools that allow a researcher to “get your hands dirty, move things around, sift through it, and bring your own expertise to the tool at the same time.”
Enter Knot, visualization software born at a 2012 Stanford workshop, “Early Modern Time and Networks.” Coleman’s lab team took part, as did members of the DensityDesign Research Lab at the Politecnico di Milano. Sebastien Heymann and Romain Yon, two French computer scientists, participated and contributed to the open-source code that underpins Knot. Later, back in France, Heymann and Yon built on that code to create Linkurious and launched a startup company, an outcome the scholars and funders involved did not foresee.
“One of the wonderful things about funding research is how it can lead down unintended paths, sometimes having an impact in areas you never would have anticipated,” says Brett Bobley, director of the NEH’s Office of Digital Humanities.
That’s one reason recipients of ODH grants are encouraged to produce open-source software. “We want people to take it and repurpose it in other settings,” Bobley says. “It’s pretty pleasing to see.”
Without the work that led to Knot and then to Linkurious, the Panama Papers investigators might still be slogging through data to produce their first stories. “Linkurious was crucial,” says Emilia Díaz-Struck, research editor at ICIJ. “We are talking about big amounts of data and documents.”
How big? The initial leak of 11.5 million documents contained unstructured data with information about more than 200,000 companies, registered in 21 jurisdictions and connecting people all over the world. “With that amount of information, it is easy to miss things, to miss key stories,” says Díaz-Struck. “It is easy to get lost among millions of documents if you don’t have a way to approach them.”
ICIJ extracted data from the leaked files to create a database of company and shareholder names and addresses, registry jurisdictions, and so on. Linkurious offered “a very good starting point to make sense of connections and find politicians and people of public interest” and trace their alliances and offshore business dealings, Díaz-Struck says.
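A database of this kind can be thought of as a graph in which people, companies, and jurisdictions are nodes and their relationships are links. The sketch below is a rough illustration in Python, using only the standard library. It does not reproduce ICIJ's schema or any Linkurious API, and the records, entity names, and relationship labels are hypothetical. It shows how following links from one node to the next can surface an indirect connection.

```python
from collections import defaultdict, deque

# Hypothetical records as (entity, relationship, entity).
# All names and labels are invented placeholders.
records = [
    ("Official X", "shareholder_of", "Company 1"),
    ("Company 1", "registered_in", "Jurisdiction J"),
    ("Associate Y", "shareholder_of", "Company 1"),
    ("Associate Y", "director_of", "Company 2"),
    ("Company 2", "registered_in", "Jurisdiction K"),
]


def build_graph(rows):
    """Build an undirected adjacency map from relationship records."""
    graph = defaultdict(set)
    for source, _relationship, target in rows:
        graph[source].add(target)
        graph[target].add(source)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search for the shortest chain of links, or None."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the recorded predecessors.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


graph = build_graph(records)
path = shortest_path(graph, "Official X", "Company 2")
print(" -> ".join(path))
```

Here the search links the hypothetical official to a second company by way of a shared shareholder. A path like this is only a lead. Whether it amounts to a story is the reporter's call.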
ICIJ has shared the data with many news outlets, which has put a number of politicians in an uncomfortable spot. In April, for instance, the British newspaper the Guardian followed up reporting by Reykjavik Media and Sweden’s SVT television network, and ran a story revealing that Iceland’s prime minister and his wife owned an offshore holding company, Wintris Inc., based in the British Virgin Islands. Wintris’s holdings included debts from several Icelandic banks that had failed. Gunnlaugsson’s failure to disclose the investment to Iceland’s Parliament and voters led to calls for a snap election. The prime minister eventually stepped aside.
As with Mapping the Republic of Letters, the Panama Papers work relies on human interpretation to make the most of what the visualization software can do. “You still need the journalistic skills to make sense of the data and then report the story,” Díaz-Struck says. Combine that with the right tools and “then you will have a powerful investigation.”
The tool-building begun by the Mapping the Republic of Letters team hasn’t stopped yet. Edelstein, Coleman, and their colleagues are at work on the next generation of visualization tools. Palladio, another NEH-supported tool, combines network views of linked nodes with map views. Fibra, in development with a grant from the American Council of Learned Societies, will draw on linked data to create a flexible tool. Coleman hopes that using it will be like sketching a network as you read through texts, turning “cold” connections into “hot” ones by linking them to existing information held in far-flung libraries and collections.
Who knows what future investigations these tools will make possible? The Panama Papers journalists used Linkurious “because it was the only thing available to do this work,” Coleman says. “That’s a huge win.”