Building a Community of Knowledge

Wikipedia founder Jimmy Wales talks with NEH Chairman Bruce Cole about the challenges and successes of an ever-evolving encyclopedia.

HUMANITIES, March/April 2007, Volume 28, Number 2

In one month, Wikipedia receives more than seven billion queries, says the online encyclopedia's founder Jimmy Wales. During a recent interview, Wales tells NEH Chairman Bruce Cole about the process and people that make the phenomenon known as Wikipedia.

Bruce Cole: What is Wikipedia?

Jimmy Wales: Wikipedia is a freely licensed encyclopedia written by thousands of volunteers all over the world in many languages. The freely licensed component of that is important. It is the same idea as open-source software, meaning anyone is allowed to copy our work, modify it, redistribute it, and redistribute the modified version commercially or noncommercially.

Cole: How big is Wikipedia? Maybe you could talk about the English language version, and give us some survey of other languages.

Wales: In English, we now have more than one and one-half million articles. But even though English is our largest language, English makes up less than one-third of the total work. We’re in 125 languages now; for each we have at least one thousand articles. One thousand articles is still a fairly small encyclopedia, but that gives you an idea of the scope of the work that has begun. And there are several languages which now have over 250,000 articles. That’s in English, of course, and we have German, French, Polish, Japanese, Dutch—

Cole: Do you have Arabic?

Wales: Yes, we have Arabic. That’s a very fast-growing project, in fact. The Arabic Wikipedia now has twenty-one thousand articles and is growing steadily.

Cole: Which language is growing fastest?

Wales: It’s hard to measure that because sometimes it’s one of the smaller languages—someone will get very active and recruit several people and you’ll suddenly go from thirty articles to fifty. In terms of the larger languages, Polish has had a strong growth lately.

Cole: Wikipedia went online when?

Wales: We went online January 15, 2001.

Cole: And now you’re over a million.

Wales: One and one-half million.

Cole: One and one-half million. How does that compare with, let us say, the number of articles in Encyclopedia Britannica?

Wales: Britannica has some eighty thousand articles, I believe, but you should check. But this is really not a fair comparison, simply because they tend to have very large articles, while we tend to break things down into smaller articles. They will have, for example, one very large article about World War II. On the same subject, we will carve it up into twenty different articles. The main article would then link to sub-articles, so article count is a little tricky. In terms of words, we’re about four times the size of Britannica last time I checked. It’s probably more than that right now.

Cole: We talk about the nature of Wikipedia—its size and its languages— but how does it actually work? There is no actual venue, right? When we think about Encyclopedia Britannica, we think that there’s this editorial office somewhere, and there are these editors and editorial assistants, and they’re sending out requests and then they’re getting articles back. We think a lot about venue.

Wales: Right.

Cole: How do we think about Wikipedia?

Wales: It’s a very different model for how it’s made. The Web site is run by the Wikimedia Foundation, which is a nonprofit organization that I founded. The Wikimedia Foundation has five full-time employees; of those, two are programmers who are not here in the office. Here in the office we have the executive director, the grants officer, and the receptionist.

Cole: Everybody else is working with Wikipedia as a volunteer?

Wales: Yes, including me. All of the work goes on in public, very transparently and openly on the Web site. It’s an open collaborative effort, anybody can pitch in at any step of the way. So, it’s very different.

Cole: Is this built on Eric Raymond’s model of the cathedral and the bazaar?

Wales: A fair amount of my thinking was influenced by Raymond’s model. Before Wikipedia I founded Nupedia. We organized it in a top-down, traditional academic way and essentially built a cathedral and that just didn’t work. It was slow-moving, and it didn’t take advantage of a lot of people who wanted to pitch in and help.

It was very intimidating for people to get involved, and basically it failed because of that.

Cole: This was a sort of traditional print or letterpress model in your first venture, right?

Wales: Exactly. It was a set of editorial boards for different subjects, different areas. There was a process where you would request an article and people would write the draft, and then it would be sent out to reviewers. It was very much the traditional academic model.

The Wikipedia model is completely different. Basically anyone can add anything.

Cole: That’s the bazaar, right?

Wales: Exactly. It is a wide-open process where we invite the public to come and help us and make it better. The key to what makes it work is the Wiki software—the controls that are given to the community and the ability to revert to previous versions. If somebody comes in and messes something up, we can always go back to the best of available past versions so nothing can ever really be damaged, per se. When you edit something, you actually just make a new version. We have the ability to block people from editing based on an IP number.

Cole: That’s the lead articles, right?

Wales: All of these things go on through open processes on the Web site, where people are in constant discussion and debate about what should be done and how it could be done better. That contributes to an overall variety and quality over time.

Cole: With all these personalities debating, has the Wikipedia community had difficult moments?

Wales: One of my particular psychological illnesses is pathological optimism (laugh), so I have a hard time thinking anything bad. The hard part is when you have a really good contributor who is causing a lot of troubles and you have different groups within the community— all good people—who are arguing over this person’s behavior.

Every language community goes through this, something like the teenage years. Now I’ve seen this through; I’m experienced in this now. When each language community starts out, it’s usually five to ten people who are all very excited about working together. It’s really fun. It’s such a small group that people can overlook some bad behavior, and it’s not that hard. Eventually, as it gets bigger and bigger though, there are the first times when the community has to face the issue of “Should we ban this person who is causing significant dissent in the community, and how do we deal with that?”

Tough decisions have been made. Sometimes you say, “Okay, we’re going to work with this person, I’m going to try to help him to be a little less antagonistic.” Sometimes you just have to say, “I’m sorry, you are doing really good work, but you’re just costing us good people because you are too annoying.” Those are the kinds of difficult moments for us.

Cole: Those are the whole social network, right?

Wales: It’s a whole social network.

Cole: And the information network is your social network.

Wales: When you get involved in talking and editing daily with people, people make friendships, and it’s a fairly rich human interaction for people. In China the IRC, the Internet Relay Chat, is blocked and so the Chinese community here uses Skype, because that’s not blocked. Everybody is just one click away to talk on the phone, so the Chinese community actually feels closer than most of the others because they actually pick up the phone and call each other on their computers.

Cole: Let us just go back to the cathedral or the bazaar model. Was this idea first the development of codes or software?

Wales: Basically, what Raymond talked about was roughly two different models for software development. One would be to think of it as a cathedral model in the sense that there would be very trusted, learned scholars toiling away more or less in secrecy and privacy, and then, presenting their work to the world.

The bazaar model is more an open marketplace where lots of people are coming and going, and people are buying and selling, and giving and trading. Certainly, the encyclopedia model is where Britannica goes out and finds some esteemed scholars to pen the encyclopedia articles. Their model actually works reasonably well. There is nothing particularly bad about it, but there are some problems with it.

For example, bias is a big problem. You can really see this, if you’d go back and look at the 1911 Britannica which is really a classic edition, but boy, it’s a real piece of work. The bias has become a little more obvious with time—but you do see that even in contemporary articles because they are written by one person.

Cole: I contributed to Britannica. There are probably lots of people out there who know as much about my subject, who may not be in the academy. That’s what I was getting at.

It’s my work and it’s my take on whatever the subject, it’s the editorial approval, and then the article appears.

Wales: Exactly. And in many cases, that’s wonderful. I have a volume at home of famous classic ethics from Britannica which I would consider to be important historical literature, but that doesn’t necessarily mean it’s an encyclopedia in the sense that we would hope an encyclopedia could be not one person or a field, but a general exposition. Then some of the other problems with the traditional model really have to do with the sheer size and scope, the sheer scale of human knowledge, which is absolutely vast. The traditional model doesn’t scale very well.

In other words, Britannica is a certain size simply because the cost of producing it would be too enormous to go into the kind of details that Wikipedia does. Sometimes in Wikipedia you’ll see coverage of fairly trivial topics, but the point is, well, why not cover fairly trivial topics if we have the resources to do it? Certainly you can find articles in Wikipedia that you would never expect to find in Britannica simply because of their cost structure.

Cole: Trivial to whom—that’s the other question.

Wales: Exactly. Say, the history of a pop band only around for three years in the 1960s. Well, if we have to cut because we have a lack of space, we will cut that before we would cut Thomas Jefferson.

Cole: Of course.

Wales: Something that is fairly clear. On the other hand, “wiki” is not paper. There’s never a reason to say there’s not enough space or something is just taking up space; it does not take up any space that matters.

Cole: Wikipedia is theoretically expandable forever, right?

Wales: Well, there are some interesting limitations that we find. There are limitations that are in a certain theoretical sense the same kinds of limitations that you would say are in a Britannica model or in a traditional model, which is limitation in terms of the cost of getting accurate information. And here I am not just talking of the monetary costs, but the feasibility.

As an example, sometimes people raise the question of why shouldn't we have an article on every single person in the world. Well, the answer is we have no feasible way to get verifiable and reliable information and not be hoaxed. Clearly, it would be lovely to have an encyclopedic article on anything that you could think of, but even in our model, which has a much greater reach, it still can be quite difficult.

Cole: Right.

Wales: One of the classical debates within Wikipedia is about elementary schools. Should we have articles about elementary schools? The general decision is that in most cases, the only information that you can find about them is from the schools’ own Web sites, which may or may not be reliable. Of course, you make exceptions. There are some schools that become famous for some reason, but in general we would say “no” even though there is some information there. It’s just not feasible for us to confirm it.

Cole: That’s fairly logical. You have many eyes on Wikipedia. Let’s talk about how all this gets done. Somebody gets an article—let’s say about the telephone—and they post it. Is that correct?

Wales: Exactly.

Cole: Then what happens?

Wales: Every change to the Web site is posted to a Recent Changes File monitored by a lot of people. Also, users have their own individual watch list so that they can monitor articles that they’re interested in.

Cole: You can send an e-mail or something like that, because of the change in their watch list.

Wales: When you log in, you just check the Recent Changes in your watch list. Most of those who are really active have their options set so that anything that they touch gets added to their watch list.

There are all kinds of subgroups of people working within Wikipedia. Some people just do spell-checking. We don’t allow people to run robot spell checkers on the site, unless they’re human supervised, because robot spell checkers are really stupid and long. What some people do is run a script, and they just sit there while it shows them words that appear to be misspelled, and they say “yes” or “no.” It sounds like a completely tedious job but people enjoy doing it.

Then you have people who are interested in particular subjects: mathematicians who work pretty much exclusively on articles about statistics; people who are drawn to conflict resolution and try to help people run a debate. There are all kinds of specialties and all kinds of different people who are doing those things.

Cole: So I put my piece on the telephone in there, and then that’s noted in Recent Changes. People start looking at this article, right?

Wales: If it’s a brand new article—of course, history of the telephone, they already have it—it goes to a New Pages List that people look at.

Typically what happens when somebody puts in a new article, if that person hasn’t really conformed to the style guidelines, is that people come in and place the article into the right category. History of the telephone might go into telecommunications. Readers will mark the text if any major concepts mentioned in that article link to something somewhere else. Then people review it for basic sanity and people start looking at it.

Cole: If there are obvious factual errors, people would be able to find those. People read these articles within hours, right?

Wales: Exactly. Typically, anything like this would be noticed and worked on within hours. Some of the things that happen are very much about human judgment—if someone thinks there’s something really seriously bad with the article, somebody would typically just delete huge chunks of it. Every article has a discussion page associated with it, so people move text onto the discussion page and say, yes, this sounds like a pretty bogus claim. I could not find any confirmation of the source. That’s when it will raise notice for other people to come and take a look.

Cole: This is a dynamic ongoing process and, of course, what people are doing is updating continually. I interviewed Steven Johnson for Humanities a couple of months ago, and he was talking about Wikipedia. He was looking at this article on stingrays hours after Steve Irwin had been killed by one, and there he found a reference to Steve Irwin. I’ve done that too, and I find things that are just from yesterday. That’s another advantage, right? Not only could they be continually corrected, but they can be continually updated.

Wales: One of the things that’s interesting about something like that, is that probably in the long run the stingray article doesn’t really need to mention Steve Irwin. That was in the news at that time and was probably of interest that day. I was just looking at the article right now, and what it says is, the fatal sting which killed Australian naturalist Steve Irwin in 2006 was extremely rare.

Cole: There’s more about the stingray than Steve Irwin.

Wales: Exactly. But it is contemporary and it actually anticipates the thing in a healthy way. Someone might be very afraid of stingrays now and it would be helpful for them to know that that is actually quite rare. I am guessing in ten or fifteen years, that’s going to be a footnote in history. People will say that’s really just some accidental thing that happened in the past.

Cole: Let us talk a little bit about accuracy because this is always an issue when people talk about Wikipedia.

Wales: Absolutely. My general take on accuracy is that the accuracy of Wikipedia is surprisingly good considering how we create the work. It is always a work in progress. We are looking at introducing some features, such as stable versions of some articles. But until that time, we’ll always have live editing. That’s important for people to realize when they’re using Wikipedia—that it isn’t something that at that moment in time is completely vetted to the degree that Britannica has been.

In many cases, the Wikipedia entry has been vetted to a far, far greater degree overall than Britannica, but any given revision may or may not be. And so it requires a bit of media competency for people who are trying to figure out what to rely on and what not to rely on with Wikipedia.

It’s simplistic to say “Wikipedia is fantastic, it’s just as good as Britannica.” It is and it isn’t. It’s also simplistic to say “It’s written by these crazy people who may or may not know anything and you can’t trust it at all.” That just does not really mesh with experience when you look at the vast majority of articles. It’s technically pretty good.

Cole: You are somewhere in the middle.

Wales: We’re somewhere in the middle. I think with some experience and with some common sense, you can begin to have an idea of the sorts of things that you should or shouldn’t count on it for.

Say you want to know who won the World Series in 1949. I can tell you with a pretty high degree of certainty, we will have that right. If you go into that article about the World Series in 1949 and you read about some amazing catch in the ninth inning which ended the game, it sounds kind of possible, but somebody could have added that as a prank, so you should check the history. You should maybe check on the source before you really rely on this exciting article some kid did. It just depends on what it is that you are trying to do.

Cole: One of the ways that you could check that is within Wikipedia itself, isn't it? They'll probably be linked.

Wales: Exactly, and we really have a push on to have more and more external links to try to point people in the right direction. It isn't always linked because there's a huge amount of stuff that's still not online.

Cole: The bibliography, the books, the sources, things that are live—

Wales: We're realistic about what is the real likelihood that somebody is going to look up the source in the library. Well, a lot lower if we can find something online. You want to give them both, right?

Cole: That's a very interesting statement. Twenty years ago, this wouldn't even be on the radar screen. These people would be going to the libraries.

Wales: One of the ways that Wikipedia has made me personally smarter is that whenever I wonder about something, I just look it up.

Cole: You go to Wikipedia?

Wales: Let’s not just think Wikipedia, but the whole interest in the Internet. If you were thinking twenty-five years ago about stingrays and wondered if they stung people to death, you’d say it’s logical to look it up in the library, but you are busy and you’ve got no time to go to the library to look up some little piece of information.

Cole: It’s good if you have a library to go to.

Wales: Now, it’s at your fingertips.

Cole: Say I want to look up a person, let’s say George Washington, and I go to Google, and I type in “George Washington,” it’s very likely that Wikipedia will be, if not the first on the list, then maybe second or third, right?

Wales: It may be the second.

Cole: Tell me what that means.

Wales: Well, we do rank very highly in Google because of the huge swath of cybersearches. Basically, nobody really knows exactly how Google determines how to rank them.

Cole: It’s not a kind of popularity contest, is it?

Wales: We know in part that it has to do with the number of links from other sources and the quality of those links. With Wikipedia, so many people use it, lots and lots of people link to it from a lot of different places.

Cole: Do you have any idea what kind of numbers, what kind of hits those represent?

Wales: One of the funny things about us is that because we don’t have advertising on the Web site, the only person who really needs to know the traffic numbers is me, so that I can tell people when I’m doing interviews. We actually do very poor reporting on things like how many hits.

Cole: Any ballpark figure?

Wales: The best estimate I know is around seven billion pages per month. That’s probably a low estimate because I’ve been giving that out for the past few months and traffic has gone up.

Cole: Is that in all of the Wikipedia or just English?

Wales: That’s across the entire site, so that’s the total number of pages in all languages. English, although it’s only about one-third of the work, I believe represents a higher proportion of the traffic, just because lots of people speak English as a second language.

Cole: I am very interested in the way knowledge is acquired on the Web. There are discrete packages of knowledge, but when you connect them something else happens. It’s really the linkage that counts. When two packages of information are seen in connection, something comes up.

Wales: Absolutely. One of the experiences that a lot of people have reported is that you go to Wikipedia looking for one piece of information, but you end up following a series of hyperlinks to other things, until you end up with something that’s only very tangentially related to what you started with.

Then there’s another type of experience. You start reading something and you find that you needed to click on the hyperlink because in order to fully understand the context of what you’re reading now, you need some other background information.

You think in context. Suppose you’re looking at the Battle of the Bulge—I just decided to call up this article— and it says the battle was officially called the Battle of the Ardennes. Well, “Ardennes,” what’s that? So I click on that and I find out it’s a region of extensive forest, in Belgium, lots of forest. That gives me some more richness to what I am reading at the moment.

Cole: If you search for “natural gas,” you click on that and you learn its components, and its elements, and where it’s found.

Wales: Right.

Cole: And on and on and on. So I think about it as not just a kind of linear progression to knowledge but a kind of vertical one and horizontal one.

Wales: Actually I like the phrase, the spiral. As you learn more background and more parts of it, you can spiral upward toward an overview bubble.

Cole: I want to congratulate you on being one of Time's 100 people who shape the world.

Wales: Isn’t that kind of fun?

Cole: I think that’s great. Time says: “Wales is celebrated as a champion of Internet-enabled egalitarianism.” You’re described not as an anti-elitist but an anti-credentialist. That’s the distinction. It means that amateurs can have as much to contribute as professionals and that talent can be found anywhere. It was predicted that mob rule would lead to chaos.

Instead, it has led to what may prove to be the most powerful industrial model of the twenty-first century, peer production.

What does that mean in that larger sense, when Time talks about a powerful industrial model of the twenty-first century?

Wales: Well, it’s rather bold. I’m embarrassed by it, but I definitely think it’s not just starting with Wikipedia but starting with the peer production of open-source software, free software. It is a huge, important model that has become possible because of the dramatic drop in the cost of communications, with the invention of computers and so forth. We are just beginning to see what that’s going to be all about and how that is going to work out for us.

So I guess it could be a very powerful model of production. Certainly, one of the things that’s been interesting for me is, because we are an inherently global project, this incredible interactivity—people from different parts of the world doing business, culture, commerce, having fun on the Internet, talking on the phone, whatever—is one of the defining terms of our age. Obviously, that’s not a unique observation on my part, but it has become very real to me to go and spend time in different places in the world and realize how much people are the same everywhere. I just think we’re going to see a lot more collaboration all around the world.

Cole: What prepared you to get to Wikipedia? You didn’t start off that way. You started off as a—

Wales: As a futures and options trader. I had a background in finance. I was really fascinated by game theory and the questions of social interaction and incentives. I feel that a lot of my work within Wikipedia has been made possible by thinking about incentives. People ask, “How do you elicit good behavior, how do you limit bad behavior?” Absolute security is not the answer.

It’s all about incentives and cost benefits. So you make it a little more costly for people to do bad and a little easier for people to do good, and you really get results without having controls on everything. That’s one of the pieces of my background.

Then the other piece that I can identify would be just my own education. As a child I was a big encyclopedia reader, and then I went to a small private school. Time magazine once reported that I was home schooled but that’s not correct.

Cole: Your mother ran the school.

Wales: My mother ran the school. It was a very small private school. We had about four kids in each grade, and we had an enormous amount of free time to do as we pleased. What that meant for me was lots of reading. I was a voracious reader of all kinds of different things and that encyclopedic kind of approach to learning has always had a strong appeal to me.

Cole: Isn’t one of the reasons there’s so much interest in Wikipedia that there’s the community interest without being forced to do it, without being paid to do it? People get involved because they’re wildly enthusiastic about it, deeply interested.

Wales: There are several different dimensions to that. Some people are just very enthusiastic about our charitable mission of a free encyclopedia for everyone. Then, one of the interesting things about the modern age is that in our careers we are typically forced to specialize. Well, it turns out that we have a lot of well-rounded intellectuals in the world still—people who are working by day as geographers and by night are just absolutely fanatical about the history of ancient Egypt. Professionally, people don’t necessarily have any kind of good outlet for that.

Cole: Or think about historians writing on the Battle of Gettysburg. If you talk about people who have deep, deep subject knowledge and who are absolutely dedicated to it, it’s the Civil War buffs. They know more about some particular segments of that battle than anyone.

Wales: Definitely.

Cole: There are deep pockets of knowledge all over the place, certainly in the academy but also outside of it. What I like about Wikipedia so much is that it has tapped into this huge reservoir of knowledge.

Thank you for talking with me today.