Building Chempedia: Learning About Contributors
Chempedia is a free online chemical encyclopedia similar in concept to the Merck Index, but radically different in implementation. One key difference: the Merck Index is compiled by a small number of paid professionals while Chempedia is compiled by thousands of unpaid volunteers. Although this distinction raises a host of intriguing questions, one of the most basic revolves around what can be said about these volunteers in the aggregate. This article, the first in a series, explores this issue with some statistics compiled from Chempedia.
Learning About Contributors
Chempedia works in part by aggregating content from Wikipedia dealing with single molecular entities, or "Compound Monographs." This content is created by the now famous process of individuals taking upon themselves the responsibility of fixing what's broken in Wikipedia. (Some take it upon themselves to break what's working, but that's another topic.)
Chempedia associates each of its Compound Monographs with the last Wikipedia user to edit it. The current interface to these relationships is available on the Chempedia contributors page.
The interface to this page is currently limited. The analyses reported here were made for the most part by querying the Chempedia database directly.
Each contributor is linked to a contributor summary page containing links to that user's Wikipedia homepage and talk page, as well as a complete listing of all active contributions. For example, you can view the contributor page for one of Chempedia's most active contributors, Arcadian.
The data model is also limited. Chempedia records only the last Contributor to edit a Monograph, so when another Contributor edits that Monograph, the link between the previous Contributor and the Monograph is lost. As a result, many Contributors have no associated Monographs.
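The effect of this limitation can be illustrated with a minimal sketch. The function name, the edit log, and the contributor names below are hypothetical and are not taken from Chempedia's actual database; the sketch only assumes that edits arrive in chronological order and that each new edit overwrites the previous link.

```python
from collections import Counter


def last_editor_index(edits):
    """Map each monograph to the contributor who edited it most recently.

    `edits` is an iterable of (monograph, contributor) pairs in
    chronological order (a hypothetical input format).
    """
    last = {}
    for monograph, contributor in edits:
        # A later edit overwrites the earlier link, so the previous
        # contributor's association with this monograph is lost.
        last[monograph] = contributor
    return last


# Hypothetical edit log: alice edits Aspirin first, then bob edits it.
edits = [
    ("Aspirin", "alice"),
    ("Caffeine", "bob"),
    ("Aspirin", "bob"),
]

last = last_editor_index(edits)
monographs_per_contributor = Counter(last.values())

# Contributors who made edits but are no longer linked to any monograph.
everyone = {contributor for _, contributor in edits}
orphaned = sorted(everyone - set(monographs_per_contributor))

print(last)
print(orphaned)
```

In this toy log, alice did contribute, but she ends up with no associated Monographs, which is the situation described above.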
How Many Monographs?
Chempedia currently hosts 6,308 Compound Monographs.
How Many Contributors?
Chempedia currently lists 2,516 Contributors. Of these, 1,046 (42%) are associated with one or more Monographs, meaning that they were the last to edit at least one. The remaining 1,470 are not currently the last editor of any Monograph.
Here is a list of the top 20 Contributors and the number of Monographs they were the last to edit:
Contributor | Monographs last edited
anonymous | 1022
DOI bot | 904
Edgar181 | 378
Fvasconcellos | 170
Meodipt | 151
Arcadian | 144
Chem-awb | 133
Chowbok | 122
Rifleman 82 | 114
SmackBot | 105
Thijs!bot | 99
ChemNerd | 85
Puppy8800 | 80
DumZiBoT | 78
Axiosaurus | 63
Chempedia | 63
Carlo Banez | 55
Benjah-bmm27 | 52
OKBot | 51
Cacycle | 50
These 20 Contributors represent 1.9% of the 1,046 active Contributors, yet collectively they were the last to edit 62% of all Monographs. Although the analysis was not performed here, a histogram of the number of Monographs per Contributor would be expected to follow a power law.
'Anonymous' is an aggregation of all users who edited a Monograph without a Wikipedia account. 16% of all Monographs were last edited by an anonymous user. Leaving out the aggregated 'anonymous' users, the remaining top 19 Contributors were the last to edit roughly half of all Monographs (2,897 of 6,308, or about 46%).
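The percentages in this section can be reproduced from the counts in the top 20 list and the totals reported earlier. The following is a small sketch of that arithmetic; the variable names are illustrative, and the figures are copied from this article rather than queried from the Chempedia database.

```python
# Counts copied from the top 20 list in this article.
top20 = [
    ("anonymous", 1022), ("DOI bot", 904), ("Edgar181", 378),
    ("Fvasconcellos", 170), ("Meodipt", 151), ("Arcadian", 144),
    ("Chem-awb", 133), ("Chowbok", 122), ("Rifleman 82", 114),
    ("SmackBot", 105), ("Thijs!bot", 99), ("ChemNerd", 85),
    ("Puppy8800", 80), ("DumZiBoT", 78), ("Axiosaurus", 63),
    ("Chempedia", 63), ("Carlo Banez", 55), ("Benjah-bmm27", 52),
    ("OKBot", 51), ("Cacycle", 50),
]

TOTAL_MONOGRAPHS = 6308     # all Compound Monographs
ACTIVE_CONTRIBUTORS = 1046  # Contributors linked to at least one Monograph

counts = dict(top20)
top20_total = sum(counts.values())
named_total = top20_total - counts["anonymous"]  # top 19 without 'anonymous'

pct_of_contributors = 100 * len(top20) / ACTIVE_CONTRIBUTORS
pct_of_monographs = 100 * top20_total / TOTAL_MONOGRAPHS
pct_anonymous = 100 * counts["anonymous"] / TOTAL_MONOGRAPHS
pct_named = 100 * named_total / TOTAL_MONOGRAPHS

print(f"Top 20 as share of active Contributors: {pct_of_contributors:.1f}%")
print(f"Monographs last edited by the top 20:   {pct_of_monographs:.0f}%")
print(f"Monographs last edited anonymously:     {pct_anonymous:.0f}%")
print(f"Monographs last edited by the top 19:   {pct_named:.0f}%")
```

The top 20 account for 3,919 Monographs in total, which gives the 1.9%, 62%, and 16% figures quoted above, and about 46% once the aggregated 'anonymous' entry is set aside.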
What is a Contributor?
It's difficult to say a lot about individual Contributors, but most appear to have some training in science, even if that training did not involve chemistry or biology. Others (for example, SJP) appear to have been drawn to a Monograph by their nonscientific experience with the title compound, by an effort to fight vandalism, or by a wish to improve the Monograph's nonscientific content. The ability of services like Wikipedia (and by extension Chempedia) to provide a platform for those without formal training in a particular area to make useful contributions is without question one of their most useful (and controversial) features.
Some Contributors are not even human, but rather robots designed to improve the quality of Wikipedia articles in general. For example, SmackBot performs an array of tedious quality control jobs such as fixing ISBNs with bad checksums (CAS Numbers, anyone?) and correcting capitalization errors.
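To give a sense of what such a checksum test involves, here is a minimal sketch of check-digit validation for ISBN-10 and for CAS Registry Numbers. It is not SmackBot's code, and the function names are made up for illustration; it simply applies the published check-digit rules for the two identifier formats.

```python
def _is_digits(text):
    """True if `text` is a non-empty run of ASCII digits."""
    return text.isascii() and text.isdigit()


def isbn10_is_valid(isbn):
    """Check an ISBN-10: weights 10..1, weighted sum divisible by 11."""
    chars = isbn.replace("-", "").upper()
    if len(chars) != 10 or not _is_digits(chars[:9]):
        return False
    if not (_is_digits(chars[9]) or chars[9] == "X"):
        return False
    # 'X' is only legal as the final check character and stands for 10.
    values = [10 if c == "X" else int(c) for c in chars]
    total = sum(w * v for w, v in zip(range(10, 0, -1), values))
    return total % 11 == 0


def cas_is_valid(cas):
    """Check a CAS Registry Number written as NNNNNNN-NN-N.

    The digits before the check digit are weighted 1, 2, 3, ... from
    right to left; the weighted sum modulo 10 must equal the check digit.
    """
    parts = cas.split("-")
    if len(parts) != 3 or not all(_is_digits(p) for p in parts):
        return False
    if not (2 <= len(parts[0]) <= 7 and len(parts[1]) == 2 and len(parts[2]) == 1):
        return False
    body = parts[0] + parts[1]
    total = sum(w * int(d) for w, d in enumerate(reversed(body), start=1))
    return total % 10 == int(parts[2])


print(cas_is_valid("7732-18-5"))         # water
print(cas_is_valid("7732-18-4"))         # same number with a wrong check digit
print(isbn10_is_valid("0-306-40615-2"))
```

A check like this can flag a mistyped identifier, but it cannot say what the correct value should be, so a bot would still need another source to repair the entry.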
Conclusions
Wikipedia's collaboration model has made the creation of a free and continuously updated chemical encyclopedia feasible. Applying chemistry-specific user interfaces and data models exposes this hidden treasure. Although it's tempting to think of this process as mainly the work of a handful of trained scientists, the numbers suggest a much broader base of contributors: more than a thousand Contributors are currently the last editor of at least one Monograph. Future articles will explore this idea.
Related Article: Building Chempedia: Social Networking Applied to Chemistry.