Stone Soup
Once upon a time, somewhere in post-war Eastern Europe, there was a great famine in which people jealously hoarded whatever food they could find, hiding it even from their friends and neighbors. One day a wandering soldier came into a village and began asking questions as if he planned to stay for the night.
"There's not a bite to eat in the whole province," he was told. "Better keep moving on."
"Oh, I have everything I need," he said. "In fact, I was thinking of making some stone soup to share with all of you." He pulled an iron cauldron from his wagon, filled it with water, and built a fire under it. Then, with great ceremony, he drew an ordinary-looking stone from a velvet bag and dropped it into the water.
By now, hearing the rumor of food, most of the villagers had come to the square or watched from their windows. As the soldier sniffed the "broth" and licked his lips in anticipation, hunger began to overcome their skepticism.
"Ahh," the soldier said to himself rather loudly, "I do like a tasty stone soup. Of course, stone soup with cabbage -- that's hard to beat."
Soon a villager approached hesitantly, holding a cabbage he'd retrieved from its hiding place, and added it to the pot. "Capital!" cried the soldier. "You know, I once had stone soup with cabbage and a bit of salt beef as well, and it was fit for a king."
The village butcher managed to find some salt beef . . . and so it went, through potatoes, onions, carrots, mushrooms, and so on, until there was indeed a delicious meal for all. The villagers offered the soldier a great deal of money for the magic stone, but he refused to sell and traveled on the next day. ...
-Forrest M. Hoffman, William W. Hargrove, and Andrew J. Schultz, The Stone SouperComputer Site
Recently eMolecules donated a substantial amount of new code to the Open Babel project. This code, which could help developers of molecular databases write faster query engines, is now free for anyone to use for any purpose. Are eMolecules worried about having parted with something it took them time and money to develop and which could eventually help a competitor? I doubt it. Maybe they're no longer using this code and just decided to give something back to the community. But maybe they're acting out of enlightened self-interest. Scientists naturally grok the Stone Soup story. And so do people who have used Open Source software.
Jealously hoarding intellectual property in a competitive environment seems, on the surface, as rational as hoarding cabbage during a famine. Maybe the hoarding instinct comes from our understanding of the physical world. After all, there's only so much stuff to go around. If I give my cabbage to my neighbor, then I go hungry. It's a zero-sum game. That is, unless there's a smart soldier with a rock in a velvet bag visiting for the night.
Intellectual property disrupts our commonsense notions about the zero-sum game. When I give an idea to a neighbor, we both can use it. But he can also use the idea in ways I would never have imagined, or make improvements to it that I lack either the time or the skill to make. Rather than costing me, my donation pays me dividends. For this to work, though, my neighbor has to share the same spirit of openness that I do.
The villagers in the story didn't put every last bit of food they had into the pot, and eMolecules didn't donate their entire software infrastructure to Open Babel. There is clearly a place for maintaining a competitive advantage when running a business. What eMolecules did contribute was a small but important ingredient to the soup they smelled cooking.
You know, I once used a cheminformatics toolkit with a 2-D layout engine, and it was fit for a king...
Disruptive Innovation in Scientific Publishing: Free Journal Management Systems
Like everything else in information technology, the costs of setting up and maintaining a scientific journal are rapidly approaching zero. A growing assortment of Open Source journal management systems is available today. Recently, I was introduced to one of these packages by Egon Willighagen as part of my involvement with CDK News.
Open Journal Systems
Open Journal Systems (OJS) automates the process of manuscript submission, peer review, editorial review, article release, and article indexing. All of these elements are, of course, cited as major costs by established publishers intent on maintaining their current business models.
OJS appears to work in much the same way as automated systems being run by major publishers. In fact, OJS is already in use by more than 800 journals written in ten languages worldwide.
Did I mention that OJS is free software - as in speech? The developers of OJS have licensed their work under the GPL, giving publishers the ability to control every aspect of how their journal management system operates. Standing out from the crowd will no doubt be an essential component of staying competitive in a world in which almost anyone can start their own journal.
Alternatives
And there's even better news: OJS has competition. Publishers can select from no fewer than seven open source journal management systems: DPubs; OpenACS; GAP; HyperJournal; SciX; Living Reviews ePubTk; and TOPAZ.
The Last Word
Open Source tools like Open Journal Systems have the potential to radically change the rules of the scientific publication game. By slashing the costs of both success and failure in scientific publication to almost zero, these systems are set to unleash an unprecedented wave of disruptive innovation - and not a moment too soon. What are the true costs of producing a quality Open Access scientific publication - and who pays? Will the idea of starting your own Open Access journal to address deficiencies with existing offerings catch on, especially in chemistry, chemical informatics, and computational chemistry? Before long, we will have answers to these questions.
Making the Case
The SMIREP system is available from http://www.karwath.org/systems/smirep/ under the GNU General Public License. The Web page also contains the data files used in the Experimental Section. The system is provided in Python and C source code, including the required Python OpenBabel module OBGrep.
-Andreas Karwath and Luc De Raedt, J. Chem. Inf. Model ASAP Articles
Karwath and De Raedt are onto something more than just an innovative use of SMILES strings. When the majority of chemical informatics papers provide instructions for downloading both complete source code and complete data sets, the game will have changed forever. Advocating this position in essays, presentations, emails, and letters is one way to make the case, and a very old one at that. For your next paper, why not make the case with a statement like the one above?
Hacking PubChem: Free Speech or Free Beer?
Government information available from this site is within the public domain. Public domain information on the National Library of Medicine (NLM) Web pages may be freely distributed and copied. However, it is requested that in any subsequent use of this work, NLM be given appropriate acknowledgment.
This site also contains resources such as PubMed Central, Bookshelf, OMIM, and PubChem which incorporate material contributed or licensed by individuals, companies, or organizations that may be protected by U.S. and foreign copyright laws. All persons reproducing, redistributing, or making commercial use of this information are expected to adhere to the terms and conditions asserted by the copyright holder. Transmission or reproduction of protected items beyond that allowed by fair use (PDF) as defined in the copyright laws requires the written permission of the copyright owners.
Open Source licensing is nothing short of revolutionary. Of all of the things an Open Source license makes possible, perhaps the most far-reaching is the right of licensees to create and distribute derivative works. This is what separates "software that's free" ("free as in beer") from "Free Software" ("free as in speech"). A licensee that is not free to create and distribute derivative works has virtually no incentive to build on what the original creator has given away. Would you contribute your valuable time to improving something that you knew you could never use as you saw fit? This may sound like semantic hair-splitting, but it's far from it. None of the phenomenal progress made in Open Source software would have been possible without the basic rights to create and distribute derivative works.
PubChem's Copyright Disclaimer should give anyone familiar with Open Source licensing reason to pause. Apparently, NIH is telling its users that it doesn't have the authority to grant them the right to copy all PubChem content or distribute derivative works. But what parts of PubChem can these rights be granted for, if any? What parts of PubChem are copyrighted, and therefore owned, by contributors? How can a user find out which parts of PubChem are subject to copyright claims by contributors?
It isn't too difficult to imagine a scenario in which PubChem requires those depositing data to agree to a copyright waiver. This waiver would simply grant PubChem users the sublicensable right to copy a depositor's content verbatim and to distribute derivative works based on it, royalty-free. The depositor would still retain any copyright they might want to assert outside of PubChem. If the depositor doesn't own these rights, or isn't willing to part with them, then that content would be rejected. This has been done for years in Open Source software projects and is being done increasingly with Creative Commons licenses for non-software intellectual property. Both approaches have strengths and weaknesses, and my aim is not to advocate either one. The point is simply that the idea is not new.
Maybe a copyright waiver isn't feasible. Regardless, PubChem could create a mechanism whereby content for which a contributor is asserting copyright claims can be identified as such and optionally avoided by its users.
While I'd never turn down free beer, and I'd always thank those offering, in the long run free speech is far more sustaining.
Taking a SWIG of InChI
The IUPAC InChI developer toolkit is written in C. It is currently the only Open Source software capable of generating InChI identifiers. Software that needs to write InChIs must use the C toolkit in one form or another. This poses a problem for the large amount of chemical informatics software being written in other languages. In this article, I'll explain how the Open Source tool SWIG can solve this problem in a semi-automated way. The same concepts can, in principle, be used to link any library written in C/C++ with another language.
Prerequisites
This tutorial uses Ruby as the language that InChI will be linked with. You'll therefore need both Ruby and the Ruby development libraries installed. You'll also need SWIG and possibly the SWIG development libraries.
Use the Source, Luke
After downloading and unpacking InChI-1-API v1.0.1, collect all header (*.h) and source (*.c) files into a directory called inchi. These files can be found in the following two directories:
- InChI-1-API/cInChI/common
- InChI-1-API/cInChI/main
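The collection step can also be scripted. The following is a minimal sketch, assuming the two directories listed above exist relative to the current working directory; the helper name collect_inchi_sources is hypothetical and not part of the InChI distribution.

```ruby
require 'fileutils'

# Hypothetical helper: copy every *.h and *.c file found in source_dirs
# into dest, creating dest if needed.
# Returns the sorted base names of the files that were copied.
def collect_inchi_sources(source_dirs, dest = 'inchi')
  FileUtils.mkdir_p(dest)
  files = source_dirs.flat_map { |dir| Dir.glob(File.join(dir, '*.{h,c}')) }
  FileUtils.cp(files, dest) unless files.empty?
  files.map { |path| File.basename(path) }.sort
end
```

Calling collect_inchi_sources(['InChI-1-API/cInChI/common', 'InChI-1-API/cInChI/main']) from the directory containing the unpacked archive should leave all headers and sources side by side in inchi.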
Find the Main Method
This tutorial will create an interface into the InChI main() function. This function is found on line 149 of the file ichimain.c. For reasons I won't get into here, rename this function to run and change the second argument type to char **. Also, add a prototype for the run function directly above line 149:
int run( int argc, char **argv ); // new line added
int run( int argc, char **argv ) // formerly line 149
Create the Interface File
The focal point of SWIG is the interface file. This file specifies the C functions you want to link into and some items to help in doing so. Create a file called libinchi.i containing the following:
[libinchi.i listing: 35 lines, not reproduced]
The interface file has three main parts. The first part (line 2) names the module. The second part (lines 7-30) makes the necessary Ruby/C datatype conversions. The last part (line 35) tells SWIG the InChI functions we want to be able to access from Ruby.
Take a SWIG
At this point, SWIG has everything it needs to autogenerate our glue code. This can be done by:
$ swig -ruby libinchi.i
This command should have created a new source file, libinchi_wrap.c, that contains all of the C glue code for our library. We'll have a look at the most important part of this file shortly.
Create a Makefile
We'll need a makefile with which to compile our library. Fortunately, Ruby makes this very easy. Create a file called extconf.rb containing the following Ruby code:
require 'mkmf'
create_makefile('libinchi')
Then generate the makefile:
$ ruby extconf.rb
Build the Library
Our library can now be built with:
$ make
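The three commands used so far (swig, ruby extconf.rb, make) can be chained in a small Ruby script so that a failure at any step stops the build. This is only a sketch: BUILD_STEPS and build_extension are hypothetical names, and the runner argument exists purely so the sequence can be exercised without the tools installed.

```ruby
# The build commands from this tutorial, in the order they are run.
BUILD_STEPS = [
  %w[swig -ruby libinchi.i],
  %w[ruby extconf.rb],
  %w[make]
].freeze

# Hypothetical helper: run each step in order and raise on the first failure.
# By default each command is executed with Kernel#system.
def build_extension(runner = ->(cmd) { system(*cmd) })
  BUILD_STEPS.each do |cmd|
    ok = runner.call(cmd)
    raise "step failed: #{cmd.join(' ')}" unless ok
  end
  true
end
```

Running build_extension from inside the inchi directory should be equivalent to typing the three commands by hand.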
Use InChI from Ruby
We are now done with the basics. You can verify that the process worked through Interactive Ruby (irb):
$ irb
irb(main):001:0> require 'libinchi'
=> true
The return value of true shows that Ruby loaded and recognized the binary library we just built (libinchi.so). We are now able to use this library as if it were written in Ruby.
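One caveat: require returns false, not true, when the library has already been loaded in the current session, so a second require 'libinchi' in the same irb session does not signal a problem. A script that only needs to know whether the extension can be loaded might use a check like the following sketch; the method name extension_available? is hypothetical.

```ruby
# Hypothetical helper: report whether a library can be loaded,
# regardless of whether it was already required earlier.
def extension_available?(name)
  require name
  true
rescue LoadError
  # Raised when Ruby cannot find or load the named library.
  false
end
```

For example, extension_available?('libinchi') could guard code that should fall back gracefully when the compiled library is missing from the load path.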
Use the Library
To test the library, copy a molfile called test.mol into your inchi directory. Now run this code:
require 'libinchi'
Libinchi.run(['', 'test.mol'])
You should get a lot of output from the InChI library. If you take a look at the inchi directory contents, you'll see that a new file, test.mol.txt, has been created. It contains the InChI identifier of the molecule contained in your molfile. This software also created a log file (test.mol.log) and a problem file (test.mol.prb).
You may be wondering why the first element in the Array passed to Libinchi.run is empty. The reason is that by convention a C main method expects its first argument to be the name of the program itself. The InChI main method takes this into account, and so the Array simply leaves its first element blank.
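The blank first element is easy to forget, so it can help to hide it behind a small helper. The sketch below uses two hypothetical method names, inchi_argv and inchi_output_files; the second simply encodes the output file naming observed above and is not something the InChI toolkit provides.

```ruby
# Hypothetical helper: build the Array passed to Libinchi.run.
# The leading empty string fills the slot where a C main function
# expects the program name.
def inchi_argv(*args)
  [''] + args
end

# Hypothetical helper: names of the result, log, and problem files
# that appear next to the molfile after a run.
def inchi_output_files(molfile)
  %w[txt log prb].map { |ext| "#{molfile}.#{ext}" }
end
```

With these in place, the call becomes Libinchi.run(inchi_argv('test.mol')), and inchi_output_files('test.mol') lists the files to look for afterwards.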
Customize the Library
Have a look at the libinchi_wrap.c file that SWIG created. At the bottom of this file should be a function called Init_libinchi:
SWIGEXPORT(void) Init_libinchi(void) {
  int i;
  SWIG_InitRuntime();
  mLibinchi = rb_define_module("Libinchi");
  for (i = 0; swig_types_initial[i]; i++) {
    swig_types[i] = SWIG_TypeRegister(swig_types_initial[i]);
    SWIG_define_class(swig_types[i]);
  }
  rb_define_module_function(mLibinchi, "run", _wrap_run, -1);
}
This is what Ruby uses to map C functions to Ruby modules, classes, and methods. In this case, the C run method is being mapped to a module called Libinchi which has a run method.
Let's say that you'd prefer a module name of InChI with a method called write_inchi. The following changes to Init_libinchi will accomplish this:
SWIGEXPORT(void) Init_libinchi(void) {
  int i;
  SWIG_InitRuntime();
  mLibinchi = rb_define_module("InChI");
  for (i = 0; swig_types_initial[i]; i++) {
    swig_types[i] = SWIG_TypeRegister(swig_types_initial[i]);
    SWIG_define_class(swig_types[i]);
  }
  rb_define_module_function(mLibinchi, "write_inchi", _wrap_run, -1);
}
Run make again. Now the following can be used to write the InChI information for test.mol:
require 'libinchi'
InChI.write_inchi(['', 'test.mol'])
Summing Up
SWIG simplifies the job of connecting high-level languages like Ruby to C/C++ libraries. Although not illustrated in the simple example above, SWIG offers several advanced tools for creating rich library interfaces. Given the large amount of chemical informatics software written in C/C++, and the increasing interest by developers in scripting languages such as Ruby, the SWIG approach is likely to be broadly useful in several areas of chemical informatics integration.
The C InChI toolkit appears in a few other Open Source projects including Open Babel, the Chemistry Development Kit via the JNI InChI Wrapper, and Rino. To my knowledge, none use SWIG. This will soon change as the approach described here becomes incorporated into Rino.
On a more general note, the availability of the InChI source code under an Open Source license is essential to developing and distributing the kind of integration library discussed here. We can only hope that others working in chemical informatics see the wisdom in a system that creates healthy software ecosystems wherever it takes hold.


