From SMILES to InChI with OBRuby

November 03, 2006

SMILES and InChI are two commonly used molecular line notations. Although each has its advantages and limitations, the novelty of InChI and the ubiquity of SMILES make SMILES-to-InChI conversion especially useful. Many of the situations in which this conversion is needed are well suited to the Ruby programming language. A recent article described how RCDK and Rino could be used to accomplish this conversion. This article shows how Open Babel can be used from Ruby to perform the same conversion.

OBRuby

OBRuby is a SWIG-generated Ruby interface to the Open Babel library. Although OBRuby doesn't expose all aspects of the Open Babel API, nearly everything that can be done in C++ Open Babel can now be done in Ruby. For example, all OBConversion permutations should be available, including SMILES to InChI.

A Small Ruby Library

Let's create a small Ruby library for converting SMILES strings into InChI identifiers. Save the following into a file called convert.rb:

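require 'openbabel'

class Convertor
  def initialize
    @conv = OpenBabel::OBConversion.new

    # Read SMILES ('smi') and write InChI ('inchi').
    @conv.set_in_and_out_formats('smi', 'inchi')
  end

  # Returns the InChI identifier for the given SMILES string.
  def get_inchi(smiles)
    mol = OpenBabel::OBMol.new

    # Parse the SMILES into mol, then serialize mol in the output format.
    @conv.read_string(mol, smiles)
    @conv.write_string(mol)
  end
end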

There's nothing tricky here. We've simply created a Ruby class that reduces the SMILES-to-InChI conversion to a single method call on an instance.

Testing the Library

A good way to test this library is through Interactive Ruby (irb). For example, to find the InChI of caffeine:

require 'convert'

c = Convertor.new

puts c.get_inchi('Cn1cnc2c1c(=O)n(C)c(=O)n2C') # caffeine
# => InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
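
Beyond a single call in irb, the same check could be run over a list of inputs. The following is only a minimal sketch: batch_inchi and StubConvertor are hypothetical names that are not part of OBRuby, and the stub exists only so the example runs without Open Babel installed. The sketch assumes that the converter's output may carry trailing whitespace and that a failed conversion yields something that does not begin with 'InChI='.

```ruby
# Hypothetical helper (not part of OBRuby): convert a list of SMILES strings
# with any object that responds to get_inchi, such as a Convertor instance.
def batch_inchi(converter, smiles_list)
  smiles_list.each_with_object({}) do |smiles, results|
    inchi = converter.get_inchi(smiles).to_s.strip
    # Record nil when the result does not look like an InChI identifier.
    results[smiles] = inchi.start_with?('InChI=') ? inchi : nil
  end
end

# Stand-in for Convertor so the sketch runs without Open Babel installed.
# It only "knows" the caffeine example from this article.
class StubConvertor
  KNOWN = {
    'Cn1cnc2c1c(=O)n(C)c(=O)n2C' =>
      "InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3\n"
  }.freeze

  def get_inchi(smiles)
    KNOWN.fetch(smiles, '')
  end
end

results = batch_inchi(StubConvertor.new,
                      ['Cn1cnc2c1c(=O)n(C)c(=O)n2C', 'not-a-smiles'])
results.each do |smiles, inchi|
  puts "#{smiles}\t#{inchi || 'conversion failed'}"
end
```

With Open Babel available, passing Convertor.new in place of StubConvertor.new should exercise the real conversion through the same helper.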

Chiral SMILES

I applied this simple Ruby conversion library to the (S)-methamphetamine record in PubChem:

  • Isomeric SMILES: C[C@@H](CC1=CC=CC=C1)NC
  • PubChem InChI: InChI=1/C10H15N/c1-9(11-2)8-10-6-4-3-5-7-10/h3-7,9,11H,8H2,1-2H3/t9-/m0/s1

My results were:

  • Isomeric SMILES: C[C@@H](CC1=CC=CC=C1)NC
  • OBRuby InChI: InChI=1/C10H15N/c1-9(11-2)8-10-6-4-3-5-7-10/h3-7,9,11H,8H2,1-2H3/t9-/m1/s1

As you can see, the two InChIs differ in their stereo layers ('m0' vs. 'm1'). Open Babel generates the same InChI whether it is run through OBRuby or through the Worldwide Molecular Matrix. Substituting the SMILES string for the opposite configuration at the stereocenter generates the InChI with the opposite configuration (R), which again is opposite to that of (R)-methamphetamine in PubChem.
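
Spotting a one-character difference between two long InChI strings by eye is error-prone, so a layer-by-layer comparison can help. The sketch below uses plain Ruby and no Open Babel calls. The names inchi_layers and inchi_layer_diff are hypothetical, and the parsing assumes a simple InChI like the ones above: a version, a formula, and then layers that each begin with a distinct one-letter prefix. InChIs that repeat a prefix or omit the formula would need more careful handling.

```ruby
# Split an InChI string into its layers, keyed by each layer's prefix letter.
# The version and formula layers carry no prefix letter, so they are stored
# under the illustrative keys :version and :formula.
def inchi_layers(inchi)
  body = inchi.strip.sub(/\AInChI=/, '')
  version, formula, *rest = body.split('/')
  layers = { version: version, formula: formula }
  rest.each { |layer| layers[layer[0]] = layer[1..-1] }
  layers
end

# Return only the layers whose values differ, as { key => [left, right] }.
def inchi_layer_diff(left, right)
  a = inchi_layers(left)
  b = inchi_layers(right)
  (a.keys | b.keys).each_with_object({}) do |key, diff|
    diff[key] = [a[key], b[key]] unless a[key] == b[key]
  end
end

pubchem = 'InChI=1/C10H15N/c1-9(11-2)8-10-6-4-3-5-7-10/h3-7,9,11H,8H2,1-2H3/t9-/m0/s1'
obruby  = 'InChI=1/C10H15N/c1-9(11-2)8-10-6-4-3-5-7-10/h3-7,9,11H,8H2,1-2H3/t9-/m1/s1'

p inchi_layer_diff(pubchem, obruby)
```

For the two methamphetamine strings, the only entry reported is the 'm' layer, with the values '0' and '1'.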

At this point, it is unclear whether Open Babel or PubChem is producing the correct InChI for the methamphetamine enantiomers. I suspect Open Babel is correct. By creating a molfile of (S)-methamphetamine with JME and running cInChI over it, I got the same output as with the Open Babel conversions. I've found similar differences between PubChem and Open Babel InChIs in every chiral molecule I've looked at.

Conclusions

Converting SMILES and other molecular line notations into InChI identifiers is likely to become a recurring need as InChI grows in popularity. Combining the formidable translation capabilities of Open Babel with the comfort and convenience of Ruby offers a powerful new way to do so.