Rethinking the Command Line for Chemistry

Posted by Rich Apodaca Tue, 27 Mar 2007 16:30:00 GMT



A recent article discussed the renaissance of the command line. Particularly on the Web, command line interfaces have become so advanced that most of us don't even realize we're using them. Consider the Google search box, which is one of the most powerful command line interfaces ever developed.

A service called YubNub takes this idea one step further. YubNub is a meta command line interface for the Web. The following YubNub command will do a Flickr search for benzene.

If this were all YubNub did, it would be merely interesting. What makes YubNub remarkable is that you can create your own commands that other people can use. I recently added the "ginchi" command to query Google for an InChI. Now you can try it out:

By itself, this isn't particularly useful because you can just go to Google and query the InChI directly. However, it's not hard to imagine several more commands like ginchi. Some would use Google, others would use other services. How about something that searches Mitch Garcia's chemistry journal Yahoo Pipe? It would be very convenient to have all of those commands accessible from the same Web page.
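To make the idea concrete, here is a minimal sketch of how a family of such commands might be mapped onto search URLs. It is not how YubNub itself is implemented; the COMMANDS table, the expand_command helper, the "flickr" command name, and both URL templates are assumptions chosen for illustration.

```ruby
require 'cgi'

# Hypothetical command table: each command name maps to a URL template.
# The names and templates are illustrative assumptions, not YubNub's own.
COMMANDS = {
  'ginchi' => 'https://www.google.com/search?q=%s',
  'flickr' => 'https://www.flickr.com/search/?text=%s'
}.freeze

# Split the typed line into a command name and its arguments, then
# substitute the URL-escaped arguments into the matching template.
# Returns nil for an unknown command or a command with no arguments.
def expand_command(line)
  name, args = line.strip.split(/\s+/, 2)
  template = COMMANDS[name]
  return nil if template.nil? || args.nil?

  format(template, CGI.escape(args))
end

puts expand_command('flickr benzene')
puts expand_command('ginchi InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H')
```

Adding a new chemistry command then amounts to adding one entry to the table, which is what makes a shared command line attractive.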

Command line interfaces can be phenomenally useful for both beginning and advanced users. The hardest part to get right is not what the user sees as they type, but what happens after they hit the enter key.

Line notations are the perfect match for command line interfaces. The widespread use of SMILES and the precision of InChI offer many possibilities for innovative chemistry Web services.

Customize InChI Output with Rino

Posted by Rich Apodaca Mon, 19 Mar 2007 14:30:00 GMT

Rino is a toolkit for working with the IUPAC International Chemical Identifier (InChI) in Ruby. Because it's based on the IUPAC/NIST InChI toolkit, Rino can be configured using a variety of useful options. This article summarizes those options and provides an illustrative example.

Complete List of InChI Command Line Options

The following is a complete summary of the IUPAC/NIST InChI toolkit command line options:

  • SNon Exclude stereo (Default: Include Absolute stereo)

  • SRel Relative stereo

  • SRac Racemic stereo

  • SUCF Use Chiral Flag: On means Absolute stereo, Off - Relative

  • SUU Include omitted unknown/undefined stereo

  • NEWPS Narrow end of wedge points to stereocenter (default: both)

  • SPXYZ Include Phosphines Stereochemistry

  • SAsXYZ Include Arsines Stereochemistry

  • RecMet Include reconnected metals results

  • FixedH Mobile H Perception Off (Default: On)

  • AuxNone Omit auxiliary information (default: Include)

  • NoADP Disable Aggressive Deprotonation (for testing only)

  • Compress Compressed output

  • DoNotAddH Don't add H according to usual valences: all H are explicit

  • Wnumber Set time-out per structure in seconds; W0 means unlimited

  • SDF:DataHeader Read from the input SDfile the ID under this DataHeader

  • NoLabels Omit structure number, DataHeader and ID from InChI output

  • Tabbed Separate structure number, InChI, and AuxInfo with tabs

  • OutputSDF Convert InChI created with default aux. info to SDfile

  • InChI2InChI Convert InChI string into InChI string for validation purposes

  • SdfAtomsDT Output Hydrogen Isotopes to SDfile as Atoms D and T

  • STDIO Use standard input/output streams

  • FB (or FixSp3Bug) Fix bug leading to missing or undefined sp3 parity

  • WarnOnEmptyStructure Warn and produce empty InChI for empty structure
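Because these options are passed as plain strings, a misspelled name is easy to miss. As a minimal sketch, the helper below checks names against the list above and adds the leading dash used in the test later in this article. The inchi_flags method and the KNOWN_OPTIONS constant are hypothetical and are not part of Rino or the InChI toolkit; SDF:DataHeader, which takes a value, is left out for brevity.

```ruby
# Option names copied from the list above. This helper is a hypothetical
# convenience, not part of Rino or the IUPAC/NIST InChI toolkit.
KNOWN_OPTIONS = %w[
  SNon SRel SRac SUCF SUU NEWPS SPXYZ SAsXYZ RecMet FixedH AuxNone NoADP
  Compress DoNotAddH NoLabels Tabbed OutputSDF InChI2InChI SdfAtomsDT STDIO
  FB FixSp3Bug WarnOnEmptyStructure
].freeze

# Build a list of dash-prefixed flags, rejecting names that are not in
# the list. The optional timeout becomes a -W<seconds> flag.
def inchi_flags(*names, timeout: nil)
  unknown = names.map(&:to_s) - KNOWN_OPTIONS
  unless unknown.empty?
    raise ArgumentError, "unknown InChI option(s): #{unknown.join(', ')}"
  end

  flags = names.map { |name| "-#{name}" }
  flags << "-W#{Integer(timeout)}" if timeout
  flags
end

p inchi_flags('FixedH', 'SUU', timeout: 30)
```

The resulting strings are in the same form as the '-FixedH' option appended to reader.options in the test below.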

A Test

The following code displays the InChI for benzoic acid with and without mobile hydrogen atom perception. It requires both Rino and Ruby CDK. The latter library is used to convert a SMILES string into a molfile for use by Rino.

require 'rubygems'
require_gem 'rcdk'
require_gem 'rino'
require 'rcdk/util'

molfile = RCDK::Util::Lang.smiles_to_molfile('c1ccccc1C(=O)O') # benzoic acid
reader = Rino::MolfileReader.new
inchi = reader.read(molfile)

puts "With mobile hydrogen perception (default):\n#{inchi}\n\n"

reader.options << '-FixedH'
inchi = reader.read(molfile)

puts "With -FixedH (fixed hydrogen layer):\n#{inchi}"

The -FixedH flag used by the reader the second time corresponds to the "Mobile H Perception Off" entry in the option list above. In addition to the default layers, the InChI gains a fixed hydrogen layer, beginning with /f, that records which atom the acidic hydrogen is attached to. Some InChI authors use this form of InChI and others don't. PubChem is an example of a large InChI author that does include the fixed hydrogen layer, as their entry for benzoic acid demonstrates. To perform an exact match of your InChIs with theirs, the -FixedH flag must be set.

Running the Test

Running the test code produces the following output:

With mobile hydrogen perception (default):
InChI=1/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)

With -FixedH (fixed hydrogen layer):
InChI=1/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)/f/h8H
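The two InChIs above differ only in the trailing /f layer that -FixedH adds. When regenerating InChIs with matching options isn't possible, one rough workaround is to strip that layer before comparing. The sketch below uses plain string handling and does not call Rino; strip_fixed_h and same_ignoring_fixed_h? are hypothetical helpers, and the approach assumes that everything from /f up to an optional reconnected layer (/r) belongs to the fixed hydrogen layer.

```ruby
# Remove the fixed hydrogen layer (/f and its sublayers) from an InChI.
# Assumption: the layer runs from "/f" to the start of a reconnected
# layer ("/r"), or to the end of the string if there is none.
def strip_fixed_h(inchi)
  inchi.sub(%r{/f.*?(?=/r|\z)}, '')
end

# Compare two InChIs while ignoring any fixed hydrogen layer.
def same_ignoring_fixed_h?(first, second)
  strip_fixed_h(first) == strip_fixed_h(second)
end

default_inchi = 'InChI=1/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)'
fixed_h_inchi = 'InChI=1/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)/f/h8H'

puts strip_fixed_h(fixed_h_inchi)
puts same_ignoring_fixed_h?(default_inchi, fixed_h_inchi)
```

This comparison is looser than an exact match, since structures that differ only in where a mobile hydrogen sits would be treated as equal. Generating both InChIs with the same options remains the more reliable route.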

Conclusions

When matching InChIs generated by other authors, it's best to adopt their processing conventions. Rino makes it convenient to do so through its full support for the standard IUPAC/NIST command line options.

Do You Use the Command Line?

Posted by Rich Apodaca Thu, 15 Mar 2007 15:18:00 GMT

In the run to abandon command line interfaces for the GUI, we've left behind the versatility of language.

...

[Imagine] using a drop-down menu to select the one web site you want to go to out of the 100 million web sites in existence. Ludicrous! How do we actually surf to a site? By typing an address into the address bar. When we want to go to the mail "application", we type in "gmail.com"; when we want to open a news "application", we type in "nytimes.com". On the old unix command lines, we would type "pine" and "rn". See a similarity? The address bar is just a primitive command line. A command line that your grandmother can—and does—use.

-Aza Raskin, Get Humanized

The command line is alive and well. It's simply become so sophisticated that most of us don't realize we're using it. Whether we're entering a URL into a browser address bar, taking advantage of autocomplete to look up a co-worker's name in an address book, or using Google to search the Web, the command line is hard at work. Most people wouldn't want it any other way.

To an end user, a command line is nothing more than a box to enter text. The magic happens when this text is processed. Aza Raskin's company Humanized uses this simple idea to build text-driven applications that save time and effort.

What would happen if the same thinking were applied to chemical informatics?

Image credit: Bartholomule - Flickr