Porting MX - CDK-Compatible VF Implementation

Substructure search is a fundamental cheminformatics operation and an especially important component of chemical structure databases. Of the several algorithms available for atom-by-atom comparison of two structures, one of the fastest is VF, which is implemented in MX, a lightweight cheminformatics toolkit.
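
To make "atom-by-atom comparison" concrete, here is a minimal sketch of a backtracking substructure matcher. It is not the MX or CDK code and it omits the look-ahead pruning rules that make VF fast. The class `SubstructureSketch`, its methods, and the representation of a molecule as an array of element symbols plus an adjacency matrix are all assumptions made for illustration.

```java
import java.util.Arrays;

/** Toy backtracking matcher; hypothetical names, not the MX or CDK API. */
final class SubstructureSketch {

    /** Builds a symmetric adjacency matrix from bond pairs {i, j}. */
    static boolean[][] bonds(int atomCount, int[][] pairs) {
        boolean[][] adj = new boolean[atomCount][atomCount];
        for (int[] p : pairs) {
            adj[p[0]][p[1]] = true;
            adj[p[1]][p[0]] = true;
        }
        return adj;
    }

    /** Returns true if every query atom and bond can be mapped onto the target. */
    static boolean isSubstructure(String[] qAtoms, boolean[][] qAdj,
                                  String[] tAtoms, boolean[][] tAdj) {
        int[] map = new int[qAtoms.length]; // map[q] = target atom matched to query atom q
        Arrays.fill(map, -1);
        boolean[] used = new boolean[tAtoms.length];
        return extend(0, map, used, qAtoms, qAdj, tAtoms, tAdj);
    }

    /** Tries to match query atom q, assuming atoms 0..q-1 are already mapped. */
    private static boolean extend(int q, int[] map, boolean[] used,
                                  String[] qAtoms, boolean[][] qAdj,
                                  String[] tAtoms, boolean[][] tAdj) {
        if (q == qAtoms.length) {
            return true; // every query atom has a partner
        }
        for (int t = 0; t < tAtoms.length; t++) {
            if (used[t] || !qAtoms[q].equals(tAtoms[t])) {
                continue; // target atom taken, or element symbols differ
            }
            if (!bondsPreserved(q, t, map, qAdj, tAdj)) {
                continue; // a query bond would have no counterpart in the target
            }
            map[q] = t;
            used[t] = true;
            if (extend(q + 1, map, used, qAtoms, qAdj, tAtoms, tAdj)) {
                return true;
            }
            map[q] = -1; // backtrack and try the next candidate
            used[t] = false;
        }
        return false;
    }

    /** Checks that each bond from q to an already-mapped query atom exists in the target. */
    private static boolean bondsPreserved(int q, int t, int[] map,
                                          boolean[][] qAdj, boolean[][] tAdj) {
        for (int prev = 0; prev < q; prev++) {
            if (qAdj[q][prev] && !tAdj[t][map[prev]]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Heavy atoms of ethanol: C-C-O
        String[] ethanol = {"C", "C", "O"};
        boolean[][] ethanolBonds = bonds(3, new int[][] {{0, 1}, {1, 2}});
        // Query fragment: C-O
        String[] query = {"C", "O"};
        boolean[][] queryBonds = bonds(2, new int[][] {{0, 1}});
        System.out.println(isSubstructure(query, queryBonds, ethanol, ethanolBonds));
    }
}
```

The sketch ignores bond orders, aromaticity, and hydrogens, and it visits query atoms in index order. A production matcher would choose the next atom to match more carefully and reject hopeless partial mappings earlier, which is where VF gets its speed.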

A recent post discussed the limitations of directly porting the C++ implementation of VF into Java and why a Java-centric, de novo implementation was created for MX instead.

I'm now happy to report that Syed Asad Rahman of the European Bioinformatics Institute has created a preliminary implementation of VF for the Chemistry Development Kit (CDK) by porting the MX mapping package.

Looking through Asad's work, one of the most striking things is how the CDK-specific code is isolated in a few key areas, a trait shared with the original MX implementation. Another is the readability of the code. Both features should greatly simplify further optimization work.

If you've been looking for a fast substructure search engine for your cheminformatics work, I recommend checking out both MX and the CDK port.