Reading and Writing SD Files With MX
MDL Structure Data Files (SD Files) are the de facto standard for the exchange of chemical structures and associated data. As a result, methods for efficiently reading and writing these files play an important part in any cheminformatics toolkit.
The latest release of MX, the open source cheminformatics toolkit, adds support for reading and writing SD Files. Both source and platform-independent binary distributions are available.
The new release introduces SDFileReader. In interactive JRuby:
jirb
irb(main):001:0> require 'mx-0.107.0.jar'
=> true
irb(main):002:0> import com.metamolecular.mx.io.mdl.SDFileReader
=> Java::ComMetamolecularMxIoMdl::SDFileReader
irb(main):003:0> r=SDFileReader.new 'pubchem_sample_33.sdf'
=> #<Java::ComMetamolecularMxIoMdl::SDFileReader:0x40b181 @java_object=com.metamolecular.mx.io.mdl.SDFileReader@145b02f>
irb(main):004:0> r.next_record
=> nil
irb(main):005:0> m=r.get_molecule
=> #<Java::ComMetamolecularMxModel::DefaultMolecule:0xcb754f @java_object=com.metamolecular.mx.model.DefaultMolecule@60b407>
irb(main):006:0> m.count_atoms
=> 31
irb(main):007:0> r.get_keys
=> #<Java::JavaUtil::ArrayList:0x381d92 @java_object=[PUBCHEM_COMPOUND_CID, PUBCHEM_COMPOUND_CANONICALIZED, PUBCHEM_CACTVS_COMPLEXITY, PUBCHEM_CACTVS_HBOND_ACCEPTOR, PUBCHEM_CACTVS_HBOND_DONOR, PUBCHEM_CACTVS_ROTATABLE_BOND, PUBCHEM_CACTVS_SUBSKEYS, PUBCHEM_IUPAC_OPENEYE_NAME, PUBCHEM_IUPAC_CAS_NAME, PUBCHEM_IUPAC_NAME, PUBCHEM_IUPAC_SYSTEMATIC_NAME, PUBCHEM_IUPAC_TRADITIONAL_NAME, PUBCHEM_NIST_INCHI, PUBCHEM_EXACT_MASS, PUBCHEM_MOLECULAR_FORMULA, PUBCHEM_MOLECULAR_WEIGHT, PUBCHEM_OPENEYE_CAN_SMILES, PUBCHEM_OPENEYE_ISO_SMILES, PUBCHEM_CACTVS_TPSA, PUBCHEM_MONOISOTOPIC_WEIGHT, PUBCHEM_TOTAL_CHARGE, PUBCHEM_HEAVY_ATOM_COUNT, PUBCHEM_ATOM_DEF_STEREO_COUNT, PUBCHEM_ATOM_UDEF_STEREO_COUNT, PUBCHEM_BOND_DEF_STEREO_COUNT, PUBCHEM_BOND_UDEF_STEREO_COUNT, PUBCHEM_ISOTOPIC_ATOM_COUNT, PUBCHEM_COMPONENT_COUNT, PUBCHEM_CACTVS_TAUTO_COUNT, PUBCHEM_BONDANNOTATIONS]>
irb(main):008:0> r.get_data 'PUBCHEM_COMPOUND_CID'
=> "1"
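SDFileReader implements lazy iteration: molecules and data are created only when requested. This is why the session above calls next_record to advance to a record before asking for its molecule, keys, or data.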
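To make the idea of lazy iteration concrete, here is a minimal sketch in plain Ruby. It is not MX's implementation: the class name LazySDReader and its internals are hypothetical, and only the method names next_record and get_data are borrowed from the session above. It assumes the usual SD File layout, in which a molfile ending in "M  END" is followed by "> <KEY>" data items and each record is terminated by a "$$$$" line. Advancing to a record only buffers its raw lines; data items are parsed on first access and then cached.

```ruby
require 'stringio'

# Hypothetical illustration only; this is NOT MX's SDFileReader.
class LazySDReader
  RECORD_END = '$$$$'.freeze

  def initialize(io)
    @io = io
    @lines = []
    @data = nil
  end

  # Advance to the next record. Returns false at end of input.
  # Only raw lines are buffered here; nothing is parsed yet.
  def next_record
    lines = []
    terminated = false
    while (line = @io.gets)
      line = line.chomp
      if line == RECORD_END
        terminated = true
        break
      end
      lines << line
    end
    return false if !terminated && lines.all? { |l| l.strip.empty? }
    @lines = lines
    @data = nil # drop the cache belonging to the previous record
    true
  end

  # Raw molfile text: everything up to and including "M  END".
  def molfile
    idx = @lines.index { |l| l.start_with?('M  END') }
    (idx ? @lines[0..idx] : @lines).join("\n")
  end

  def keys
    data.keys
  end

  def get_data(key)
    data[key]
  end

  private

  # Parse the data items on first access, then reuse the result.
  def data
    @data ||= parse_data
  end

  def parse_data
    start = (@lines.index { |l| l.start_with?('M  END') } || -1) + 1
    result = {}
    key = nil
    @lines.drop(start).each do |line|
      if (m = line.match(/\A>.*<([^>]+)>/))
        key = m[1]
        result[key] = []
      elsif key
        line.empty? ? key = nil : result[key] << line
      end
    end
    Hash[result.map { |k, v| [k, v.join("\n")] }]
  end
end

# Usage with a small, made-up single-atom record.
sample = <<~'SDF'
  methane
    hypothetical sample

    1  0  0  0  0  0  0  0  0  0999 V2000
      0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  M  END
  > <ID>
  1

  $$$$
SDF

reader = LazySDReader.new(StringIO.new(sample))
puts reader.get_data('ID') while reader.next_record
```

The sketch trades robustness for brevity: a data value that itself begins with ">" and contains "<...>" would be misread as a header, which a real reader would need to handle.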
SD Files can be written with SDFileWriter. In interactive JRuby:
jirb
irb(main):001:0> require 'mx-0.107.0.jar'
=> true
irb(main):002:0> import com.metamolecular.mx.io.mdl.SDFileWriter
=> Java::ComMetamolecularMxIoMdl::SDFileWriter
irb(main):003:0> import com.metamolecular.mx.io.Molecules
=> Java::ComMetamolecularMxIo::Molecules
irb(main):004:0> w=SDFileWriter.new 'output.sdf'
=> #<Java::ComMetamolecularMxIoMdl::SDFileWriter:0x8a2023 @java_object=com.metamolecular.mx.io.mdl.SDFileWriter@43da1b>
irb(main):005:0> w.write_molecule Molecules.create_benzene
=> nil
irb(main):006:0> w.write_data 'key', 'value'
=> nil
irb(main):007:0> w.close
=> nil
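For readers unfamiliar with the on-disk layout, the following plain-Ruby sketch shows the general shape of a single SD File record: a molfile, then one "> <key>" header, value, and blank line per data item, then a "$$$$" terminator. The helper sd_record is hypothetical and not part of MX, and the sketch makes no claim about exactly what SDFileWriter emits or when it writes the terminator.

```ruby
# Hypothetical helper, not part of MX: formats one SD record as text.
# molfile is assumed to be a complete molfile ending in "M  END".
def sd_record(molfile, data = {})
  out = molfile.chomp + "\n"
  data.each do |key, value|
    out << "> <#{key}>\n" # data header
    out << "#{value}\n"   # data value
    out << "\n"           # a blank line ends the data item
  end
  out << '$$$$' << "\n"   # record terminator
end

# Usage with an illustrative single-carbon molfile.
molfile = [
  'methane',
  '  hypothetical sample',
  '',
  '  1  0  0  0  0  0  0  0  0  0999 V2000',
  '    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0',
  'M  END'
].join("\n")

puts sd_record(molfile, 'key' => 'value')
```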
For an up-to-date summary of MX's current capabilities, please check out the MX Homepage.