Roll Your Own Chemical Database With Free Components

Posted by Rich Apodaca Fri, 13 Apr 2007 10:27:00 GMT

Are you thinking of building a free chemical database but would rather not rent and maintain a bunch of proprietary software components? Norbert Haider has thought a lot about this problem and offers some helpful resources to get you started:

Haider's system can be deployed on commodity hardware running open source operating systems. In other words, the cost of setting up a system like the one he describes is practically zero.

Creating and open sourcing your own custom components is one way to go. Building on top of existing open source tools like CDK, Open Babel, Octet and JOELib is another.

Haider's work raises an interesting question. Has anyone assembled a complete, ready-to-install, general-purpose chemical database package built from open source components? If for no other reason, such an exercise would give an excellent idea of what the dogfood tastes like.

Comments

  1. Egon Willighagen Sat, 14 Apr 2007 02:52:07 GMT

    Stefan built the NMRShiftDB software, which at its core is a molecular compound database system. You can download it from the corresponding SourceForge webpage. The spectrum part is just an extra.

  2. Rich Apodaca Sat, 14 Apr 2007 10:47:45 GMT

    I was thinking of NMRShiftDB more as a specific kind of chemical database. Still, it would be interesting to see how it works as a general-purpose chemical database.

  3. Egon Willighagen Sat, 14 Apr 2007 12:48:17 GMT

    Oh, forgot about AMBIT: check that out. Less spectrum, more molecule:

    http://ambit.acad.bg/

  4. Rajarshi Guha Sat, 14 Apr 2007 14:50:46 GMT

    You could also consider chemical backends for databases. Two OSS examples are Tigress (for PostgreSQL) and SMDC (for MySQL). Both use Open Babel internally, though Tigress also makes use of checkmol/matchmol.

    I like these approaches though I haven't used these specific packages myself (we're using gNova which follows the same idea).

    But the fact that the DB now has cheminformatics functionality makes writing applications based on DB backends much easier and a lot of our projects are based on this approach.

    Coupled with some simple scripts to, say, load SDFs into a DB, evaluate some properties, and load those in as well, it's pretty easy to make a LAMP-esque application. Obviously DBs are general-purpose tools, so in some cases we need to get our hands dirty (say, indexing schemes for 3D search).

    But I can envisage a small but relatively general distribution that starts at SDFs and ends at a simple-to-use web page for queries.

  5. Geoff Mon, 16 Apr 2007 15:28:10 GMT

    For small applications, you probably don't even need the database. Mac, Windows, and Linux all have mechanisms to index chemistry data -- I showed this with ChemSpotlight. Leave the data files in place and use something like Rails to allow a query interface and return the results.

    (Hmm, sounds like I need to add another section before submitting the ChemSpotlight paper. Rich, if you're interested, let me know.)
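
Editor's sketch: comment 4 outlines a pipeline of loading SDFs into a DB, evaluating some properties, and querying the result. The following is a minimal illustration of that idea, not anyone's actual implementation. It assumes V2000-style SD files and uses only the Python standard library with SQLite in place of PostgreSQL or MySQL. The table layout, the function names (`parse_sdf`, `create_schema`, `load_sdf`) and the sample data are all hypothetical. The only "property" computed is the atom count read from the counts line; a real system would call a toolkit such as Open Babel or CDK at that step.

```python
import sqlite3

# Tiny two-record SD file used only for illustration (heavy atoms only).
ATOM = "    0.0000    0.0000    0.0000 %-3s 0  0  0  0  0  0  0  0  0  0  0  0"
SAMPLE_SDF = "\n".join([
    "water", "  sketch", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    ATOM % "O",
    "M  END",
    "> <FORMULA>", "H2O", "",
    "$$$$",
    "ethanol", "  sketch", "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    ATOM % "C", ATOM % "C", ATOM % "O",
    "  1  2  1  0", "  2  3  1  0",
    "M  END",
    "> <FORMULA>", "C2H6O", "",
    "$$$$", "",
])


def parse_record(lines):
    """Split one SD record into (name, atom_count, molfile, fields)."""
    name = lines[0].strip()
    atom_count = int(lines[3][:3])  # V2000 counts line: columns 1-3
    # The structure block runs up to and including the "M  END" line.
    end = next((i for i, l in enumerate(lines) if l.rstrip() == "M  END"),
               len(lines) - 1)
    molfile = "\n".join(lines[:end + 1])
    fields = {}
    i = end + 1
    while i < len(lines):
        line = lines[i]
        if line.startswith(">") and "<" in line:
            key = line[line.index("<") + 1:line.rindex(">")]
            i += 1
            value = []
            # A data item's value runs until the next blank line.
            while i < len(lines) and lines[i].strip():
                value.append(lines[i])
                i += 1
            fields[key] = "\n".join(value)
        else:
            i += 1
    return name, atom_count, molfile, fields


def parse_sdf(text):
    """Yield one parsed record per '$$$$'-terminated block."""
    record = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if len(record) >= 4:  # need header (3 lines) + counts line
                yield parse_record(record)
            record = []
        else:
            record.append(line)


def create_schema(conn):
    """Create the two hypothetical tables used by this sketch."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS molecules (
            id INTEGER PRIMARY KEY,
            name TEXT,
            atom_count INTEGER,
            molfile TEXT
        );
        CREATE TABLE IF NOT EXISTS properties (
            mol_id INTEGER REFERENCES molecules(id),
            key TEXT,
            value TEXT
        );
    """)


def load_sdf(conn, text):
    """Insert every record from SD text; return the number loaded."""
    count = 0
    for name, atom_count, molfile, fields in parse_sdf(text):
        cur = conn.execute(
            "INSERT INTO molecules (name, atom_count, molfile) VALUES (?, ?, ?)",
            (name, atom_count, molfile),
        )
        conn.executemany(
            "INSERT INTO properties (mol_id, key, value) VALUES (?, ?, ?)",
            [(cur.lastrowid, k, v) for k, v in fields.items()],
        )
        count += 1
    conn.commit()
    return count
```

With a connection from `sqlite3.connect(":memory:")`, calling `create_schema(conn)` and then `load_sdf(conn, SAMPLE_SDF)` fills both tables, after which ordinary SQL such as `SELECT name FROM molecules WHERE atom_count >= 2` serves as the query layer. Substructure or similarity search is out of scope here; that is where the database extensions mentioned in comment 4 would come in.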