Five Reasons Why Chemical Societies Need Free Databases and Web Services

Posted by Rich Apodaca Fri, 15 May 2009 17:38:00 GMT

For those who may not have seen the news, the Royal Society of Chemistry (RSC) earlier this week announced the acquisition of ChemSpider, the free database of chemical structures and related data. The tone of the press releases and commentary around the Web has been congratulatory, which is to be expected given the dedication and hard work of ChemSpider's creators. Much of the discussion focuses on what the chemistry community gains by the move. But there's much more to the story.

What's in it for RSC?

What's lacking in the public discussion is a clear explanation of what one of chemistry's oldest institutions hopes to gain by acquiring one of its newest.

Times are tough all over, and the scientific publishing business is no exception. This year the American Chemical Society (ACS) announced cuts to its staff and employee benefits programs amid declining revenues and investment returns, a situation unlikely to reverse itself anytime soon.

Although a service like ChemSpider can be created very inexpensively, growth and maintenance will likely require a significant commitment of resources. Neither the RSC nor ChemSpider offers any indication of how the service will break even, much less contribute to RSC's bottom line.

The Big Picture

Chemical societies around the world are likely to be quite interested in what happens from here.

In years past, paid database and journal subscriptions laid the foundation for many of the activities supported by the largest chemical societies. But the paid subscription model sits in the crossfire of several long-term trends, most notably price increases that habitually outpace the rate of inflation, severe budget cuts in both academia and industry, and the emergence of dozens of free chemistry databases, Web services, and other communication channels beyond ChemSpider.

What's in it for You?

If you work for or are otherwise involved with a chemical society, what does the creation or acquisition of a free Web service like ChemSpider do for you? Here, in no particular order, are some possibilities:

  1. Consult your Mission Statement. The RSC is dedicated to the "advancement of chemistry as a science, the dissemination of chemical knowledge, and the development of chemical applications." Many societies share similar statements of purpose. Free Web services represent one of the most cost-effective ways to achieve this goal.

  2. Nontraditional revenue sources. No, we're not talking about advertising, although that's a possibility. Just because a Web service is "free" doesn't mean that all of its services need to be. To take just one example, many in industry are concerned about the information revealed by company employees' queries on public Web services. There are many ways to address these concerns, and to create revenue in the process. With even a small amount of creativity, many more opportunities like this can likely be found.

  3. Increased visibility for your other products and services. Google does it. IBM does it. Hundreds of smaller companies you may never have heard of do it. They've all built permission assets as a way to more effectively communicate their message to people who matter to them. Chemists will routinely ignore (and even scorn) your advertisements. How likely are they to ignore a free Web service that solves their problem?

  4. Increase the reach and cohesion of your community. ChemSpider is one of the few public-facing chemistry databases that accept community-created information. Users of a system who only consume information have little stake in it. Users who contribute tend to be much more involved in the process, and in the organization behind it.

  5. Winner takes all. Quick: what's the second most popular search engine? What's the second most popular online encyclopedia? What's the second most popular video sharing site? What's the second most popular microblogging service? What's the second most popular photo sharing site? You've heard of all of the front runners, even if you don't use them. Have you even heard of any of the also-rans? When it comes to free online resources, winner takes all. By avoiding the creation of free online resources, you run the real risk of rendering your chemical society irrelevant.

Conclusions

The Web is in the process of changing the operating rules for every organization, particularly in information-rich technical fields like chemistry. If your chemical society ignores the changes now underway, then what exactly is its plan for staying relevant?

Five Questions About the InChI Resolver

Posted by Rich Apodaca Tue, 02 Dec 2008 17:14:00 GMT

Yesterday the Royal Society of Chemistry (RSC) and ChemZoo (of ChemSpider fame) announced a plan to collaborate on the creation of an InChI Resolver service. From the announcement:

Using the InChI - an IUPAC standard identifier for compounds - scientists can share and contribute their own molecular data and search millions of others from many web sources. The RSC/ChemSpider InChI Resolver will give researchers the tools to create standard InChI data for their own compounds, create and use search engine-friendly InChIKeys to search for compounds, and deposit their data for others to use in the future.

...

The InChI Resolver will be based on ChemSpider's existing database of over 21 million chemical compounds and will provide the first stable environment to promote the use and sharing of compound data. 'ChemSpider hosts the largest and most diverse online database of chemical structures sourced from over 150 different data sources' adds Antony Williams of ChemSpider, 'We have embraced the InChI identifier as a key component of our platform and the basis of our structure searches and integration path to a number of other resources. We have delivered a number of InChI-based web services and, with the introduction of the InChI Resolver, we hope to continue to expand the utility and value of both InChI and the ChemSpider service.'

It's encouraging to see a major scientific publisher lend its support to InChI, further evidence of the identifier's broad adoption. And an InChI key resolver is something I've previously said might be a good idea.

Still, InChI and the InChI key represent a significant change in platform for the field of chemistry, in which CAS Registry Numbers are the gold standard for chemical identification.

If we've learned anything from the last 30 years of information technology, it's that once a platform (no matter how dysfunctional) becomes entrenched, nothing short of a game-changing strategy and herculean effort can replace it. The failure of Windows Vista offers a stark reminder of the power of an entrenched platform. Closer to home, the failure of V3000 molfiles to gain significant traction against V2000 offers another.

With these thoughts in mind, here are some questions about the new InChI Resolver service:

  1. What problem is the service really trying to solve? Although it might be obvious to those close to the situation, it's not quite clear to me. Many, if not most, of the desktop cheminformatics packages sold today support generating InChIs. It's also possible to embed InChI in text documents without using a Web service. Convenient it's not, which may be the point. But if that's the case, then the focus of the service should be convenience, simplicity, and ease of use.

  2. How hard would it be to crack an InChI hash? Before dismissing this as impossible, consider that an InChI key is a form of encryption, and a weak one at that. Breaking encryption schemes has a long history in computer science. Given the regularity of InChI syntax, how hard would it be to create software that can computationally recover the InChI that was used to generate an InChI key? What alternative hashing method might make it easier to do so? If there is one, it would become the standard, not the one currently being used.

  3. How will the authenticity of a hashed InChI from an untrusted source be verified? An InChI key might take the form of 'AAAAAAAAAAA-BBBBBBB-XYZ'. Given an arbitrary InChI key provided by an untrusted third party, how would you independently verify that it actually represents a valid key? In the absence of software like that described in Question 2, it would be impossible.

  4. What about BINOLs and Ferrocenes? InChI can't distinguish between stereoisomers arising from axial chirality, such as that found in the widely used molecule BINOL. There are multiple ways to represent organometallics such as ferrocene using InChI, and each will give rise to a different InChI key. This is a Bad Thing.

  5. Why bother with an InChI key at all? Consider a hypothetical InChI key: 'AAAAAAAAAAA-BBBBBBB-XYZ'. To an end user uninterested in information technology, why does it matter how the key was generated? One selling point might be that given an arbitrary key, the chemical structure it represents can be decoded independently of any service. But that service is the core of the RSC/ChemSpider proposal, and it will apparently only be able to resolve previously deposited InChI keys. Sound familiar? This is essentially how the CAS Registry system works, except the CAS system can differentiate BINOL stereoisomers, uniquely identify organometallics, and even handle polymers and complex mixtures.
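Questions 2, 3 and 5 all follow from the fact that a key is a one-way hash of an InChI. A toy model makes this concrete. The sketch below is a hypothetical scheme, assuming the post's example layout 'AAAAAAAAAAA-BBBBBBB-XYZ': it hashes the skeleton layers (formula, connectivity, hydrogens) and the remaining layers of an InChI with SHA-256 and keeps only a few letters of each digest, plus a short check block. The real InChIKey also uses SHA-256, but its layer split, letter encoding and suffix differ, and the names here (`make_key`, `is_well_formed`, `verify`, `crack`, `DepositResolver`) are illustrative, not part of any published library.

```python
import hashlib
import re
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical layout taken from the post's example 'AAAAAAAAAAA-BBBBBBB-XYZ'.
# Toy scheme only: the real InChIKey also uses SHA-256, but its layer split,
# letter encoding and suffix are different.
KEY_PATTERN = re.compile(r"^[A-Z]{11}-[A-Z]{7}-[A-Z]{3}$")
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
SKELETON_PREFIXES = ("c", "h")  # connectivity and hydrogen layers


def _letters(data: bytes, n: int) -> str:
    """Keep n base-26 letters of a SHA-256 digest and discard the rest."""
    value = int.from_bytes(hashlib.sha256(data).digest(), "big")
    chars = []
    for _ in range(n):
        value, digit = divmod(value, 26)
        chars.append(ALPHABET[digit])
    return "".join(chars)


def _split_layers(inchi: str) -> Tuple[str, str]:
    """Split an InChI into skeleton layers (formula, c, h) and all other layers."""
    parts = inchi.split("/")
    formula, layers = parts[1], parts[2:]
    skeleton = [formula] + [p for p in layers if p[:1] in SKELETON_PREFIXES]
    others = [p for p in layers if p[:1] not in SKELETON_PREFIXES]
    return "/".join(skeleton), "/".join(others)


def make_key(inchi: str) -> str:
    """Hash an InChI into the toy 'AAAAAAAAAAA-BBBBBBB-XYZ' layout."""
    skeleton, others = _split_layers(inchi)
    block_a = _letters(skeleton.encode(), 11)  # same for all stereoisomers
    block_b = _letters(others.encode(), 7)     # stereo, charge, isotopes, ...
    check = _letters(f"{block_a}-{block_b}".encode(), 3)  # catches typos only
    return f"{block_a}-{block_b}-{check}"


def is_well_formed(key: str) -> bool:
    """Format and check-block test; says nothing about which structure is encoded."""
    if not KEY_PATTERN.match(key):
        return False
    block_a, block_b, check = key.split("-")
    return _letters(f"{block_a}-{block_b}".encode(), 3) == check


def verify(key: str, inchi: str) -> bool:
    """A key can only be authenticated by re-hashing the InChI it claims to encode."""
    return make_key(inchi) == key


def crack(key: str, candidates: Iterable[str]) -> Optional[str]:
    """Brute-force 'decoding': re-hash candidate InChIs until one matches."""
    for inchi in candidates:
        if make_key(inchi) == key:
            return inchi
    return None


class DepositResolver:
    """Resolves only keys whose InChI was deposited earlier."""

    def __init__(self) -> None:
        self._by_key: Dict[str, str] = {}

    def deposit(self, inchi: str) -> str:
        key = make_key(inchi)
        self._by_key[key] = inchi
        return key

    def resolve(self, key: str) -> Optional[str]:
        return self._by_key.get(key)


if __name__ == "__main__":
    methanol = "InChI=1S/CH4O/c1-2/h2H,1H3"
    ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
    resolver = DepositResolver()
    key = resolver.deposit(methanol)
    print(is_well_formed(key), verify(key, methanol))  # True True
    print(resolver.resolve(make_key(ethanol)))         # None: never deposited
    print(crack(key, [ethanol, methanol]) == methanol)  # True: found by guessing
```

In this toy, most of each digest is thrown away, so the only way back from a key to an InChI is to guess candidate InChIs and re-hash them, which is what the cracking software of Question 2 would have to do at scale. The check block rejects mistyped keys, but any well-formed key for the wrong structure passes it, so authenticity (Question 3) still requires the original InChI. And the resolver answers only for keys someone has deposited, the registry-style behavior Question 5 compares to CAS.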

Within the RSC/ChemZoo proposal is a gem of an idea. The CAS Registry system is closed and in all likelihood will remain so forever. Verifying the authenticity of CAS number/chemical structure assignments is a big problem, made worse by the closed nature of the CAS Registry system. Chemists must have a reliable method to reference chemical structures. There are no doubt many possible solutions to this problem, and the one that actually works will bring big payoffs to the field of chemistry.
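CAS Registry Numbers offer a useful point of comparison on the verification problem. A CAS number has the form of 2 to 7 digits, 2 digits, and a final check digit, which is computed by weighting the other digits 1, 2, 3, ... from the right and taking the sum modulo 10. The sketch below validates that digit; the function names are illustrative and not taken from any CAS tool. It catches many typing errors, including any swap of two adjacent digits, but a number that passes can still be unassigned or quoted for the wrong structure, which is exactly the assignment problem a closed registry leaves to the user.

```python
import re

# CAS RN layout: 2-7 digits, 2 digits, 1 check digit (e.g. 7732-18-5 for water).
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_check_digit(body: str) -> int:
    """Weight digits 1, 2, 3, ... from the right and take the sum modulo 10."""
    total = sum(i * int(d) for i, d in enumerate(reversed(body), start=1))
    return total % 10


def is_valid_cas(rn: str) -> bool:
    """True if rn is well formed and its check digit matches.

    A passing number may still be unassigned, or assigned to a different
    structure than the one it is quoted for.
    """
    m = CAS_PATTERN.match(rn)
    if m is None:
        return False
    body = m.group(1) + m.group(2)
    return cas_check_digit(body) == int(m.group(3))


if __name__ == "__main__":
    print(is_valid_cas("7732-18-5"))  # True (water)
    print(is_valid_cas("7732-81-5"))  # False: transposed digits change the check
```

For water, the digits 773218 read from the right give 8×1 + 1×2 + 2×3 + 3×4 + 7×5 + 7×6 = 105, so the check digit is 5.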