Six Reasons I Like reCAPTCHA, or How to Build a Web Service Worth Talking About

Posted by Rich Apodaca Tue, 18 Sep 2007 12:33:00 GMT

Having spent a great deal of time with reCAPTCHA over the last few weeks, I've come to appreciate both the clever idea and spot-on execution. Aside from being an excellent product, reCAPTCHA also offers clues to building Web services that people will not just use, but also tell their friends about. Here, in no particular order, are six things about reCAPTCHA that I hope all of my future Web services achieve:

  1. It solves a nasty, boring, and widespread problem elegantly. The nastier and more boring a problem is, the more likely that a solution that actually works will be praised by all who use it. Boring and difficult problems create a natural scarcity of good solutions and people willing to work on them. Seek these kinds of problems out; they are a high-probability path to success.

  2. It never goes down. Having used reCAPTCHA pretty much continuously over the last three weeks on my current project as well as on this blog, it's never been unavailable. As much as anything on the Web can be trusted to be there tomorrow, reCAPTCHA seems like a safe bet.

  3. The business model is obvious. reCAPTCHA "pays" for itself by helping to digitize old books. This is a valuable service in itself that could be monetized in some very interesting ways. Because reCAPTCHA does something valuable beyond just fighting spam, it's very likely that I can rely on the service being around for a long time.

  4. It solves problems for two different groups of users at the same time. Consider the giants of the Web: Yahoo, Google, YouTube, Facebook, eBay. What every one of them has in common is that along the way they discovered how to solve problems for two different groups of users simultaneously. The same is true for reCAPTCHA. Come to think of it, every unsuccessful Web venture I can think of failed to solve two problems simultaneously (most didn't even really solve one). This shouldn't be surprising, but for some reason it is. Could this dual-problem thing be the golden rule of Web development?

  5. The Ruby Library Rocks. Although short on examples, the Ruby Library for reCAPTCHA does what it needs to do and gets out of my way. Of course, the actual language is unimportant. What matters is that solid libraries in popular programming languages exist. The number of people willing to experiment with a new Web service is low to begin with, so forcing them to develop their own library beforehand is pointless.

  6. No Limit on Usage. This is where a lot of Web services simply don't get it. I won't name names, but they know who they are. reCAPTCHA allows essentially unlimited use of its service; that's just part of the design. reCAPTCHA does make the reasonable request to "be contacted beforehand if you expect your site to constantly need more than 100,000 reCAPTCHAs solved per day." No access limit means developers can build scalable Web applications based on reCAPTCHA with confidence.

In the end, building a successful Web service that people will gladly tell their friends about isn't complicated. It's just a matter of building value and trust.

One more thing. I'm currently using reCAPTCHA on this site in the comments section. There are probably still some browser-specific kinks to work out. If you're so inclined, I would be grateful for a short comment describing your experience, along with your browser and OS.

Hacking ChemSpider: Query by SMILES and InChI with Ruby

Posted by Rich Apodaca Mon, 17 Sep 2007 12:19:00 GMT

Slowly but surely, cheminformatics Web APIs are starting to appear. What's the big deal, you may ask? By exposing Web APIs, service providers enable third parties to develop new applications that "mash up" functionality from two or more sites, or which take the original service in directions its founders never considered.

By way of Antony Williams' blog, I came across the announcement for the ChemSpider Web API. What can this API do for Web developers? To find out, let's write a small Ruby library.

The Library

Our library will accept a SMILES string or InChI identifier and return a URL pointing to the corresponding ChemSpider compound summary page. Like previous Web API demos, this one uses the powerful Ruby library Mechanize, leading to very concise code:

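require 'rubygems'
require 'mechanize'

module ChemSpider
  # Returns the compound summary URL for an InChI, or nil if the service returns an empty ID.
  def url_for_inchi(inchi)
    agent = WWW::Mechanize.new
    page = agent.get "http://www.chemspider.com/inchi.asmx/InChIToCSID?inchi=#{inchi}"
    # The ChemSpider ID is the content of the <string> element in the XML response.
    csid = (Hpricot(page.body)/"string").innerHTML

    csid == "" ? nil : "http://www.chemspider.com/RecordView.aspx?id=#{csid}"
  end

  # Converts a SMILES string to an InChI via ChemSpider, then looks up that InChI.
  def url_for_smiles(smiles)
    agent = WWW::Mechanize.new
    page = agent.get "http://www.chemspider.com/inchi.asmx/SMILESToInChI?smiles=#{smiles}"
    inchi = (Hpricot(page.body)/"string").innerHTML

    # An empty result means ChemSpider could not convert the SMILES string.
    raise "Invalid SMILES: #{smiles}" if inchi == ""

    url_for_inchi inchi
  end
end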

The url_for_inchi method directly uses the ChemSpider API to query by InChI. The url_for_smiles method first uses the ChemSpider API to convert a SMILES string to an InChI identifier, and then calls the url_for_inchi method.
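The two-step flow described here can be tried without touching the network by passing the fetch step in as an argument. What follows is only a sketch: the names sketch_url_for_inchi, sketch_url_for_smiles, extract_string, and canned_fetch are hypothetical, the canned XML payloads are assumptions inferred from the <string> query used in the library, and REXML stands in for Hpricot.

```ruby
require 'rexml/document'

RECORD_URL_PREFIX = "http://www.chemspider.com/RecordView.aspx?id="

# Returns the text of the root <string> element, or "" when it is empty.
def extract_string(xml)
  REXML::Document.new(xml).root.text.to_s.strip
end

# fetch: any callable taking (operation, value) and returning an XML string.
def sketch_url_for_inchi(inchi, fetch)
  csid = extract_string(fetch.call("InChIToCSID", inchi))
  csid.empty? ? nil : "#{RECORD_URL_PREFIX}#{csid}"
end

def sketch_url_for_smiles(smiles, fetch)
  inchi = extract_string(fetch.call("SMILESToInChI", smiles))
  raise ArgumentError, "Invalid SMILES: #{smiles}" if inchi.empty?

  sketch_url_for_inchi(inchi, fetch)
end

# Canned responses standing in for the live service (assumed payload shape).
def canned_fetch(operation, value)
  responses = {
    ["SMILESToInChI", "c1ccccc1"] =>
      "<string>InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H</string>",
    ["InChIToCSID", "InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H"] =>
      "<string>236</string>"
  }
  responses.fetch([operation, value], "<string/>")
end

puts sketch_url_for_smiles("c1ccccc1", method(:canned_fetch))
```

Injecting the fetch step keeps the chaining logic (SMILES to InChI, then InChI to ChemSpider ID) separate from the HTTP call, which makes the empty-result cases easy to check by hand.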

Two points are worth noting. First, for convenience the InChI identifier (and likewise the SMILES string) is appended to the API URL without escaping; strictly speaking, it should be URL-encoded first. Second, both methods call Hpricot, the parsing library underlying Mechanize, to extract the result from the raw XML returned by ChemSpider.
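The escaping mentioned in the first point could look something like the sketch below. The helper name chemspider_query_url is hypothetical and not part of the library; only the endpoint URLs come from the code above, and URI.encode_www_form_component is from Ruby's standard library.

```ruby
require 'uri'

# Hypothetical helper, not part of the library above: builds a request URL
# for one of the two endpoints with the query value percent-encoded.
def chemspider_query_url(operation, param, value)
  encoded = URI.encode_www_form_component(value)
  "http://www.chemspider.com/inchi.asmx/#{operation}?#{param}=#{encoded}"
end

puts chemspider_query_url("InChIToCSID", "inchi", "InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H")
puts chemspider_query_url("SMILESToInChI", "smiles", "C#N")
```

Encoding matters most for SMILES strings: without it, the # in a string such as C#N would normally be read as the start of a URL fragment and never reach the server.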

Testing

Saving the above code to a file called chemspider.rb, we can get the URL to ChemSpider's benzene page from its InChI identifier via interactive Ruby (irb):

$ irb
irb(main):001:0> require 'chemspider'
=> true
irb(main):002:0> include ChemSpider
=> Object
irb(main):003:0> url_for_inchi "InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H"
=> "http://www.chemspider.com/RecordView.aspx?id=236"

We can work with SMILES strings just as easily as with InChIs:

$ irb
irb(main):001:0> require 'chemspider'
=> true
irb(main):002:0> include ChemSpider
=> Object
irb(main):003:0> url_for_smiles 'c1ccccc1'
=> "http://www.chemspider.com/RecordView.aspx?id=236"

Both the InChI and the SMILES string yield a URL pointing to the same ChemSpider page for benzene.

Conclusions

Like most chemical databases, ChemSpider uses a compound summary page as a way of organizing the available resources for a given molecule. With a method in hand for accessing these pages based on arbitrary SMILES or InChIs, we can begin to think of manipulating ChemSpider independently of its current user interface. But that's a story for another time.

Mashups for Fun and Profit

Posted by Rich Apodaca Sat, 23 Sep 2006 20:27:00 GMT

ProgrammableWeb offers one-stop shopping for all things mashup-related. If you've ever wanted to try your hand at Web programming, this site makes an excellent first stop. Be sure to check out the listing of over 1,000 mashup sites indexed by category and API.

The move toward open, Web-based chemical information resources is fully underway. The genie has been let out of the bottle, and there's no putting him back. This is bad news for large, established chemical information players. Their business models based on restricting information flow will be irreversibly disrupted. It's good news for tens of thousands of researchers who will be able to exploit chemical information in ways unimaginable today. Leading the way will be mashups that creatively tie diverse Web resources together, and dynamic programming languages like Ruby that make doing so easy.

Are you ready for the future?