Fighting Spam on the Cheap with CAPTCHA - A Simple Ruby Library for captchas.net

September 03, 2007

A recent article discussed two free services for cheaply integrating CAPTCHAs into Web applications. One of these services, captchas.net, apparently has no publicly available Ruby library. Given the popularity of Ruby on Rails for building Web applications, and the increasing need for the spam protection offered by services such as captchas.net, it seems only logical that such a library should exist. This article, the first in a series, documents the first step in the development of a simple Rails library for working with captchas.net.

Got Key?

To use captchas.net, you'll need to register for an account. You'll receive a secret key, which is used to decode the text shown in the CAPTCHA, and a username, which is included in your captchas.net URLs.

Building a URL

To get your CAPTCHA image from captchas.net, construct a URL containing the appropriate parameters. The simplest form of a captchas.net URL accepts your username ('demo') and a random phrase ('my_random_text'), and returns a complete CAPTCHA image. Customization is possible, but we'll stick with the simple case for now. As an example, this URL:

http://image.captchas.net/?client=demo&random=my_random_text

generates this image:

[image: CAPTCHA returned for the URL above]

The URL above is all you need to embed a CAPTCHA into your webpage. The random text we've encoded in the URL ('my_random_text') is processed by the captchas.net server to create the six-character sequence shown in the image. Read on to find out how.
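As a minimal sketch of the URL construction described above, the following Ruby builds the simple form of the URL. The method name captcha_image_url and the use of SecureRandom to pick the random phrase are assumptions made for illustration; neither is part of captchas.net or of the library developed in this article.

```ruby
require 'securerandom'
require 'uri'

# Hypothetical helper: builds the simplest captchas.net image URL from a
# username and a random phrase. Only the 'client' and 'random' parameters
# shown in the article are used.
def captcha_image_url(client, random)
  query = URI.encode_www_form('client' => client, 'random' => random)
  "http://image.captchas.net/?#{query}"
end

# Assumption: any hard-to-guess string will do as the random phrase.
random = SecureRandom.hex(16)

puts captcha_image_url('demo', 'my_random_text')
# prints http://image.captchas.net/?client=demo&random=my_random_text
puts captcha_image_url('demo', random)
```

Whatever random phrase is used, the application has to remember it, because the same phrase is needed again to work out which text the image shows.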

Decoding the CAPTCHA

We've got a CAPTCHA, but how do we know what's written in it? This is where our secret key comes in. Here's the method used by the captchas.net server to generate the image text:

  • concatenate the secret key and the random string (example: 'secret' and 'my_random_text' become 'secretmy_random_text')
  • if alphabet or character_count differs from 'abcdefghijklmnopqrstuvwxyz' and 6, respectively, append both separated by ':' (<secret><random>:<alphabet>:<character_count>).
  • take the MD5-sum of the resulting string
  • take the first character_count bytes of the resulting 16-byte-long MD5 value
  • determine the remainders of these character_count bytes when dividing by the length of alphabet
  • each remainder is the index of a character in the chosen alphabet; taken in order, the characters form the CAPTCHA text (example: "hnrppb")
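
The last two steps are easier to follow with concrete numbers. The sketch below maps a few hand-picked byte values (not taken from a real MD5 digest) onto the default alphabet. The method name bytes_to_text is hypothetical and exists only for this illustration.

```ruby
ALPHABET = 'abcdefghijklmnopqrstuvwxyz'

# Hypothetical helper: maps each byte value to a character by taking the
# remainder modulo the alphabet length and using it as an index.
def bytes_to_text(bytes, alphabet = ALPHABET)
  bytes.map { |byte| alphabet[byte % alphabet.size] }.join
end

# Hand-picked values for illustration only:
#     0 % 26 =  0 -> 'a'
#    25 % 26 = 25 -> 'z'
#    26 % 26 =  0 -> 'a'
#   159 % 26 =  3 -> 'd'
#   255 % 26 = 21 -> 'v'
puts bytes_to_text([0, 25, 26, 159, 255]) # prints "azadv"
```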

The captchas.net site has a more complete description of the algorithm and an interactive CAPTCHA generator that is very helpful in understanding how CAPTCHAs are generated.

A Simple Library

Given the algorithm, a short Ruby library can be written to find the text encoded in a captchas.net CAPTCHA:

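require 'digest/md5'

module Captcha
  # Returns the text that captchas.net renders for the given secret key and
  # random string, following the algorithm described above.
  def get_text(secret, random, alphabet = 'abcdefghijklmnopqrstuvwxyz', character_count = 6)
    # An MD5 digest is 16 bytes long, so at most 16 characters can be derived.
    if character_count < 1 || character_count > 16
      raise "Character count of #{character_count} is outside the range of 1-16"
    end

    input = "#{secret}#{random}"

    # Non-default settings become part of the hashed string.
    if alphabet != 'abcdefghijklmnopqrstuvwxyz' || character_count != 6
      input << ":#{alphabet}:#{character_count}"
    end

    # Two hex digits per byte: keep the first character_count bytes.
    bytes = Digest::MD5.hexdigest(input).slice(0..(2*character_count - 1)).scan(/../)
    text = ''

    # Each byte, reduced modulo the alphabet length, indexes one character.
    bytes.each do |byte|
      text << alphabet[byte.hex % alphabet.size].chr
    end

    text
  end
end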

We can test the library using irb:

$ irb
irb(main):001:0> require 'captcha'
=> true
irb(main):002:0> include Captcha
=> Object
irb(main):003:0> get_text 'secret', 'my_random_text'
=> "hnrppb"

If we wanted to include numerical digits and require additional characters, the library enables this as well:

$ irb
irb(main):001:0> require 'captcha'
=> true
irb(main):002:0> include Captcha
=> Object
irb(main):003:0> get_text 'secret', 'my_random_text', 'abcdefghijklmnopqrstuvwxyz0123456789', 7
=> "62m3acs"
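
Once the expected text is known, checking a visitor's answer comes down to a string comparison. The sketch below shows one way to do it. The method name captcha_answer_correct? is hypothetical, and the choice to ignore surrounding whitespace and letter case is an assumption that only makes sense for an all-lowercase alphabet such as the default.

```ruby
# Hypothetical helper: compares the text computed by get_text with what the
# visitor typed. Assumes an all-lowercase alphabet, so the answer is
# normalized by stripping whitespace and downcasing before comparing.
def captcha_answer_correct?(expected_text, user_answer)
  expected_text == user_answer.to_s.strip.downcase
end

expected = 'hnrppb' # value returned by get_text in the first irb session

puts captcha_answer_correct?(expected, ' HNRPPB ') # prints true
puts captcha_answer_correct?(expected, 'hnrppc')   # prints false
puts captcha_answer_correct?(expected, nil)        # prints false
```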

Conclusions

That's really all there is to the Ruby library. Once we can create a CAPTCHA image and decode its contents, we can begin to think about building an integrated Rails solution. But that's a story for another time.

image credit: Andrew Huff