Which Programming Language Should I Learn as a Chemist?

June 12, 2012

Reddit Chemistry hosts an interesting discussion about which programming language is most useful to learn as a chemist. This is an important question as chemists everywhere come face-to-face with the worst job market on record. Combining a good chemistry background with a useful skill such as computer programming would be one strategy for staying prepared for whatever lies ahead.

Making the transition from industrial bench chemist to chemistry software developer has given me many ideas on this topic. If you've even remotely considered dabbling in software as a chemist, what follows may be helpful, but as always, YMMV.

Forget Languages, Find a Good Problem Instead

Identifying good problems - problems that when solved will yield profitable outcomes - is one of the most difficult and valuable things we do in our careers as scientists. This is equally true in software.

Finding a good problem matters far more than finding a programming language to learn. I'd go so far as to say: don't bother learning a programming language until you've identified a good problem to solve.

For one thing, having a good problem in mind will narrow down language choices. Some programming languages are demonstrably better at solving certain problems than others. For example, if you want your software to run inside any Web browser without plugins, your only choice is JavaScript. Likewise, if you want to distribute your software on the Apple App Store, some Objective-C is mandatory. To write small pieces of software that automate Excel, you'll need to learn Visual Basic for Applications (VBA).

Second, any programming language worth learning for the long term will come with a bewildering assortment of platforms, libraries, techniques, and documentation. For a beginner, information overload will be a major barrier. Having a specific problem in mind will enable you to make better choices about what to learn and what to ignore.

Finally, you can only learn so much from a lecture or book. You must practice getting yourself into trouble and back out - repeatedly. The chemical problem you identify will offer a rich supply of small goals you can set for yourself as you learn new material. As you solve these mini-problems, you'll be building up code and know-how that will be directly applicable to solving your larger problem. Little will be wasted, but be prepared to discover that the language you chose comes with significant limitations.

What makes a good problem? That depends on you. Whether you're out to make money, speed up scientific progress, or help students learn, a key characteristic of your first problem will be scale. The ideal problem lies within the intersection of problems you care about, problems that are important, and problems you can realistically solve as a first programming project.

Depending on your experience level, you may not make the right choice initially. The only way to know is to try. If it starts to look like another language may be a better fit for your problem, or that another problem may be more appropriate for your language - switch. But be aware of the time-wasting effects of analysis paralysis. Remember, your goal is to deliver a solution to a specific problem, not to learn a programming language.

How (and Why) I Learn New Programming Languages

Over the last fourteen years I've become very comfortable with four languages: Java, JavaScript, Ruby, and Objective-C. In each case I was driven by a specific problem I wanted to solve and the belief that the language I learned was best-suited for the job.

  • Java: During my postdoc, our lab had an IR spectrometer from which I was regularly gathering spectra. Although the hardware itself was first-rate, the software was terrible. I wanted a better tool for viewing and processing spectral data. I also wanted my program to run on any operating system without rewriting it. Java was the best choice at the time. During my first few months, I focussed on just two goals: (1) learning how to plot 2D data; and (2) learning how to reverse-engineer the proprietary, undocumented, binary file format the spectra were stored in. Java was my first exposure to programming since Basic and 6502 Assembly Language some time before. Although I used Java for many years (and continue to do so), there are only two books I found worth owning: Java in a Nutshell and Java Swing.
  • Ruby: When I started my company in 2007, very little chemistry software was being written as Web applications - most of it was the same desktop software that had been produced for 15 years or more. I wanted to change that. I had the germ of an idea that eventually evolved into Chempedia. Building Web applications is very different from building desktop software. At the time, Ruby on Rails was a very small project run by some very smart people; even then it was clear that Rails offered many advantages over the way Web apps had been developed previously. But it came with a price - learning an odd-looking programming language little-used outside of Japan. I spent most of my first few months learning Rails and Ruby simultaneously with the help of two books, The Pickaxe Book and Agile Web Development with Rails. My first few mini-projects included small chemical databases that let me test substructure and exact structure searching through a browser interface.
  • JavaScript: Several months after starting my company, I was selling an embeddable chemical structure editor written in Java. Although customers liked the software, they were also annoyed by problems a small but vocal percentage of their users were having in getting Java running properly in their browsers. Could we offer a structure editor that didn't use browser plugins? At the time, I speculated that JavaScript could be very useful for a number of chemistry problems, although this was by no means certain. One thing led to another and eventually we released a plugin-free structure editor. I spent the first few months of my deep-dive into JavaScript understanding its graphics capabilities and performance characteristics with the help of three books: JavaScript: The Good Parts, Google Closure: The Definitive Guide, and SVG Essentials. Having a background in Java accelerated the process.
  • Objective-C: Organic synthesis is a very difficult subject to master due to its breadth and depth. Books are still the main medium through which organic synthesis is taught and learned. Could smartphones and tablets offer a better way? To answer this question, I teamed up with James Ashenhurst to build Reagents. Our first version was written mainly in JavaScript, a language that I had become very familiar with. Unfortunately, users didn't think this was such a good idea, complaining about 'jerky' animation, UI elements that didn't behave as expected, and crashes that we could never reproduce, among other technical issues. I made the decision to re-write the entire app in Objective-C to eliminate these problems. In contrast to my previous three languages, I learned Objective-C not from books but by following Simon Allardice's excellent video courses, Objective-C Essential Training and iOS SDK Essential Training, on Lynda.com. Holes were filled with Apple's excellent online documentation. My focus on building Reagents helped greatly in navigating the large volume of information available on iOS development, which seems to change significantly every six months.

The pattern is pretty clear: I never learn a programming language unless it's the best way to solve a particular problem. Strangely enough, my selection of languages has always had more to do with the platform on which the language performed the best and much less to do with the design of the language, the popularity of the language among the cheminformatics crowd, or other often-discussed considerations.

The bad news: no one language is best-suited for writing chemistry software. The good news: learning one language makes it much easier to learn the next.

What To Do While Learning a Programming Language

You've picked a problem and the language best-suited to solving it. What now?

If your experience is anything like mine, you'll start hitting problems almost immediately. Ideally, you'd have experts you could go to with questions, but in reality this may not be the case. Here are some ideas for things to try when you get stuck:

  • Search StackOverflow. If you've never seen it before, you need to become familiar with StackOverflow. Have you tried Google to no avail? Bypass the middle man and search StackOverflow directly. I've been pleasantly surprised at how useful this simple technique can be.
  • Post a Question to StackOverflow. If you've never posted a question to a public forum before and are uncomfortable about doing so - too bad. You'll have to get over this phobia and just do it. Use a pseudonym if you must. Provided you give it some thought, you won't be disappointed. I've found many times that the simple act of gathering my thoughts to write a question gave the insight I needed to find the solution.
  • Contribute to an Open Source Project. As a developer, you'll need to not only write code, but be good at reading other people's code. Open source is a great way to do this. Working on an open source project is also the single best way I've found to build a network of smart developers you can learn from. A paper on The Blue Obelisk offers a good compilation of open source projects in chemistry.
  • Start Blogging. In contrast to chemists, software developers rarely write academic papers. Those who do write regularly tend to use blogs instead. Adopting the habit of writing regularly about your development experiences for a public audience will go far toward helping you solve problems and get feedback, building a professional network of developers, and eventually promoting your work.
  • Publish Your Code on GitHub. You're going to need a source code management tool if you plan on doing significant software development, and Git is a good choice. Using Git comes with the benefit that publishing your work on GitHub will be easy. Doing so will enable you to let others in on your project. You may be surprised at what this leads to.

Conclusions

If you're a chemist who wants to learn programming, picking a language will be the least of your work. Your biggest challenge will be to find the right problem to solve. Having identified a problem and a language to use, you'll still face challenges as you learn the language and become familiar with its limitations. If you manage to persevere to the point of actually producing a solution to your problem, you'll face a further challenge: letting others know about it.

Make no mistake - it's a ton of work and a large time commitment. The only way you'll make it is by being more than a little obsessed about solving a specific chemistry problem. Take some time to find one that's right for you and don't be afraid to iterate a little before settling.