Pablo Picasso once said that “computers are useless: they can only give you answers”. However, biologists routinely discard interesting research ideas just because they do not know how to pose them to computers.
This is mainly because very few life scientists receive adequate training in scientific computing. Some graduate students take programming classes, and some are familiar with databases, but all too often whatever a recent biology PhD knows about computing has been acquired through painful trial and error, and through long sessions with the one graduate student in the program who is “good with computers”.
To overcome these problems, I developed a new class called “Introduction to Scientific Computing for Biologists”, whose aim is to showcase the 10 things every biologist should know about computing.
Here’s my (biased) list of the top 10 tools:
- Unix/Linux — an operating system written by programmers for programmers. It provides legions of small programs that can be chained together into complex pipelines for your data.
- Version control — absolutely essential to keep everything tidily organized, and to help collaboration (in the notes, I use git).
- Basic programming — we are scientists attempting to do something new, so by definition there isn’t a program that does exactly what we want (in the notes, I use Python).
- Regular expressions — can save you a great deal of time and simplify working with text data (Python again).
- Statistical computing — for access to a wealth of packages for data analysis (R).
- Automated plotting — because you have to draw each figure many, many times (ggplot2).
- LaTeX — ideally suited for large, complex documents such as a PhD Dissertation.
- Databases — to keep all of the data (and the metadata!) organized and easily accessible (for a quick introduction, I use SQLite).
- Bash scripting — a necessary evil.
- Cluster computing — to run large jobs in parallel and cut computation time.
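
To give a flavor of the Unix philosophy from the first entry, here is a minimal sketch of a pipeline: each small program does one job, and pipes connect them. The gene names here are made-up placeholders, and the input is generated inline; in practice it would come from a real data file.

```shell
# Count how often each gene appears and report the most frequent one.
printf 'gene_A\ngene_B\ngene_A\ngene_C\ngene_A\ngene_B\n' |
  sort |        # group identical names together
  uniq -c |     # collapse runs, prefixing each name with its count
  sort -rn |    # order by count, largest first
  head -n 1     # keep only the top hit: "3 gene_A"
```

Each step is trivial on its own; the power comes from composing them, and the same pattern scales from six lines to six million.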
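
The regular-expression entry deserves a small example too. The notes use Python, but the same idea works in any tool that speaks regexes; here `grep` pulls accession-style identifiers out of free text (the `NM_` pattern and the input lines are invented for illustration).

```shell
# -E enables extended regular expressions; -o prints only the matching
# part of each line instead of the whole line.
printf 'transcript NM_001234 is up-regulated\nno identifier here\n' |
  grep -Eo 'NM_[0-9]{6}'   # prints: NM_001234
```

One short pattern replaces what would otherwise be a fiddly hand-written parsing loop.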