benmiles.xyz Biotech 🦠, startups 🚀, tech 👨‍💻

Living the dream: robots running biological experiments in Silicon Valley, AKA my first go on Transcriptic

So I thought I would write up my first experience using Transcriptic.

DH5alpha Growth Curve

To test the platform I just wanted to perform a simple growth curve of E. coli DH5alpha. This is pretty simple to do on Transcriptic because a core protocol for it is already defined in Autoprotocol, the standard they created.

Setup

The first step was to acquire some bacteria. This was super easy as Transcriptic provide some core molecular biology reagents, one of which is competent cells, so I grabbed an aliquot of DH5alpha because I had used it before during my PhD. The aliquot cost $5.05 for 50µL containing 1 unit, and the DH5alpha vendor was Zymo Research.

Inventory screenshot

Next I had to create the run. I used the 'core' protocol for performing growth curves, which is very easy to get started with as you just enter parameters into the fields, as can be seen in the screenshot.

New run screenshot

The main parameters were: 5µL of the DH5alpha aliquot in each of the 3 replicates, plus 1 negative control with no bacteria, with OD600 measurements taken every 30 minutes for a total time of 12 hours. From these parameters the protocol generates all of the run commands, including dispensing LB into the 96-well plate and all of the incubation and plate reader acquisition steps. After all the steps have been generated Transcriptic also give you the cost of the run, which was $7.00, totally reasonable in my opinion.

This was so easy to set up, and I will take a look at writing my own protocols at some point in the future using Autoprotocol.
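For a taste of what such a protocol looks like under the hood, here is a minimal, illustrative sketch of a growth-curve fragment as raw Autoprotocol JSON. The container type, storage condition and instruction fields are my assumptions based on the Autoprotocol spec, not the exact run Transcriptic generated for me:

```python
import json

# Illustrative Autoprotocol sketch: one incubate step followed by one
# plate-reader absorbance read. Field names follow the Autoprotocol spec;
# the container type ("96-flat") and well names are assumptions.
protocol = {
    "refs": {
        "growth_plate": {
            "new": "96-flat",               # assumed container type name
            "store": {"where": "cold_4"},
        }
    },
    "instructions": [
        # Incubate the plate at 37C with shaking for 30 minutes...
        {"op": "incubate", "object": "growth_plate",
         "where": "warm_37", "duration": "30:minute", "shaking": True},
        # ...then read OD600 for the 3 replicates and the negative control.
        {"op": "absorbance", "object": "growth_plate",
         "wells": ["a1", "b1", "c1", "d1"],
         "wavelength": "600:nanometer", "dataref": "OD600_01"},
    ],
}

print(json.dumps(protocol, indent=2))
```

A full 12-hour run would simply repeat that incubate/absorbance pair 24 times with distinct datarefs.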

Run progress

I found tracking the run very exciting; picturing these robots over in California executing this experiment fills you with excitement about the potential of this technology. I told my flatmate "this is the dream" but he said "most people probably don't dream about executing biology experiments on robots in Silicon Valley"…

Transcriptic have done a really nice job of showing the steps involved in a run and indicating progress through it. You can also preview observations as they are made, to give you an idea of how well the experiment is going.

Run progress screenshot

At Dentally we make heavy use of dashboards built on Dashing to track statistics about our web application and business-related stuff like our sales pipeline and our support stats. With this in mind I was picturing a future company with multiple simultaneous runs on Transcriptic and what their dashboard would look like. To drive these dashboards you need an API to access the real-time data, and thankfully Transcriptic do have an API. I just wanted to ping the API to see if I could get the status of my run, a first step in building a real-time dashboard.

The endpoint takes this form: https://secure.transcriptic.com/:organization/:project/runs/:run_id, with the user email and access token passed as headers. The API should return a 200 and some JSON containing details of the run status, costs, who created it and other metadata. Unfortunately I was just getting back HTML from my requests rather than the expected JSON. This may be because I only have a test API key, though I am unsure of the limitations of the test level. I left a support ticket, so I should find out shortly if I was being dumb!
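As a sketch of what that request looks like in Python, here is how I would build it with the standard library. The header names (X-User-Email, X-User-Token) and the placeholder organization, project and run id are my assumptions; check them against the API docs:

```python
from urllib.request import Request, urlopen

# Placeholder values; substitute your own organization, project and run id.
url = "https://secure.transcriptic.com/my-org/my-project/runs/r1abc123"

req = Request(url, headers={
    # Header names are an assumption based on Transcriptic's docs.
    "X-User-Email": "me@example.com",
    "X-User-Token": "my-access-token",
    # Without an Accept header the server may content-negotiate you back
    # to HTML, which could explain the HTML responses I was seeing.
    "Accept": "application/json",
})

# urlopen(req) would actually send the request; here I only build it.
print(req.full_url)
```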

Data processing and visualisation

Once the run had completed I retrieved the data from Transcriptic; you can download it as a .zip. Frustratingly, though I can understand why, the dataset was a collection of 24 .csv files, one for each acquisition from the plate reader.

So to aggregate the whole data set I did a few things.

I first added column names to all 24 .csv files in bash using sed:

find . -maxdepth 1 -type f -name '*.csv' -exec sed -i.bk '1i \
well,abs
' {} \;

This prepends the column names to each file and backs up the originals with a .bk extension.
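The same step can be done in Python if you prefer; a sketch, assuming the raw files are named OD600_*.csv as in my run:

```python
import glob
import shutil

def add_header(paths, header="well,abs"):
    # Prepend a header row to each plate-reader csv, keeping a .bk
    # backup of the original, mirroring what the sed one-liner does.
    for path in paths:
        shutil.copy(path, path + ".bk")   # backup, like sed -i.bk
        with open(path) as f:
            body = f.read()
        with open(path, "w") as f:
            f.write(header + "\n" + body)

add_header(glob.glob("OD600_*.csv"))
```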

Then using csvkit I stacked all the files together:

csvstack -n hours \
  -g 0.5000000000,1.0000000000,1.5000000000,2.0000000000,2.5000000000,3.0000000000,3.5000000000,4.0000000000,4.5000000000,5.0000000000,5.5000000000,6.0000000000,6.5000000000,7.0000000000,7.5000000000,8.0000000000,8.5000000000,9.0000000000,9.5000000000,10.0000000000,10.5000000000,11.0000000000,11.5000000000,12.0000000000 \
  OD600_01.csv OD600_02.csv OD600_03.csv OD600_04.csv OD600_05.csv OD600_06.csv \
  OD600_07.csv OD600_08.csv OD600_09.csv OD600_10.csv OD600_11.csv OD600_12.csv \
  OD600_13.csv OD600_14.csv OD600_15.csv OD600_16.csv OD600_17.csv OD600_18.csv \
  OD600_19.csv OD600_20.csv OD600_21.csv OD600_22.csv OD600_23.csv OD600_24.csv \
  > data.csv

The csvstack command takes a list of csv files and stacks them, and it also lets you group each stack. I wanted to group by time point, so using -g to define the group names I supplied a sequence from 0.5 to 12 in increments of 0.5.

You can generate the sequence with seq, which saves typing it out manually (note that bash's own for (( )) arithmetic is integer-only, so a loop incrementing by 0.5 won't work):

seq -f '%.10f' -s, 0.5 0.5 12

The -f '%.10f' format matches the group labels used in the csvstack command, and -s, joins them with commas.

Using the -n option csvstack will also name the new group column, so I called this 'hours'.

Finally, csvstack needed the list of csv files. I didn't want to type this out manually, so I used Python to grab the list as an array.

from os import listdir
sorted(listdir('./'))

This simply returns an array of all the files in the current directory. One small annoyance is that Transcriptic named the files OD600_1.csv rather than OD600_01.csv, which means the files don't sort properly, so I had to go through and rename the first 9 files.

Once I had the list from Python I just copied and pasted it into the csvstack command and removed the quotation marks and commas.
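In hindsight, both the renaming and the copy-paste cleanup can be avoided by sorting on the numeric suffix and printing a shell-ready list; a sketch, assuming the OD600_N.csv naming from my run:

```python
from os import listdir

def run_order(name):
    # Sort OD600_1.csv ... OD600_24.csv numerically rather than
    # lexicographically, so OD600_2.csv comes before OD600_10.csv.
    return int(name.rsplit("_", 1)[1].split(".")[0])

files = sorted((f for f in listdir(".") if f.startswith("OD600_")),
               key=run_order)

# Space-separated, so it can be pasted straight into csvstack.
print(" ".join(files))
```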

Finally I got my single dataset file which looked like this:

hours,well,abs
0.5000000000,a1,0.1673733069494896
0.5000000000,b1,0.16264489187043785
0.5000000000,c1,0.1599586580539759
0.5000000000,d1,0.09181026208443736
1.0000000000,a1,0.1922003154880472
1.0000000000,b1,0.1840397293890507
1.0000000000,c1,0.17612867568903176
1.0000000000,d1,0.09225487472406724
1.5000000000,a1,0.21441167080055457
1.5000000000,b1,0.20193852213131347
1.5000000000,c1,0.18963634938073973
1.5000000000,d1,0.09274211583155267
2.0000000000,a1,0.30488315358057455
2.0000000000,b1,0.2747621314874783
2.0000000000,c1,0.23442914734536185
2.0000000000,d1,0.09334329521399287
2.5000000000,a1,0.36189560828780176

The last step was to plot the growth curves with R and ggplot2 using:

library(ggplot2)
df <- read.csv('data.csv')
qplot(data=df, x=hours, y=abs, color=well)

growth curves

In the plot you can see the 3 replicates in wells a1, b1 and c1 of the plate, and the no-bacteria negative control in well d1. This isn't the usual growth curve behaviour I've observed previously, though the negative control performed as expected and showed no growth. The 3 sample replicates reached peak bacterial density very quickly, at 3 to 3.5 hours; typically I have observed peak density at around 6 hours. Furthermore, the peak absorbance for each replicate fell between 0.25 and 0.40, whereas I usually observe peaks up towards 0.8 to 1.2. Interestingly there is a correlation between the measured absorbance and the well position in the plate; this may be coincidental, or perhaps due to the lag times between inoculation of the LB in each well.
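As a rough sanity check on the early growth, the exponential-phase doubling time can be estimated from two of the blank-corrected readings above, using the d1 control as the blank and assuming growth is exponential between the two timepoints:

```python
from math import log

# Blank-corrected OD600 for well a1, from the data above, using the
# no-bacteria control d1 at the same timepoint as the blank.
od_early = 0.1673733069494896 - 0.09181026208443736   # a1 - d1 at 0.5h
od_late = 0.30488315358057455 - 0.09334329521399287   # a1 - d1 at 2.0h
elapsed = 2.0 - 0.5                                   # hours

# Doubling time assuming exponential growth between the two readings.
doubling_time = elapsed * log(2) / log(od_late / od_early)
print(f"{doubling_time * 60:.0f} minutes")            # roughly an hour
```

A doubling time around an hour is not unusual for E. coli in rich media, so the short run to peak density looks more like a high starting concentration than abnormally fast growth.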

A possible explanation for the very short time to peak density is that I might have started from too high an initial concentration of bacteria. After initiating the run I noticed that Transcriptic recommend a growth curve as a sort of 'hello world' and performed it nearly exactly the same as I did; however, they add 2µL of DH5alpha instead of the 5µL I used.

I'm not sure about the absorbance at peak density; it might be to do with the quantity of LB, as population stagnation and decline are usually due to nutrient scarcity or toxin build-up.

Summary

I am unbelievably excited about Transcriptic and I think it is the future of research. The idea of a student building a biotech company from their laptop has crystallised into a very tangible vision; however, some questions remain. I think that to do any original research one needs to create bespoke reagents and send them to Transcriptic to execute these experiments. From what I understand Transcriptic already have this shipping and storage process nailed, but if I do not have access to a lab, how do I get a custom buffer made and stored at Transcriptic?

Furthermore, where AWS and Heroku (used to) have free tiers enabling very cheap and easy prototyping and iteration, with Transcriptic you are going to be paying for nearly everything, and every iteration of a run is going to erode your disposable income. This is understandable, as there are far fewer automated work cells than there are data centers and we're working in a world of atoms rather than bits.

Perhaps in a world of independent research there will be a resurgence in patronage, or some life science savvy risk taking angels fronting a lot of the cash.

It would be awesome to see Transcriptic start a Discourse forum so everyone using the platform can discuss ideas and help each other. I'm sure they must have enough engaged users to make this worthwhile.

All in all I'm amazed and excited, and I hope to do more and more with the platform over time.