18 February 2011

A Data Scraper for Amazonia (expression)

Today we had a lecture on human induced pluripotent stem cells, presented by John De Vos. He introduced Amazonia, a free web atlas for easily querying public human transcriptome data. Although there is no web service (REST/SOAP) for accessing this data, I was interested in getting some expression profiles from this database, as it is something I've failed to achieve with NCBI/GEO.

I wrote the following Java scraper:

  • Line 84: we search for a gene name
  • 88: if there is an HTTP redirection, the gene has been found
  • 96: the HTML page is downloaded
  • 100-112: fix the HTML to create a valid XML document
  • 133: transform the HTML page to a DOM document
  • 135-151: use XPath to find the images and the labels
  • 189-211: put the data into a Java/Swing dialog
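
The middle steps (fixing the HTML, building the DOM, querying it with XPath) can be sketched on their own, without touching the network. This is only a minimal illustration and not the scraper itself: the class name ScrapeSketch, the SAMPLE_HTML fragment and the idea that each label is stored in the image's alt attribute are all assumptions made for the example, since the real Amazonia markup is not reproduced in this post. The search and download steps are left out because they depend on the site's URLs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ScrapeSketch {
    // Hypothetical page fragment (an assumption): unclosed <br> and <img>
    // tags make it valid HTML but not well-formed XML.
    static final String SAMPLE_HTML =
        "<html><body>\n"
      + "<p>Profile A<br><img src=\"a.png\" alt=\"Profile A\"></p>\n"
      + "<p>Profile B<br><img src=\"b.png\" alt=\"Profile B\"></p>\n"
      + "</body></html>";

    /** Patch the few constructs we expect so that the text parses as XML. */
    static String toXml(String html) {
        return html
            .replaceAll("<br\\s*>", "<br/>")                // close <br>
            .replaceAll("<(img\\b[^>]*[^/>])>", "<$1/>")    // close <img ...>
            .replace("&nbsp;", "&#160;");                   // entity unknown to XML
    }

    /** Parse the patched text into a DOM document. */
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Return {label, image source} pairs found with XPath. */
    static List<String[]> extractImages(String html) throws Exception {
        Document dom = parse(toXml(html));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList imgs = (NodeList) xpath.evaluate(
            "//img[@src]", dom, XPathConstants.NODESET);
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < imgs.getLength(); i++) {
            Element img = (Element) imgs.item(i);
            pairs.add(new String[] {
                img.getAttribute("alt"), img.getAttribute("src") });
        }
        return pairs;
    }

    public static void main(String[] args) throws Exception {
        // Print one "label<TAB>image" line per image in the sample.
        for (String[] pair : extractImages(SAMPLE_HTML)) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

Patching the text with regular expressions only works when the page's quirks are known in advance; a page with other unclosed tags would still fail to parse and would need more rules or a tolerant HTML parser.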


javac AmazoniaRobot.java


java AmazoniaRobot EIF4G1
Et voilà:

That's it!



Mikael said...

Thanks for this - very useful!

Bruno said...

You're the best ;-)