
Today we had a lecture about human induced pluripotent stem cells, presented by John De Vos. He introduced Amazonia, a free web atlas that makes it easy to query public human transcriptome data. Although Amazonia offers no web service (REST/SOAP) to access this data, I wanted to retrieve some expression profiles from it, something I had failed to do with NCBI/GEO.
I wrote the following Java scraper:
- Line 84: search for a gene name
- 88: if there is an HTTP redirect, the gene has been found
- 96: download the HTML page
- 100-112: fix the HTML so that it becomes a valid XML document
- 133: transform the HTML page into a DOM document
- 135-151: use XPath to find the images and their labels
- 189-211: put the data into a Java/Swing dialog
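The redirect test of lines 84-88 can be sketched as follows. This is a minimal sketch, not the scraper's actual code: the Amazonia search URL is not given here, so it is passed in as a parameter, and the class and method names (GeneRedirectSketch, findGenePage, redirectTarget) are hypothetical. The key point is to turn off automatic redirect following so the 3xx response can be inspected.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: detects whether a gene search answers with an HTTP redirect.
public class GeneRedirectSketch {

    /** Absolute redirect target of a 3xx response, or null if the server did not redirect. */
    static String redirectTarget(String requestUrl, int status, String location) {
        if (status < 300 || status > 399 || location == null) {
            return null;
        }
        // The Location header may be relative, so resolve it against the requested URL
        return URI.create(requestUrl).resolve(location).toString();
    }

    /** Sends the search without following redirects; a redirect is taken to mean "gene found". */
    static String findGenePage(String searchUrlPrefix, String gene) throws IOException {
        // searchUrlPrefix is an assumption: the real Amazonia query URL is not shown in this post
        URI uri = URI.create(searchUrlPrefix + URLEncoder.encode(gene, StandardCharsets.UTF_8));
        HttpURLConnection con = (HttpURLConnection) uri.toURL().openConnection();
        con.setInstanceFollowRedirects(false); // keep the 3xx response so we can inspect it
        try {
            return redirectTarget(uri.toString(), con.getResponseCode(), con.getHeaderField("Location"));
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // usage: java GeneRedirectSketch <search-url-prefix> EIF4G1
        String page = findGenePage(args[0], args[1]);
        System.out.println(page == null ? "gene not found" : "gene page: " + page);
    }
}
```

Keeping the status/Location logic in a separate method means it can be checked without touching the network.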
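Lines 100-151 (repairing the HTML so that it parses as XML, then querying the DOM with XPath) might look roughly like this. It is a sketch under loud assumptions: the page is assumed to need only a few regex repairs (dropping the DOCTYPE, closing void elements such as `<br>` and `<img>`, replacing `&nbsp;`), and the labels are assumed to be the `alt` attributes of the images. Real pages usually need more repairs, and a dedicated HTML parser would be more robust than regexes. The names HtmlToXmlSketch, fixHtml and imagesWithLabels are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: naive HTML -> XML repair, DOM parsing and XPath extraction.
public class HtmlToXmlSketch {

    /** Naive repairs that turn simple HTML into well-formed XML (not exhaustive). */
    static String fixHtml(String html) {
        return html
            // drop the DOCTYPE so the XML parser does not try to fetch a DTD
            .replaceAll("(?is)<!DOCTYPE[^>]*>", "")
            // &nbsp; is not one of the five predefined XML entities
            .replace("&nbsp;", "&#160;")
            // self-close void elements like <br> or <img src="..."> that are not already closed
            .replaceAll("<(br|hr|img|input|meta|link)\\b([^>]*?)(?<!/)>", "<$1$2/>");
    }

    /** Maps each image URL (@src) to its label (assumed to be @alt), in document order. */
    static Map<String, String> imagesWithLabels(String html) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(fixHtml(html))));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList imgs = (NodeList) xpath.evaluate("//img[@src]", dom, XPathConstants.NODESET);
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < imgs.getLength(); i++) {
                Element img = (Element) imgs.item(i);
                result.put(img.getAttribute("src"), img.getAttribute("alt"));
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("page is still not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><p>EIF4G1&nbsp;profile<br></p>"
            + "<img src=\"liver.png\" alt=\"liver\"><img src=\"brain.png\" alt=\"brain\"/></body></html>";
        System.out.println(imagesWithLabels(page)); // {liver.png=liver, brain.png=brain}
    }
}
```

The resulting map (image URL to label) is the kind of data the Swing dialog of lines 189-211 would then display.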
 
Compilation
javac AmazoniaRobot.java
Execution
java AmazoniaRobot EIF4G1
That's it!
Pierre

Thanks for this - very useful!
You're the best ;-)