Today we had a lecture on human induced pluripotent stem cells (hiPSCs), given by John De Vos. He introduced Amazonia, a free web atlas that makes it easy to query public human transcriptome data. Although Amazonia offers no web service (REST/SOAP) for accessing these data, I wanted to retrieve some expression profiles from it, something I had failed to do with NCBI/GEO.
I wrote the following Java scraper. Its main steps are:
- Line 84: search for a gene name
- Line 88: if the server answers with an HTTP redirect, the gene was found
- Line 96: download the HTML page
- Lines 100-112: fix the HTML so that it becomes a valid XML document
- Line 133: parse the fixed page into a DOM document
- Lines 135-151: use XPath to find the images and their labels
- Lines 189-211: put the data into a Java Swing dialog
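The search-and-redirect check (lines 84 and 88) can be sketched with `HttpURLConnection` by switching off automatic redirect following, so that a 3xx status becomes visible to the caller. This is a minimal sketch, not the original code: it assumes that Amazonia answers a successful gene search with a 3xx redirect to the gene page and with some other status otherwise. The class `GeneLookup`, the method `findGenePage` and the `searchUrl` prefix are hypothetical names; the real query URL is not given here.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;

public class GeneLookup {
    /**
     * Requests searchUrl + URL-encoded gene name WITHOUT following redirects.
     * Returns the absolute redirect target (Location header) if the server
     * answers with a 3xx status, or null otherwise (gene assumed not found).
     * Assumption: a redirect means "gene found", as described in the post.
     */
    public static String findGenePage(String searchUrl, String gene) throws IOException {
        URI query = URI.create(searchUrl + URLEncoder.encode(gene, "UTF-8"));
        HttpURLConnection con = (HttpURLConnection) query.toURL().openConnection();
        con.setInstanceFollowRedirects(false); // we want to *see* the redirect
        try {
            int status = con.getResponseCode();
            if (status >= 300 && status < 400) {
                String location = con.getHeaderField("Location");
                // Resolve a possibly relative Location against the query URI
                return location == null ? null : query.resolve(location).toString();
            }
            return null;
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: search URL prefix (hypothetical), args[1]: gene symbol
        String page = findGenePage(args[0], args[1]);
        System.out.println(page == null ? "gene not found" : "gene page: " + page);
    }
}
```

Returning the resolved redirect target, rather than a boolean, gives the next step (downloading the gene page) the URL it needs.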
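The HTML-to-XML repair, the DOM parsing and the XPath query (lines 100-151) might look like the sketch below. It is deliberately naive and only illustrates the idea: it drops the DOCTYPE, rewrites `&nbsp;`, escapes bare `&` characters and self-closes void tags such as `<img>` and `<br>`. It does not handle unquoted attributes, upper-case tags, other HTML-only entities (e.g. `&eacute;`) or `<script>` blocks; a real HTML parser such as TagSoup or jsoup is more robust. The assumption that each profile image is an `<img>` whose `alt` attribute holds its label is hypothetical, as are the names `ProfileExtractor`, `htmlToXml` and `extractImages`.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ProfileExtractor {
    // Void HTML elements: never closed in HTML, but XML requires <tag .../>
    private static final Pattern VOID_TAGS = Pattern.compile(
        "<(img|br|hr|input|meta|link)\\b([^>]*?)/?>", Pattern.CASE_INSENSITIVE);
    // A '&' that does not start an entity or character reference
    private static final Pattern BARE_AMP = Pattern.compile(
        "&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#x[0-9a-fA-F]+);)");

    /** Naive HTML-to-XML repair; good enough only for simple, regular pages. */
    public static String htmlToXml(String html) {
        String xml = html.replaceAll("(?is)<!DOCTYPE[^>]*>", ""); // drop the DOCTYPE
        xml = xml.replace("&nbsp;", "&#160;");                  // HTML-only entity
        xml = BARE_AMP.matcher(xml).replaceAll("&amp;");        // e.g. ?a=1&b=2
        return VOID_TAGS.matcher(xml).replaceAll("<$1$2/>");    // self-close void tags
    }

    /** Returns image URL -> label, read from img/@src and img/@alt (assumed layout). */
    public static Map<String, String> extractImages(String html) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(htmlToXml(html))));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList imgs = (NodeList) xpath.evaluate("//img[@src]", dom, XPathConstants.NODESET);
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < imgs.getLength(); i++) {
            Element img = (Element) imgs.item(i);
            result.put(img.getAttribute("src"), img.getAttribute("alt"));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a saved HTML page
        String html = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        extractImages(html).forEach((src, label) -> System.out.println(label + "\t" + src));
    }
}
```

Because the XML parser decodes `&amp;` back to `&`, the returned image URLs are the original ones and can be fetched directly.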
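For the last step (lines 189-211), one possible way to display the results is a `JPanel` of captioned `JLabel`s shown in a `JOptionPane` dialog. This is a sketch, not the original code: the class `ProfileDialog`, its methods and the "label URL label URL ..." command-line format are illustrative assumptions.

```java
import java.awt.GridLayout;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.Icon;
import javax.swing.ImageIcon;
import javax.swing.JLabel;
import javax.swing.JOptionPane;
import javax.swing.JPanel;
import javax.swing.JScrollPane;
import javax.swing.SwingConstants;
import javax.swing.SwingUtilities;

public class ProfileDialog {
    /** One captioned icon per expression profile, stacked in a single column. */
    public static JPanel buildPanel(Map<String, Icon> profiles) {
        JPanel panel = new JPanel(new GridLayout(0, 1, 5, 5)); // 0 rows = as many as needed
        for (Map.Entry<String, Icon> e : profiles.entrySet()) {
            JLabel label = new JLabel(e.getKey(), e.getValue(), SwingConstants.CENTER);
            label.setHorizontalTextPosition(SwingConstants.CENTER);
            label.setVerticalTextPosition(SwingConstants.BOTTOM); // caption under the image
            panel.add(label);
        }
        return panel;
    }

    /** Shows the panel in a modal dialog, created on the Event Dispatch Thread. */
    public static void show(Map<String, Icon> profiles) {
        SwingUtilities.invokeLater(() -> JOptionPane.showMessageDialog(
            null, new JScrollPane(buildPanel(profiles)), "Amazonia profiles",
            JOptionPane.PLAIN_MESSAGE));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input: alternating label / image-URL arguments
        Map<String, Icon> profiles = new LinkedHashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            profiles.put(args[i], new ImageIcon(URI.create(args[i + 1]).toURL()));
        }
        show(profiles);
    }
}
```

Keeping `buildPanel` separate from `show` means the panel can be assembled (and checked) without a screen; only `show` actually needs a display.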
That's it!