Using the Disease Ontology (DO) to map the genes involved in a category of disease.
In the current post, I'll use the Disease Ontology (DO) to map all the genes involved in cardiac diseases.
Using BioPortal, I found that my term of interest is DOID:114 ("Heart Disease"). I now need to find all the descendants of this term.
The Disease Ontology is available for download here: http://www.obofoundry.org/cgi-bin/detail.cgi?id=disease_ontology. An XSLT stylesheet (do.xsl in the usage below) retrieves all the descendants of a given term using a recursive algorithm.
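The recursive idea can be sketched outside XSLT too. The Python below is only an illustration, not the stylesheet used in this post: it assumes the "is a" relations have already been extracted from the ontology file as (child, parent) pairs, and the TOY:* identifiers are hypothetical placeholders rather than real DO terms. Returning the starting term together with its descendants is a choice made for this sketch.

```python
from collections import defaultdict


def descendants(edges, root):
    """Return the set of terms reachable below `root` (root included).

    `edges` is an iterable of (child, parent) pairs. This input format is
    an assumption of the sketch, not the layout of the DO OWL file.
    """
    # Index the edges so that each parent maps to its direct children.
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)

    seen = set()

    def visit(term):
        # A term can be reached through several parents: visit it once.
        if term in seen:
            return
        seen.add(term)
        for child in children[term]:
            visit(child)

    visit(root)
    return seen


# Hypothetical toy hierarchy (these are NOT real DOIDs).
toy_edges = [
    ("TOY:2", "TOY:1"),
    ("TOY:3", "TOY:1"),
    ("TOY:4", "TOY:2"),
    ("TOY:5", "TOY:9"),  # unrelated branch, must not be reported
]

for term in sorted(descendants(toy_edges, "TOY:1")):
    print(term)
```

The `seen` set matters because an ontology is a graph rather than a strict tree, so a term with two parents would otherwise be reported twice. In the shell version, `sort | uniq` plays that role.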
Usage:
xsltproc --stringparam ID "DOID:114" do.xsl do.owl |\
  sort | uniq | cut -f 1 > doids.txt
Result:
$ head doids.txt
DOID:0050650
DOID:0060000
DOID:0060036
DOID:0060068
DOID:10234
DOID:10266
DOID:10272
DOID:10273
DOID:10314
DOID:10392
In "Annotating the human genome with Disease Ontology", Osborne et al. mapped the terms of DO to OMIM and to NCBI Gene. The database dump is available at http://projects.bioinformatics.northwestern.edu/do_rif/do_rif.human.txt. We can use the file "doids.txt" and the fgrep command to extract the genes associated with our selected terms.
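To make the filtering step concrete, here is a rough Python sketch of the selection. It rests on two assumptions that the post does not state explicitly: that the dump is tab-delimited, and that the DOID sits in the fifth column, as the sample rows suggest. The function name `filter_by_doid` is made up for this illustration. Unlike `fgrep -w`, which accepts a match anywhere on the line, the sketch only looks at that one column.

```python
def filter_by_doid(lines, doids, doid_column=4):
    """Yield the fields of each line whose DOID column is in `doids`.

    Assumptions of this sketch: tab-delimited input, DOID in the fifth
    column (zero-based index 4).
    """
    wanted = set(doids)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > doid_column and fields[doid_column] in wanted:
            yield fields


# Two rows taken from the output shown in this post, rebuilt with tabs.
sample = [
    "100\tADA*2 allele may decrease genetic susceptibility to coronary "
    "artery disease.\t17287605\tC0010054\tDOID:3363\t"
    "to coronary artery disease\t1000\n",
    "10014\tChronic upregulation/activation of CaMKIID, and PKD in heart "
    "failure shifts HDAC5 out of the nucleus, derepressing transcription "
    "of hypertrophic genes.\t18218981\tC0018801\tDOID:6000\t"
    "in heart failure\t1000\n",
]

# Keep only the rows annotated with DOID:3363, then collect the gene IDs.
selected = list(filter_by_doid(sample, {"DOID:3363"}))
gene_ids = sorted({fields[0] for fields in selected})
print(gene_ids)
```

In the shell, the same selection is done with `fgrep`: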
~$ curl -s "http://projects.bioinformatics.northwestern.edu/do_rif/do_rif.human.txt" | fgrep -w -f doids.txt
100133941 A decrease in CD4+CD25+ T cell numbers in mitral stenosis patients might suggest a role for cellular autoimmunity in a smoldering rheumatic process. 17944116 C0026269 DOID:1754 in mitral stenosis patients 734
10014 Chronic upregulation/activation of CaMKIID, and PKD in heart failure shifts HDAC5 out of the nucleus, derepressing transcription of hypertrophic genes. 18218981 C0018801 DOID:6000 in heart failure 1000
10068 IL-18 levels, which are determined in part by variation in IL18/IL18BP, play a role in coronary heart disease development and postsurgery outcome. 17951325 C0010054 DOID:3363 in coronary heart disease development 756
10068 IL-18 levels, which are determined in part by variation in IL18/IL18BP, play a role in coronary heart disease development and postsurgery outcome. 17951325 C0010068 DOID:3393 in coronary heart disease development 756
100 ADA*2 allele may decrease genetic susceptibility to coronary artery disease. 17287605 C0010054 DOID:3363 to coronary artery disease 1000
(...)
The first column contains the NCBI/Gene ID. Let's extract this column and ask the mysql server of the UCSC for the positions of those genes:
$ curl -s "http://projects.bioinformatics.northwestern.edu/do_rif/do_rif.human.txt" |\
  fgrep -w -f doids.txt | cut -f 1 | sort | uniq |\
  awk '{printf("select distinct R.chrom,R.txStart,R.txEnd,L.product,L.locusLinkId from refLink as L,refGene as R where R.name=L.mrnaAcc and L.locusLinkId=%s;\n",$1);}' |\
  mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -N
chr20 43248162 43280376 adenosine deaminase 100
chrY 21152525 21154705 signal transducer CD24 precursor 100133941
chr17 42154120 42201014 histone deacetylase 5 isoform 1 10014
chr17 42154120 42201014 histone deacetylase 5 isoform 3 10014
chr11 71710108 71713574 interleukin-18-binding protein isoform a precursor 10068
chr11 71709957 71713574 interleukin-18-binding protein isoform a precursor 10068
chr11 71710972 71713574 interleukin-18-binding protein isoform b precursor 10068
chr11 71710662 71713574 interleukin-18-binding protein isoform a precursor 10068
chr11 71709957 71713850 interleukin-18-binding protein isoform d precursor 10068
chr11 71710108 71713965 interleukin-18-binding protein isoform c precursor 10068
chr19 16435650 16438339 Krueppel-like factor 2 10365
chr7 30464142 30518393 nucleotide-binding oligomerization domain-containing protein 1 10392
chr20 35169886 35178226 myosin regulatory light polypeptide 9 isoform a 10398
chr20 35169886 35178226 myosin regulatory light polypeptide 9 isoform b 10398
chr12 48128452 48152889 rap guanine nucleotide exchange factor 3 isoform a 10411
chr12 48128452 48152244 rap guanine nucleotide exchange factor 3 isoform b 10411
chr12 48128452 48152181 rap guanine nucleotide exchange factor 3 isoform b 10411
chr16 56995834 57017756 cholesteryl ester transfer protein precursor 1071
chr1 11104854 11107296 mannan-binding lectin serine protease 2 isoform 2 precursor 10747
chr1 11086579 11107296 mannan-binding lectin serine protease 2 isoform 1 preproprotein 10747
Checking: the first gene is ADA (adenosine deaminase). It is associated with DOID:3363 (coronary arteriosclerosis) and is cited in PMID:17287605, "ADA*2 allele of the adenosine deaminase gene may protect against coronary artery disease.".
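As a possible variation on the awk step, the per-gene queries could be batched into a single statement. The Python below is a hedged sketch of that idea, not what was run for this post: the helper name `build_query` is hypothetical, while the table and column names are copied from the query above. Converting each ID with `int()` is a deliberate guard, so that a malformed line in the input cannot end up inside the SQL text.

```python
def build_query(gene_ids):
    """Build one SELECT covering all the given NCBI Gene IDs.

    Hypothetical helper for illustration. The tables (refLink, refGene)
    and the columns are those used in the awk-generated queries.
    """
    # int() rejects anything that is not a plain number; the set removes
    # duplicates, much like `sort | uniq` in the shell pipeline.
    ids = sorted({int(g) for g in gene_ids})
    if not ids:
        raise ValueError("no gene IDs given")
    id_list = ",".join(str(i) for i in ids)
    return (
        "select distinct R.chrom,R.txStart,R.txEnd,L.product,L.locusLinkId "
        "from refLink as L,refGene as R "
        "where R.name=L.mrnaAcc and L.locusLinkId in (" + id_list + ");"
    )


# Gene IDs taken from the results listed in this post.
query = build_query(["100", "10014", "10068", "10068"])
print(query)
```

The printed statement can be piped to the same `mysql` command line as before. One round trip to the server then replaces one query per gene.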
That's it,
Pierre