13 April 2012

Using the Disease Ontology (DO) to map the genes involved in a category of disease: my notebook

In this post, I'll use the Disease Ontology (DO) to map all the genes involved in cardiac diseases.


Using BioPortal, I found that my term of interest is DOID:114 ("Heart Disease"). I now need to find all the descendants of this term.


The Disease Ontology is available for download here: http://www.obofoundry.org/cgi-bin/detail.cgi?id=disease_ontology. The following XSLT stylesheet retrieves all the descendants of a given term using a recursive algorithm:



Usage:

xsltproc  --stringparam ID "DOID:114"   do.xsl  do.owl |\
sort | uniq | cut -f 1 > doids.txt

Result:
$ head doids.txt

DOID:0050650
DOID:0060000
DOID:0060036
DOID:0060068
DOID:10234
DOID:10266
DOID:10272
DOID:10273
DOID:10314
DOID:10392
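
For readers who prefer a script to a stylesheet, the same recursive traversal can be sketched in Python. This is only a minimal sketch and not the stylesheet used above: it assumes that do.owl declares each term as an owl:Class with an rdf:about URI ending in something like DOID_114, and links it to its parents with rdfs:subClassOf elements carrying an rdf:resource attribute. The function names and the toy TOY_n terms are hypothetical and only serve the illustration. Unlike the listing above, the result here includes the starting term itself.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"
OWL = "{http://www.w3.org/2002/07/owl#}"


def local_id(uri):
    """Turn '.../DOID_114' or '...#DOID_114' into 'DOID:114' (assumed URI layout)."""
    return uri.rsplit("/", 1)[-1].rsplit("#", 1)[-1].replace("_", ":")


def children_index(root):
    """Map each parent term ID to the set of its direct subclasses."""
    children = defaultdict(set)
    for cls in root.iter(OWL + "Class"):
        about = cls.get(RDF + "about")
        if about is None:  # anonymous class, e.g. inside a restriction
            continue
        for sub in cls.findall(RDFS + "subClassOf"):
            parent = sub.get(RDF + "resource")
            if parent is not None:
                children[local_id(parent)].add(local_id(about))
    return children


def descendants(children, term, seen=None):
    """Recursively collect `term` and every term below it."""
    if seen is None:
        seen = set()
    if term in seen:  # already visited (a term can have several parents)
        return seen
    seen.add(term)
    for child in children.get(term, ()):
        descendants(children, child, seen)
    return seen


# Toy ontology with made-up terms: TOY_2 is under TOY_1, TOY_3 under TOY_2,
# and TOY_9 is unrelated.
SAMPLE = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/TOY_1"/>
  <owl:Class rdf:about="http://example.org/TOY_2">
    <rdfs:subClassOf rdf:resource="http://example.org/TOY_1"/>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/TOY_3">
    <rdfs:subClassOf rdf:resource="http://example.org/TOY_2"/>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/TOY_9"/>
</rdf:RDF>"""

index = children_index(ET.fromstring(SAMPLE))
print(sorted(descendants(index, "TOY:1")))
```

On the real file, one would replace `ET.fromstring(SAMPLE)` with `ET.parse("do.owl").getroot()` and start from "DOID:114", provided the assumptions about the OWL layout hold.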

In "Annotating the human genome with Disease Ontology", Osborne et al. mapped the terms of DO to OMIM and to NCBI Gene. The database dump is available at http://projects.bioinformatics.northwestern.edu/do_rif/do_rif.human.txt. We can use the file "doids.txt" and the fgrep command to extract the genes associated with our selected terms.

~$ curl -s "http://projects.bioinformatics.northwestern.edu/do_rif/do_rif.human.txt" | fgrep -w -f doids.txt

100133941 A decrease in CD4+CD25+ T cell numbers in mitral stenosis patients might suggest a role for cellular autoimmunity in a smoldering rheumatic process. 17944116 C0026269 DOID:1754 in mitral stenosis patients 734
10014 Chronic upregulation/activation of CaMKIID, and PKD in heart failure shifts HDAC5 out of the nucleus, derepressing transcription of hypertrophic genes. 18218981 C0018801 DOID:6000 in heart failure 1000
10068 IL-18 levels, which are determined in part by variation in IL18/IL18BP, play a role in coronary heart disease development and postsurgery outcome. 17951325 C0010054 DOID:3363 in coronary heart disease development 756
10068 IL-18 levels, which are determined in part by variation in IL18/IL18BP, play a role in coronary heart disease development and postsurgery outcome. 17951325 C0010068 DOID:3393 in coronary heart disease development 756
100 ADA*2 allele may decrease genetic susceptibility to coronary artery disease. 17287605 C0010054 DOID:3363 to coronary artery disease 1000

(...)
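
The fgrep filter matches a DOID anywhere on the line. A stricter variant, sketched below in Python, compares only the DOID column. It assumes the dump is tab-delimited and that the DOID sits in the fifth column, as the rows above suggest. The function name `genes_for_terms` is hypothetical, and the second sample row is a made-up placeholder.

```python
def genes_for_terms(lines, doids, doid_column=4):
    """Return the gene IDs (first column) of rows whose DOID column is in `doids`."""
    genes = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > doid_column and fields[doid_column] in doids:
            genes.add(fields[0])
    return genes


# First row: taken from the output above. Second row: made-up placeholder.
SAMPLE_ROWS = [
    "100\tADA*2 allele may decrease genetic susceptibility to coronary "
    "artery disease.\t17287605\tC0010054\tDOID:3363\tto coronary artery disease\t1000",
    "0\tplaceholder sentence\t0\tC0000000\tDOID:0000000\tplaceholder\t0",
]

print(genes_for_terms(SAMPLE_ROWS, {"DOID:3363"}))
```

With the real data, `lines` would be the open do_rif.human.txt file and `doids` the set of lines read from doids.txt.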
The first column contains the NCBI Gene ID. Let's extract this column and query the public MySQL server of the UCSC for the positions of those genes:


$ curl -s "http://projects.bioinformatics.northwestern.edu/do_rif/do_rif.human.txt" |\
fgrep -w -f doids.txt | cut -f 1 | sort | uniq |\
awk '{printf("select distinct R.chrom,R.txStart,R.txEnd,L.product,L.locusLinkId from refLink as L,refGene as R where R.name=L.mrnaAcc and L.locusLinkId=%s;\n",$1);}' | \
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -N

chr20 43248162 43280376 adenosine deaminase 100
chrY 21152525 21154705 signal transducer CD24 precursor 100133941
chr17 42154120 42201014 histone deacetylase 5 isoform 1 10014
chr17 42154120 42201014 histone deacetylase 5 isoform 3 10014
chr11 71710108 71713574 interleukin-18-binding protein isoform a precursor 10068
chr11 71709957 71713574 interleukin-18-binding protein isoform a precursor 10068
chr11 71710972 71713574 interleukin-18-binding protein isoform b precursor 10068
chr11 71710662 71713574 interleukin-18-binding protein isoform a precursor 10068
chr11 71709957 71713850 interleukin-18-binding protein isoform d precursor 10068
chr11 71710108 71713965 interleukin-18-binding protein isoform c precursor 10068
chr19 16435650 16438339 Krueppel-like factor 2 10365
chr7 30464142 30518393 nucleotide-binding oligomerization domain-containing protein 1 10392
chr20 35169886 35178226 myosin regulatory light polypeptide 9 isoform a 10398
chr20 35169886 35178226 myosin regulatory light polypeptide 9 isoform b 10398
chr12 48128452 48152889 rap guanine nucleotide exchange factor 3 isoform a 10411
chr12 48128452 48152244 rap guanine nucleotide exchange factor 3 isoform b 10411
chr12 48128452 48152181 rap guanine nucleotide exchange factor 3 isoform b 10411
chr16 56995834 57017756 cholesteryl ester transfer protein precursor 1071
chr1 11104854 11107296 mannan-binding lectin serine protease 2 isoform 2 precursor 10747
chr1 11086579 11107296 mannan-binding lectin serine protease 2 isoform 1 preproprotein 10747
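
The awk command above sends one query per gene. As a variation, the same lookup could be packed into a single statement with an IN clause. The Python sketch below only builds the SQL text; it reuses the table and column names from the query above, while the function name `build_query` is hypothetical. Converting every ID to an integer is a simple guard against malformed input ending up in the SQL.

```python
def build_query(gene_ids):
    """Build one SQL statement returning the positions of all given NCBI Gene IDs."""
    # int() rejects anything that is not a plain number; the set removes duplicates.
    ids = sorted({int(g) for g in gene_ids})
    if not ids:
        raise ValueError("no gene IDs given")
    id_list = ",".join(str(i) for i in ids)
    return (
        "select distinct R.chrom,R.txStart,R.txEnd,L.product,L.locusLinkId "
        "from refLink as L,refGene as R "
        "where R.name=L.mrnaAcc and L.locusLinkId in (" + id_list + ");"
    )


print(build_query(["100", "10014", "100"]))
```

The printed statement can be piped to the same mysql command as above. The rows may come back in a different order than in the listing, since a single query does not follow the order of the sorted ID list.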


Checking: the first gene is ADA (adenosine deaminase). It is associated with DOID:3363 (coronary arteriosclerosis) and is cited in PMID:17287605, "ADA*2 allele of the adenosine deaminase gene may protect against coronary artery disease.".


That's it,

Pierre

