eXist: The Open Source Native XML Database : My notebook
In a previous post, I played with Oracle's BerkeleyDB-XML. Here, I've used eXist-db, an open source database management system built using XML technology. It stores XML data according to the XML data model and features efficient, index-based XQuery processing.
Download & Install
wget http://downloads.sourceforge.net/project/exist/Stable/1.4/eXist-setup-1.4.0-rev10440.jar
java -jar eXist-setup-1.4.0-rev10440.jar
export EXIST_HOME=PATH_TO_EXIST/eXist
And that's it: it was far easier than installing (compiling...) BerkeleyDB-XML.
Starting the Server
eXist/bin/startup.sh
Using locale: en_US.UTF-8
18 Feb 2010 15:31:32,579 [main] INFO (JettyStart.java [run]:90) - Configuring eXist from EXIST/eXist/conf.xml
18 Feb 2010 15:31:32,580 [main] INFO (JettyStart.java [run]:91) -
18 Feb 2010 15:31:32,580 [main] INFO (JettyStart.java [run]:92) - Running with Java 1.6.0_07 [Sun Microsystems Inc. (Java HotSpot(TM) Server VM) in /usr/local/package/jdk1.6.0_07/jre]
(...)
Inserting the data
First, using the web console, I created a collection named '/db/dbsnp'.
I then downloaded about 1000 XML documents from dbSNP:
for S in `mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'select right(name,length(name)-2) from snp130 limit 1000' -N`
do
wget -O rs${S}.xml "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=${S}&retmode=xml"
done
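The loop above downloads every record again on each run and does not notice a failed request. Here is a minimal sketch of a more cautious variant, assuming a POSIX shell and wget. The function names `efetch_url` and `fetch_snp` are hypothetical helpers introduced for illustration; they are not part of eXist or of the NCBI tools.

```shell
# Build the efetch URL for one dbSNP identifier (the number without the "rs" prefix).
efetch_url() {
  echo "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=${1}&retmode=xml"
}

# Download one record unless a non-empty copy already exists.
# On failure, remove the partial file so that a later run retries it.
fetch_snp() {
  snp_id="$1"
  snp_out="rs${snp_id}.xml"
  if [ -s "${snp_out}" ]; then
    return 0
  fi
  if ! wget -q -O "${snp_out}" "$(efetch_url "${snp_id}")"; then
    rm -f "${snp_out}"
    echo "failed: rs${snp_id}" >&2
    return 1
  fi
}
```

Inside the same `for S in ...` loop, the `wget` line would then become `fetch_snp "${S}"`, so an interrupted download can be resumed by simply running the loop again.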
Those documents were then inserted into the XML database:
EXIST/eXist/bin/client.sh -u admin --password=mypassword -s -m 'db/dbsnp' -p rs*.xml
(...)
parsed 7704 bytes in 33ms.
storing document rs10.xml (1 of 1) ...done.
parsing 16306 bytes took 36ms.
parsed 16306 bytes in 36ms.
Using XQuery
The following XQuery searches for all the SNPs in the collection '/db/dbsnp' having a heterozygosity greater than 0.49. For each SNP it prints its name, its sequence and its position on the reference genome.
xquery version "1.0";
declare namespace s="http://www.ncbi.nlm.nih.gov/SNP/docsum";
<MyListOfSnp>
{
for $x in collection("/db/dbsnp")/s:ExchangeSet/s:Rs
where data($x/s:Het/@value)>0.49
return <SNP>
<name>rs{data($x/@rsId)}</name>
<sequence>{data($x/s:Sequence/s:Seq5)}[{data($x/s:Sequence/s:Observed)}]{data($x/s:Sequence/s:Seq3)}</sequence>
{
for $as in $x/s:Assembly
where $as/@groupLabel="reference" return
for $comp in $as/s:Component return
for $maploc in $comp/s:MapLoc
return
<map>
<chromosome>chr{data($comp/@chromosome)}</chromosome>
<position>{data($maploc/@physMapInt)}</position>
</map>
}
</SNP>
}
</MyListOfSnp>
Executing the query, saved as 'input.xquery':
EXIST/eXist/bin/client.sh -u admin --password=mypassword -F input.xquery
Result:
<MyListOfSnp>
<SNP>
<name>rs10000300</name>
<sequence>ATCAAATACCCAAGCAAAGATTTACATTCAAATCTGTTTACTGAAGTTCTATTTATAATACAATGCAATGAACATAATAGTATATATTTACACGTAATGTAATAAACACAAATATTCAATGGTATAAAAATGGTCAATAAATCGTGGCATAGCCACAGCTTAGAGTACCTGTTTAATGTTCTCAGCTATTTTAACTTTGCTAAATAATATTTAAAGATATGcggtagtcccccttcatctgaggaggacctgttccaagacccccagtggatgcctgaaacctctgatagtaatgaaccctatatatactgttttttcctatacatatttacatatataatacatacctatgattaagtttaatttataaattaggcacagtaagagattaacaacaacaataataaaatgtaacaattatagcaatactctaataataaagttatgtgagtgtggtctctctctctgtctcaaaatatcatactgtatgcctctatttt[G/T]ggaatacagttgacaacgggtaactgaaaccgagaaaagtgaaactgcagatgggggctgactactgTATATGAAAATTAAACAATCagccaggcatggtggctcacgcctgtaataccagcactttgggaggccgaggcgggaggatcacgaggtcaggagatcgagaccacggtgaaaccccgtctctattaaaaatacaaaaaaaaaattagccgggtacagtggcaggcacctgtagtcccagctactcgggaggctgaggcaggagaatggcgtgaacccgggaggcagagcttgcagtgagccgagatcgcgccactgcactccagcctgggcgacagagcaagactctgtctcaaaaaaaaaaaaaaaaaaaaaaaaaaGGAAAAGAAAATTAAACAACCAAACAAAATCAGAGTAAATACACCATGTTAATTCTGGTTATATTTGGATTGTGGGCTTATGGGTAGATTTTGTTACATTTTTCTATAATTTCC</sequence>
<map>
<chromosome>chr4</chromosome>
<position>40161303</position>
</map>
</SNP>
<SNP>
<name>rs10000307</name>
<sequence>GTTCAAAGACTCCTGATTAGAGTGTCCTTTCTATAACCAATCTTGTTCCTTAAAACATCTTGAATGATTTGATCTCAGATCCCCTGAAGGGACTGCTGAGATCATCTGCACCAATCCTAAAAAAAAAAATCTTTCATGCCCAAACCCTTAGCAAAGCTAGTTTCTTGTGGGACTCTTAATCCCTCTTATCCTGCTTGACACAGAGGTGCTCACCTGCTGTGCATCAGAAACACTATGGATACTTCTTGAAAGTGCCTGCAACAGAGATACTGATGCATCTGTTGTGGGGTGGGCCCTAGGTATCAGTAATTTAAAAAGTTTTTTAAAATACG[C/T]CCCAGATAATTCTGATTGCTTGTAAATGGCAAAGGTTGAGAAGCACTGCTGGAAGCTTTTGAGCTCCTGTTGGGTAAGTTCAAGCGACAGGAGAATCTCATAGTGATCATAAAACAGCACTCTGAATTCTTGGAGAAACCCAGACTCATCTTATGTGACTAATTTCCTTAATGTGTACCCCAAAACTATCCTAGCGCGTTCACAGGTACACCAGGTAATGCTATTCTGATTGAGCACCCAAGAGTCTC</sequence>
<map>
<chromosome>chr4</chromosome>
<position>188340930</position>
</map>
</SNP>
<SNP>
<name>rs1000031</name>
<sequence>CTTTGAGGATCTCGATGAAAAATCTGCACCTCTCCCAGAAAAATGCACCTCTGCACAGGTTCACAGATGTCTGCATACAATTTCAGGGTTCTCAGACCCTGAAGGCCACCAAGGGACCCAAGTACATGAGCCTTACACAGCACAACCTAAATCGTCAATGGCAATGTCTCAGGAGTGTAGGACAGTGACTGCCTCTGTAAGACCATCAGCACAGCCATGGCCACACATGTTGTCTGGAGGATCAGGTGGCCTTTTTCTGTGGCTTTTGAGGTTGAGGCTGGGTACCCTTGTGGCTAATGCATAATGCCAGGATGGCCAATAAAGACACCATAAAAATTCCCTGCCGTGTGCCTGACACTGGACAGATTTAATCTCCAGGTCTTCTGGGAACCCCGCAgaggcaggggctgttttctcattttactgatggaaactgaggctcaaggaagtgaaggaatttgtttcaagtcccaggcagtacca[C/T]gaacatgggatttgaaatcacgcaagtctgacACGCAAACCTTGGTTCTTTCCTTTTTCCCTTCTCACAGAGGGTGCTTTTCGCTTCCCGGAAGCTGGCAGGGAGTTCCTCTAAAGCGCAGGTTGGAGTGGTCAGAAGGGAGCGAACTGACAGCACGAGGAAGGCTCAGCGCATGCCAGCTCCACTCACGGGAAATGACTCACTGCAGCCCTGCTGCTCTCGGGCTCCGGGGGACACATCCACATTTCCTGTATCTCGGCTAGAGCCTTGGGCAGTGTGAGCTGGCAGGGCAGATCGCTGAAGGCGGCTAGAGATAGAAAACCACCCAGCTCTGCATCCTGAGACAAAGAAGCCTTTCCCTGGGCTCATATGATAGAGGTACGTTGCctctgggcctcagttttgccatctgtaaaatagggTGAAGGTCAGATTAGATTGGGCATATTCAGTGTG</sequence>
<map>
<chromosome>chr18</chromosome>
<position>44615438</position>
</map>
</SNP>
(...)
</MyListOfSnp>
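The sequences make this output hard to skim. As a rough sketch, the name, chromosome and position of each hit can be pulled out with awk. This assumes the layout shown above, with each `<name>`, `<chromosome>` and `<position>` element on its own line; it is line-based text processing, not real XML parsing, and `snp_table` is a hypothetical helper name.

```shell
# Print "name<TAB>chromosome<TAB>position" for each <map> in the query result.
# Reads the files given as arguments, or standard input if there are none.
snp_table() {
  awk '
    /<name>/       { gsub(/.*<name>|<\/name>.*/, "");             name = $0 }
    /<chromosome>/ { gsub(/.*<chromosome>|<\/chromosome>.*/, ""); chrom = $0 }
    /<position>/   { gsub(/.*<position>|<\/position>.*/, "");     print name "\t" chrom "\t" $0 }
  ' "$@"
}
```

If the result has been saved to a file, say `result.xml` (a name chosen here for illustration), `snp_table result.xml` prints one tab-separated line per mapped position.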
That's it
Pierre
1 comment:
hey, what if I want to build a web application using a native XML database (not a relational database such as Oracle, MySQL, etc.), querying the database with XPath? It would be integrated with a web service that uses SOAP as its protocol for exchanging messages.
Can I still use eXist-db to build my web application according to the requirements above?
Thx for sharing :)