01 September 2009

First steps with BerkeleyDB-XML. My notebook.

Berkeley DB XML is an embeddable XML database engine that provides XQuery access to documents stored in containers and indexed by their content. It is built on top of Oracle Berkeley DB and is available at http://www.oracle.com/database/berkeley-db/xml/index.html. The distribution comes with a command-line shell, dbxml.

pierre@linux-zfgk:.../dbxml-2.4.16> ./install/bin/dbxml

Creating a container

dbxml> createContainer dbsnp.dbxml
Creating node storage container

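Adding a set of XML documents describing some SNPs (the trailing 's' tells putDocument that its second argument is a literal string, which is also the default when the flag is omitted)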

dbxml> putDocument snp1 '<snp id="25">
<name>rs25</name>
<class>snp</class>
<het>0.5</het>
<observed>A/G</observed>
<mapping>
<location build="36_3" label="CRA_TCAGchr7v2" chrom="7" position="11637562"/>
<location build="36_3" label="Celera" chrom="7" position="11558958"/>
<location build="36_3" label="HuRef" chrom="7" position="11442496"/>
<location build="36_3" label="reference" chrom="7" position="11550666"/>
</mapping>
</snp>' s
Document added, name = snp1

dbxml> putDocument snp2 '<snp id="26">
<name>rs26</name>
<class>mixed</class>
<het>0</het>
<observed>-/A/G</observed>
<mapping>
<location build="36_3" label="CRA_TCAGchr7v2" chrom="7" position="11636891"/>
<location build="36_3" label="Celera" chrom="7" position="11558287"/>
<location build="36_3" label="HuRef" chrom="7" position="11441825"/>
<location build="36_3" label="reference" chrom="7" position="11549995"/>
</mapping>
</snp>' s
Document added, name = snp2

dbxml> putDocument snp3 '<snp id="27">
<name>rs27</name>
<class>snp</class>
<het>0.44</het>
<observed>C/G</observed>
<mapping>
<location build="36_3" label="CRA_TCAGchr7v2" chrom="7" position="11636645"/>
<location build="36_3" label="Celera" chrom="7" position="11558041"/>
<location build="36_3" label="HuRef" chrom="7" position="11441579"/>
<location build="36_3" label="reference" chrom="7" position="11549749"/>
</mapping>
</snp>' s
Document added, name = snp3


dbxml> putDocument snp4 '<snp id="300">
<name>rs300</name>
<class>snp</class>
<het>0.01</het>
<observed>A/G</observed>
<mapping>
<location build="36_3" label="Celera" chrom="8" position="18779978"/>
<location build="36_3" label="HuRef" chrom="8" position="18357119"/>
<location build="36_3" label="reference" chrom="8" position="19861166"/>
</mapping>
</snp>'
Document added, name = snp4

dbxml> putDocument snp5 '<snp id="600">
<name>rs600</name>
<class>snp</class>
<het>0.27</het>
<observed>C/G</observed>
<mapping>
<location build="36_3" label="Celera" chrom="X" position="148984992"/>
<location build="36_3" label="HuRef" chrom="X" position="137590170"/>
<location build="36_3" label="reference" chrom="X" position="148444179"/>
<location build="36_3" label="reference" chrom="X" position="148843642"/>
</mapping>
</snp>' s
Document added, name = snp5

dbxml> putDocument snp6 '<snp id="800">
<name>rs800</name>
<class>snp</class>
<het/>
<observed>C/G/T</observed>
<mapping>
<location build="36_3" label="Celera" chrom="22" position="18365004"/>
<location build="36_3" label="HuRef" chrom="22" position="17518747"/>
<location build="36_3" label="reference" chrom="22" position="32892297"/>
</mapping>
</snp>' s
Document added, name = snp6

Getting help

dbxml> help

Command Summary
---------------

# - Comment. Does nothing
abort - Aborts the current transaction
addAlias - Add an alias to the default container
addIndex - Add an index to the default container
append - Append to nodes specified in the query expression
commit - Commits the current transaction, and starts a new one
compactContainer - Compact a container to shrink it's size
contextQuery - Execute query expression using the last results as the context item
cquery - Execute an expression in the context of the default container
createContainer - Creates a new container, which becomes the default container
debug - Debug command -- internal use only
delIndex - Delete an index from the default container
echo - Echo to output
getDocuments - Gets document(s) by name from default container
getMetaData - Get a metadata item from the named document
help - Print help information. Use 'help commandName' for extended help
info - Get info on default container
insertAfter - Insert new content after nodes selected by the query expression
insertBefore - Insert new content before nodes selected by the query expression
listIndexes - List all indexes in the default container
lookupEdgeIndex - Performs an edge index lookup in the default container
lookupIndex - Performs an index lookup in the default container
lookupStats - Look up index statistics on the default container
openContainer - Opens a container, and uses it as the default container
preload - Pre-loads (opens) a container
prepare - Prepare the given query expression as the default pre-parsed query
print - Prints most recent results, optionally to a file
putDocument - Insert a document into the default container
query - Execute the given query expression, or the default pre-parsed query
queryPlan - Prints the query plan for the specified query expression
quit - Exit the program
reindexContainer - Reindex a container, optionally changing index type
removeAlias - Remove an alias from the default container
removeContainer - Removes a container
removeDocument - Remove a document from the default container
removeNodes - Remove content from documents specified by the query expression
renameNodes - Rename nodes specified by the query expression
run - Runs the given file as a script
setBaseUri - Set/get the base uri in the default context
setIgnore - Tell the shell to ignore script errors
setLazy - Sets lazy evaluation on or off in the default context
setMetaData - Set a metadata item on the named document
setNamespace - Create a prefix->namespace binding in the default context
setProjection - Enables or disables the use of the document projection optimization
setQueryTimeout - Set a query timeout in seconds in the default context
setReturnType - Sets the return type on the default context
setTypedVariable - Set a variable to the specified type in the default context
setVariable - Set a variable in the default context
setVerbose - Set the verbosity of this shell
sync - Sync current container to disk
time - Wrap a command in a wall-clock timer
transaction - Create a transaction for all subsequent operations to use
updateNodes - Update node content based on query expression and new content
upgradeContainer - Upgrade a container to the current container format

Printing the names of all the SNPs

dbxml> query 'collection("dbsnp.dbxml")/snp/name/string()'
6 objects returned for eager expression 'collection("dbsnp.dbxml")/snp/name/string()'


dbxml> print
rs25
rs26
rs27
rs300
rs600
rs800

Finding the observed bases for the SNPs with het > 0.3

dbxml> query 'collection("dbsnp.dbxml")/snp[number(het) > 0.3 ]/observed/string()'
2 objects returned for eager expression 'collection("dbsnp.dbxml")/snp[number(het) > 0.3 ]/observed/string()'


dbxml> print
A/G
C/G
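
Only two SNPs come back even though rs800 has no heterozygosity value at all: number() of an empty element is NaN, and any comparison with NaN is false, so the empty <het/> is silently skipped. The following is a minimal sketch of the same filter outside the database, using Python's standard library rather than the Berkeley DB XML API. The helper names (xpath_number, observed_above) are mine, and the documents are trimmed copies of the ones loaded above.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the six documents from this post (mapping omitted).
DOCS = [
    '<snp id="25"><name>rs25</name><het>0.5</het><observed>A/G</observed></snp>',
    '<snp id="26"><name>rs26</name><het>0</het><observed>-/A/G</observed></snp>',
    '<snp id="27"><name>rs27</name><het>0.44</het><observed>C/G</observed></snp>',
    '<snp id="300"><name>rs300</name><het>0.01</het><observed>A/G</observed></snp>',
    '<snp id="600"><name>rs600</name><het>0.27</het><observed>C/G</observed></snp>',
    '<snp id="800"><name>rs800</name><het/><observed>C/G/T</observed></snp>',
]


def xpath_number(text):
    """Roughly mimic XPath number(): empty or unparsable text becomes NaN."""
    if not text or not text.strip():
        return float("nan")
    try:
        return float(text.strip())
    except ValueError:
        return float("nan")


def observed_above(docs, threshold):
    """Return the <observed> text of every SNP whose het exceeds threshold."""
    result = []
    for doc in docs:
        snp = ET.fromstring(doc)
        het = xpath_number(snp.findtext("het"))
        # A comparison with NaN is always false, so an empty <het/> is skipped.
        if het > threshold:
            result.append(snp.findtext("observed"))
    return result


print(observed_above(DOCS, 0.3))
```

This is only an illustration of the predicate's logic; inside the shell the XQuery engine does the work.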

Printing an HTML table of all the SNPs on chromosome 7

dbxml> query '<html><body><table><tr><th>Name</th><th>Chrom</th><th>Position</th></tr>
{ for $location in collection("dbsnp.dbxml")/snp/mapping/location[@chrom="7" and @label="reference"]
return
<tr><th>{$location/../../name/string()}</th><td>7</td><td>{$location/@position/string()}</td></tr>
}
</table></body></html>'
1 objects returned for eager expression '<html><body><table><tr><th>Name</th><th>Chrom</th><th>Position</th></tr> { for $location in collection("dbsnp.dbxml")/snp/mapping/location[@chrom="7" and @label="reference"]
return
<tr><th>{$location/../../name/string()}</th><td>7</td><td>{$location/@position/string()}</td></tr>}</table></body></html>'


dbxml> print
<html><body><table><tr><th>Name</th><th>Chrom</th><th>Position</th></tr><tr><th>rs25</th><td>7</td><td>11550666</td></tr><tr><th>rs26</th><td>7</td><td>11549995</td></tr><tr><th>rs27</th><td>7</td><td>11549749</td></tr></table></body></html>

Result viewed in a browser:

Name   Chrom   Position
rs25   7       11550666
rs26   7       11549995
rs27   7       11549749
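
The query above loops over the reference locations on chromosome 7 and climbs back up to the SNP name with ../../name. As a rough sketch of the same loop outside the database (again plain Python with xml.etree.ElementTree, not the Berkeley DB XML API), one can walk each SNP and its locations and build the rows by hand. ElementTree has no parent axis, so the name is read before descending. The function names and the two trimmed documents are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Two trimmed documents: one on chromosome 7, one on chromosome 8.
DOCS = [
    '<snp id="25"><name>rs25</name><mapping>'
    '<location build="36_3" label="Celera" chrom="7" position="11558958"/>'
    '<location build="36_3" label="reference" chrom="7" position="11550666"/>'
    '</mapping></snp>',
    '<snp id="300"><name>rs300</name><mapping>'
    '<location build="36_3" label="reference" chrom="8" position="19861166"/>'
    '</mapping></snp>',
]


def reference_rows(docs, chrom):
    """Collect (name, chrom, position) for reference locations on chrom."""
    rows = []
    for doc in docs:
        snp = ET.fromstring(doc)
        name = snp.findtext("name")
        for loc in snp.findall("mapping/location"):
            if loc.get("chrom") == chrom and loc.get("label") == "reference":
                rows.append((name, chrom, loc.get("position")))
    return rows


def html_table(rows):
    """Serialize the rows as a bare HTML table, header row first."""
    table = ET.Element("table")
    header = ET.SubElement(table, "tr")
    for title in ("Name", "Chrom", "Position"):
        ET.SubElement(header, "th").text = title
    for name, chrom, position in rows:
        tr = ET.SubElement(table, "tr")
        ET.SubElement(tr, "th").text = name
        ET.SubElement(tr, "td").text = chrom
        ET.SubElement(tr, "td").text = position
    return ET.tostring(table, encoding="unicode")


print(html_table(reference_rows(DOCS, "7")))
```

The XQuery version is shorter because the element constructors and the parent axis are built into the language.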


That's it
Pierre
