Showing posts with label geneontology. Show all posts

12 July 2013

Inside the Variation Toolkit: Gene Ontology for VCF, GUI for VCF

A quick note about three Java-based tools for VCF files that I wrote today.

VcfViewGui

VcfViewGui: a simple Java Swing-based VCF viewer.


VCFGeneOntology

vcfgo reads a VCF annotated with VEP or SNPEFF, loads the data from GeneOntology and GOA and adds a new field in the INFO column for the GO terms for each position.
Example:
$ java -jar dist/vcfgo.jar I="https://raw.github.com/arq5x/gemini/master/test/tes.snpeff.vcf" |\
    grep -v -E '^##' | head -n 3

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  1094PC0005  1094PC0009  1094PC0012  1094PC0013
chr1    30860   .   G   C   33.46   .   AC=2;AF=0.053;AN=38;BaseQRankSum=2.327;DP=49;Dels=0.00;EFF=DOWNSTREAM(MODIFIER||||85|FAM138A|protein_coding|CODING|ENST00000417324|),DOWNSTREAM(MODIFIER|||||FAM138A|processed_transcript|CODING|ENST00000461467|),DOWNSTREAM(MODIFIER|||||MIR1302-10|miRNA|NON_CODING|ENST00000408384|),INTRON(MODIFIER|||||MIR1302-10|antisense|NON_CODING|ENST00000469289|),INTRON(MODIFIER|||||MIR1302-10|antisense|NON_CODING|ENST00000473358|),UPSTREAM(MODIFIER|||||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000423562|),UPSTREAM(MODIFIER|||||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000430492|),UPSTREAM(MODIFIER|||||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000438504|),UPSTREAM(MODIFIER|||||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000488147|),UPSTREAM(MODIFIER|||||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000538476|);FS=3.128;HRun=0;HaplotypeScore=0.6718;InbreedingCoeff=0.1005;MQ=36.55;MQ0=0;MQRankSum=0.217;QD=16.73;ReadPosRankSum=2.017 GT:AD:DP:GQ:PL  0/0:7,0:7:15.04:0,15,177    0/0:2,0:2:3.01:0,3,39   0/0:6,0:6:12.02:0,12,143    0/0:4,0:4:9.03:0,9,119
chr1    69270   .   A   G   2694.18 .   AC=40;AF=1.000;AN=40;DP=83;Dels=0.00;EFF=SYNONYMOUS_CODING(LOW|SILENT|tcA/tcG|S60|305|OR4F5|protein_coding|CODING|ENST00000335137|exon_1_69091_70008);FS=0.000;GOA=OR4F5|GO:0004984&GO:0005886&GO:0004930&GO:0016021;HRun=0;HaplotypeScore=0.0000;InbreedingCoeff=-0.0598;MQ=31.06;MQ0=0;QD=32.86   GT:AD:DP:GQ:PL  ./. ./. 1/1:0,3:3:9.03:106,9,0  1/1:0,6:6:18.05:203,18,0
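To give an idea of how the new GOA field can be consumed downstream, here is a minimal JavaScript sketch that pulls it out of one VCF line. The function name parseGoa is mine, and the layout GENE|GO:x&GO:y is simply read off the sample above; the comma between several genes is an assumption, not something the tool documents here.

```javascript
// Sketch: extract the GOA annotation from one VCF data line.
// Assumption: the value looks like GENE|GO:x&GO:y, as in the sample output;
// a comma separating several genes is a guess.
function parseGoa(vcfLine) {
  const columns = vcfLine.split(/\s+/);  // the sample is whitespace-separated; real VCF uses tabs
  const info = columns[7] || "";         // INFO is the 8th VCF column
  const result = {};
  for (const entry of info.split(";")) {
    if (!entry.startsWith("GOA=")) continue;
    for (const perGene of entry.substring(4).split(",")) {
      const [gene, terms] = perGene.split("|");
      result[gene] = terms ? terms.split("&") : [];
    }
  }
  return result;
}

// Shortened version of the second record shown above.
const sampleLine = "chr1 69270 . A G 2694.18 . AC=40;FS=0.000;" +
  "GOA=OR4F5|GO:0004984&GO:0005886&GO:0004930&GO:0016021;HRun=0 GT:AD:DP:GQ:PL ./.";
console.log(parseGoa(sampleLine));
```

A record without any GOA entry simply yields an empty object.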

VCFFilterGeneOntology

vcffiltergo reads a VCF annotated with VEP or SNPEFF, loads the data from GeneOntology and GOA and adds a filter in the FILTER column if a gene at the current genomic location is a descendant of a given GO term.
Example:
$  java -jar dist/vcffiltergo.jar I="https://raw.github.com/arq5x/gemini/master/test/test1.snpeff.vcf"  \
    CHILD_OF=GO:0005886 FILTER=MEMBRANE  |\
    grep -v "^##"   | head -n 3

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  1094PC0005  1094PC0009  1094PC0012  1094PC0013
chr1    30860   .   G   C   33.46   PASS    AC=2;AF=0.053;AN=38;BaseQRankSum=2.327;DP=49;Dels=0.00;EFF=DOWNSTREAM(MODIFIER||||85|FAM138A|protein_coding|CODING|ENST00000417324|),DOWNSTREAM(MODIFIER|||||FAM138A|processed_transcript|CODING|ENST00000461467|),DOWNSTREAM(MODIFIER|||||MIR1302-10|miRNA|NON_CODING|ENST00000408384|),INTRON(MODIFIER|||||MIR1302-10|antisense|NON_CODING|ENST00000469289|),INTRON(MODIFIER|||||MIR1302-10|antisense|NON_CODING|ENST00000473358|),UPSTREAM(MODIFIER|||||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000423562|),UPSTREAM(MODIFIER|||||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000430492|),UPSTREAM(MODIFIER|||||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000438504|),UPSTREAM(MODIFIER|||||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000488147|),UPSTREAM(MODIFIER|||||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000538476|);FS=3.128;HRun=0;HaplotypeScore=0.6718;InbreedingCoeff=0.1005;MQ=36.55;MQ0=0;MQRankSum=0.217;QD=16.73;ReadPosRankSum=2.017 GT:AD:DP:GQ:PL  0/0:7,0:7:15.04:0,15,177    0/0:2,0:2:3.01:0,3,39   0/0:6,0:6:12.02:0,12,143    0/0:4,0:4:9.03:0,9,119
chr1    69270   .   A   G   2694.18 MEMBRANE    AC=40;AF=1.000;AN=40;DP=83;Dels=0.00;EFF=SYNONYMOUS_CODING(LOW|SILENT|tcA/tcG|S60|305|OR4F5|protein_coding|CODING|ENST00000335137|exon_1_69091_70008);FS=0.000;HRun=0;HaplotypeScore=0.0000;InbreedingCoeff=-0.0598;MQ=31.06;MQ0=0;QD=32.86 GT:AD:DP:GQ:PL  ./. ./. 1/1:0,3:3:9.03:106,9,0  1/1:0,6:6:18.05:203,18,0
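The way the FILTER column changes in the output above can be sketched in a few lines of JavaScript. This is only my reading of the sample (an unfiltered "." becomes PASS, a hit gets the filter name), not the tool's actual code; addFilter, filterColumn and the isDescendantOf predicate are hypothetical names.

```javascript
// Sketch: update the FILTER column (7th VCF column) for one record.
// In VCF, FILTER is ".", "PASS", or a semicolon-separated list of filter names.
function addFilter(currentFilter, name) {
  if (currentFilter === "." || currentFilter === "PASS" || currentFilter === "") {
    return name;
  }
  const names = currentFilter.split(";");
  return names.includes(name) ? currentFilter : names.concat(name).join(";");
}

// goTerms: GO accessions of the genes at this position.
// isDescendantOf: caller-supplied predicate (assumption), true when a term
// is the CHILD_OF term or one of its descendants.
function filterColumn(currentFilter, goTerms, isDescendantOf, filterName) {
  if (!goTerms.some(isDescendantOf)) {
    return currentFilter === "." ? "PASS" : currentFilter;
  }
  return addFilter(currentFilter, filterName);
}

// Toy predicate: only the term itself counts, no ontology walk.
const isMembrane = (term) => term === "GO:0005886";
console.log(filterColumn(".", ["GO:0004984", "GO:0005886"], isMembrane, "MEMBRANE"));
console.log(filterColumn(".", [], isMembrane, "MEMBRANE"));
```

In a real run the predicate would walk the is_a links of the ontology rather than compare accessions directly.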



That's it,

Pierre

06 April 2012

Indexing the content of Gene Ontology with apache SOLR

Via Wikipedia: "Solr (http://lucene.apache.org/solr/) is an open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Providing distributed search and index replication, Solr is highly scalable." In this post, I'll show how I've used SOLR to index the content of GeneOntology.

Download and install SOLR

Download from http://mirrors.ircam.fr/pub/apache/lucene/solr/3.5.0/apache-solr-3.5.0.tgz.
tar xvfz apache-solr-3.5.0.tgz
rm apache-solr-3.5.0.tgz

Configure schema.xml

We need to tell SOLR which fields of GO will be indexed, what their types are, and how they should be tokenized and parsed. This information is defined in schema.xml. The following components will be indexed: accession, name, synonym and definition. Edit apache-solr-3.5.0/example/solr/conf/schema.xml and add the following fields to the <fields> section:

<field name="go_name" type="text_general" indexed="true" stored="true" multiValued="false"/>
<field name="go_synonym" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="go_definition" type="text_general" indexed="true" stored="true" multiValued="false"/>

Start the SOLR server

In this example, the SOLR server is started using the simple Jetty server provided in the distribution:

$ cd apache-solr-3.5.0/example
$ java -jar start.jar

(...)

Indexing Gene Ontology

GO is downloaded as RDF/XML from http://archive.geneontology.org/latest-termdb/go_daily-termdb.rdf-xml.gz
 
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE go:go PUBLIC "-//Gene Ontology//Custom XML/RDF Version 2.0//EN" "http://www.geneontology.org/dtd/go.dtd">

<go:go xmlns:go="http://www.geneontology.org/dtds/go.dtd#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:RDF>
        <go:term rdf:about="http://www.geneontology.org/go#GO:0000001">
            <go:accession>GO:0000001</go:accession>
            <go:name>mitochondrion inheritance</go:name>
            <go:synonym>mitochondrial inheritance</go:synonym>
            <go:definition>The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton.</go:definition>
            <go:is_a rdf:resource="http://www.geneontology.org/go#GO:0048308" />
            <go:is_a rdf:resource="http://www.geneontology.org/go#GO:0048311" />
        </go:term>
        <go:term rdf:about="http://www.geneontology.org/go#GO:0000002">
            <go:accession>GO:0000002</go:accession>
            <go:name>mitochondrial genome maintenance</go:name>
            <go:definition>The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome.</go:definition>
            <go:is_a rdf:resource="http://www.geneontology.org/go#GO:0007005" />
            <go:dbxref rdf:parseType="Resource">
                <go:database_symbol>InterPro</go:database_symbol>
(...)
 
We now need to transform this XML file to another XML file that can be indexed by the SOLR server.  

"You can modify a Solr index by POSTing XML Documents containing instructions to add (or update) documents, delete documents, commit pending adds and deletes, and optimize your index."

The following XSLT stylesheet is used to transform the RDF/XML for GO:


$ xsltproc --novalid go2solr.xsl go_daily-termdb.rdf-xml.gz > add.xml
$ cat add.xml
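As a rough illustration of the kind of document Solr's XML update handler expects, here is a small JavaScript sketch that builds an <add> message from a GO term. It is not the go2solr.xsl stylesheet. I'm assuming the accession is stored in the "id" field (the uniqueKey of the example schema), while go_name, go_synonym and go_definition are the fields declared earlier; the function names are made up for the example.

```javascript
// Sketch: turn GO terms into a Solr <add> message.
// Assumption: the accession goes into "id", the uniqueKey of the example schema.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function field(name, value) {
  return '    <field name="' + name + '">' + escapeXml(value) + "</field>\n";
}

function termToSolrDoc(term) {
  let xml = "  <doc>\n";
  xml += field("id", term.accession);
  xml += field("go_name", term.name);
  for (const synonym of term.synonyms || []) {
    xml += field("go_synonym", synonym);  // multiValued: one <field> per synonym
  }
  if (term.definition) {
    xml += field("go_definition", term.definition);
  }
  return xml + "  </doc>\n";
}

function toAddXml(terms) {
  return "<add>\n" + terms.map(termToSolrDoc).join("") + "</add>\n";
}

console.log(toAddXml([{
  accession: "GO:0000001",
  name: "mitochondrion inheritance",
  synonyms: ["mitochondrial inheritance"]
}]));
```

Escaping matters here because GO definitions can contain characters such as & or <.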

Before indexing, the disk usage under apache-solr-3.5.0 is 136 MB. We can now use the Java utility post.jar to index GeneOntology.

 $ cd  ~/package/apache-solr-3.5.0/example/exampledocs
 $ java -jar post.jar  add.xml

SimplePostTool: version 1.4
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file jeter.xml
SimplePostTool: COMMITting Solr index changes..

After indexing, the disk usage under apache-solr-3.5.0 is 153 MB.

Querying

Search for the GO terms having a go:definition containing "cancer" or a go:name containing "genome", but discard those having a go:definition containing "metabolism".
 curl "http://localhost:8983/solr/select/?q=go_definition%3Acancer+go_name%3Agenome+-go_definition%3Ametabolism&version=2.2&start=0&rows=10&indent=on"
Same query, but return the result as a JSON structure:
 curl "http://localhost:8983/solr/select/?q=go_definition%3Acancer+go_name%3Agenome+-go_definition%3Ametabolism&version=2.2&start=0&rows=10&indent=on&wt=json"
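The percent-encoded query string is tedious to write by hand. Here is a small JavaScript sketch that builds the same URL with the standard URLSearchParams class; solrSelectUrl is just a helper name I made up, and the default parameters are copied from the curl commands above.

```javascript
// Sketch: build the Solr select URL from a readable query.
// URLSearchParams encodes ":" as %3A and spaces as "+".
function solrSelectUrl(base, query, extra) {
  const params = new URLSearchParams(Object.assign(
    { q: query, version: "2.2", start: "0", rows: "10", indent: "on" },
    extra || {}
  ));
  return base + "?" + params.toString();
}

const jsonUrl = solrSelectUrl(
  "http://localhost:8983/solr/select/",
  "go_definition:cancer go_name:genome -go_definition:metabolism",
  { wt: "json" }
);
console.log(jsonUrl);
```

Leaving out the third argument gives the XML variant of the query.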
That's it, Pierre

21 September 2010

Trees in Mongodb, my notebook with Gene Ontology

In the current post I've loaded the Gene Ontology into MongoDB and played with the tree structure of the database:

Loading GeneOntology into MongoDB

First, download GO as RDF at http://archive.geneontology.org/latest-termdb/go_daily-termdb.rdf-xml.gz and transform it with my XSLT stylesheet go2mongo.xsl (available here):
<?xml version='1.0' encoding="UTF-8" ?>
<xsl:stylesheet
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
xmlns:go="http://www.geneontology.org/dtds/go.dtd#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
version='1.0'
>
<xsl:output method="text"/>

<xsl:param name="colName">go</xsl:param>

<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>

<xsl:template match="go:go">
<xsl:apply-templates select="rdf:RDF"/>
</xsl:template>

<xsl:template match="rdf:RDF">

db.<xsl:value-of select="$colName"/>.drop();

<xsl:apply-templates select="go:term"/>


</xsl:template>

<xsl:template match="go:term">
<xsl:text>term={_id:</xsl:text><xsl:apply-templates select="go:accession" mode="text"/>
<xsl:text>,name:</xsl:text><xsl:apply-templates select="go:name" mode="text"/>
<xsl:if test="go:synonym">
<xsl:text>,synonyms:[</xsl:text>
<xsl:for-each select="go:synonym">
<xsl:if test="position()!=1"><xsl:text>,</xsl:text></xsl:if>
<xsl:apply-templates select="." mode="text"/>
</xsl:for-each>
<xsl:text>]</xsl:text>
</xsl:if>

<xsl:if test="go:definition">
<xsl:text>,definition:</xsl:text>
<xsl:apply-templates select="go:definition" mode="text"/>
</xsl:if>

<xsl:if test="go:comment">
<xsl:text>,comments:[</xsl:text>
<xsl:for-each select="go:comment">
<xsl:if test="position()!=1"><xsl:text>,</xsl:text></xsl:if>
<xsl:apply-templates select="." mode="text"/>
</xsl:for-each>
<xsl:text>]</xsl:text>
</xsl:if>

<xsl:if test="go:part_of">
<xsl:text>,part_of:[</xsl:text>
<xsl:for-each select="go:part_of">
<xsl:if test="position()!=1"><xsl:text>,</xsl:text></xsl:if>
<xsl:apply-templates select="@rdf:resource"/>
</xsl:for-each>
<xsl:text>]</xsl:text>
</xsl:if>

<xsl:if test="go:is_a">
<xsl:text>,is_a:[</xsl:text>
<xsl:for-each select="go:is_a">
<xsl:if test="position()!=1"><xsl:text>,</xsl:text></xsl:if>
<xsl:apply-templates select="@rdf:resource"/>
</xsl:for-each>
<xsl:text>]</xsl:text>
</xsl:if>

<xsl:if test="go:negatively_regulates">
<xsl:text>,negatively_regulates:[</xsl:text>
<xsl:for-each select="go:negatively_regulates">
<xsl:if test="position()!=1"><xsl:text>,</xsl:text></xsl:if>
<xsl:apply-templates select="@rdf:resource"/>
</xsl:for-each>
<xsl:text>]</xsl:text>
</xsl:if>

<xsl:if test="go:positively_regulates">
<xsl:text>,positively_regulates:[</xsl:text>
<xsl:for-each select="go:positively_regulates">
<xsl:if test="position()!=1"><xsl:text>,</xsl:text></xsl:if>
<xsl:apply-templates select="@rdf:resource"/>
</xsl:for-each>
<xsl:text>]</xsl:text>
</xsl:if>

<xsl:if test="go:regulates">
<xsl:text>,regulates:[</xsl:text>
<xsl:for-each select="go:regulates">
<xsl:if test="position()!=1"><xsl:text>,</xsl:text></xsl:if>
<xsl:apply-templates select="@rdf:resource"/>
</xsl:for-each>
<xsl:text>]</xsl:text>
</xsl:if>

<xsl:if test="go:dbxref">
<xsl:text>,dbxrefs:[</xsl:text>
<xsl:for-each select="go:dbxref">
<xsl:if test="position()!=1"><xsl:text>,</xsl:text></xsl:if>
<xsl:apply-templates select="."/>
</xsl:for-each>
<xsl:text>]</xsl:text>
</xsl:if>

<xsl:if test="go:association">
<xsl:text>,associations:[</xsl:text>
<xsl:for-each select="go:association">
<xsl:if test="position()!=1"><xsl:text>,</xsl:text></xsl:if>
<xsl:text>{evidences:[</xsl:text>
<xsl:for-each select="go:evidence">
<xsl:if test="position()!=1"><xsl:text>,</xsl:text></xsl:if>
<xsl:apply-templates select="." mode="text"/>
</xsl:for-each>
<xsl:text>],gene_product:{name:</xsl:text>
<xsl:apply-templates select="go:gene_product/go:name" mode="text"/>
<xsl:text>,dbxref:</xsl:text>
<xsl:apply-templates select="go:gene_product/go:dbxref" />
<xsl:text>}}</xsl:text>
</xsl:for-each>
<xsl:text>]</xsl:text>
</xsl:if>

<xsl:if test="go:is_obsolete">
<xsl:text>,is_obsolete:[</xsl:text>
<xsl:for-each select="go:is_obsolete">
<xsl:if test="position()!=1"><xsl:text>,</xsl:text></xsl:if>
<xsl:apply-templates select="@rdf:resource"/>
</xsl:for-each>
<xsl:text>]</xsl:text>
</xsl:if>
<xsl:text>};
db.</xsl:text>
<xsl:value-of select="$colName"/>
<xsl:text>.save(term);
</xsl:text>
</xsl:template>

<xsl:template match="go:dbxref">
<xsl:text>{database_symbol:</xsl:text>
<xsl:apply-templates select="go:database_symbol" mode="text"/>
<xsl:text>,reference:</xsl:text>
<xsl:apply-templates select="go:reference" mode="text"/>
<xsl:text>}</xsl:text>
</xsl:template>

<xsl:template match="*" mode="text">
<xsl:text>&quot;</xsl:text>
<xsl:call-template name="escape">
<xsl:with-param name="s" select="."/>
</xsl:call-template>
<xsl:text>&quot;</xsl:text>
</xsl:template>

<xsl:template match="@rdf:resource">
<xsl:text>{&apos;$ref&apos;:&apos;</xsl:text>
<xsl:value-of select="$colName"/>
<xsl:text>&apos;,&apos;$id&apos;:&apos;</xsl:text>
<xsl:value-of select="substring-after(.,'#')"/>
<xsl:text>&apos;}</xsl:text>
</xsl:template>


<xsl:template name="escape">
<xsl:param name="s"/>
<xsl:choose>
<xsl:when test="contains($s,'&quot;')">
<xsl:value-of select="substring-before($s,'&quot;')"/>
<xsl:text>\&quot;</xsl:text>
<xsl:call-template name="escape">
<xsl:with-param name="s" select="substring-after($s,'&quot;')"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$s"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>



</xsl:stylesheet>


Unzip and transform 'go_daily-termdb.rdf-xml' with the stylesheet to generate the JavaScript:
xsltproc go2mongo.xsl go_daily-termdb.rdf-xml > input.js
The file input.js looks like this:
term={
_id:"GO:0000001",
name:"mitochondrion inheritance",
synonyms:["mitochondrial inheritance"],
definition:"The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton.",
is_a:[
{'$ref':'go','$id':'GO:0048308'},
{'$ref':'go','$id':'GO:0048311'}
]
};
db.go.save(term);
term={_id:"GO:0000002",name:"mitochondrial genome maintenance",definition:"The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome.",is_a:[{'$ref':'go','$id':'GO:0007005'}],dbxrefs:[{database_symbol:"InterPro",reference:"IPR009446"},{database_symbol:"Pfam",reference:"PF06420"}]};
db.go.save(term);
term={_id:"GO:0000003",name:"reproduction",synonyms:["GO:0019952","GO:0050876","reproductive physiological process"],definition:"The production by an organism of new individuals that contain some portion of their genetic material inherited from that organism.",is_a:[{'$ref':'go','$id':'GO:0008150'}],dbxrefs:[{database_symbol:"Wikipedia",reference:"Reproduction"}]};
db.go.save(term);
term={_id:"GO:0000005",name:"ribosomal chaperone activity",definition:"OBSOLETE. Assists in the correct assembly of ribosomes or ribosomal subunits in vivo, but is not a component of the assembled ribosome when performing its normal biological function.",comments:["This term was made obsolete because it refers to a class of gene products and a biological process rather than a molecular function."],is_a:[{'$ref':'go','$id':'obsolete_molecular_function'}]};
db.go.save(term);
term={_id:"GO:0042254",name:"ribosome biogenesis",synonyms:["GO:0007046","ribosomal chaperone activity","ribosome biogenesis and assembly"],definition:"The process of the formation of the constituents of the ribosome subunits, their assembly, and their transport to the sites of protein synthesis.",is_a:[{'$ref':'go','$id':'GO:0022613'}],dbxrefs:[{database_symbol:"InterPro",reference:"IPR001790"},{database_symbol:"InterPro",reference:"IPR004037"},{database_symbol:"InterPro",reference:"IPR007023"},{database_symbol:"InterPro",reference:"IPR012948"},{database_symbol:"SP_KW",reference:"KW-0690"},{database_symbol:"HAMAP",reference:"MF_00554"},{database_symbol:"HAMAP",reference:"MF_00699"},{database_symbol:"HAMAP",reference:"MF_00803"},{database_symbol:"HAMAP",reference:"MF_01852"},{database_symbol:"Pfam",reference:"PF00466"},{database_symbol:"Pfam",reference:"PF04939"},{database_symbol:"Pfam",reference:"PF08142"},{database_symbol:"PROSITE",reference:"PS01082"},{database_symbol:"Wikipedia",reference:"Ribosome_biogenesis"},{database_symbol:"SMART",reference:"SM00785"},{database_symbol:"JCVI_TIGRFAMS",reference:"TIGR00436"},{database_symbol:"JCVI_TIGRFAMS",reference:"TIGR01575"},{database_symbol:"JCVI_TIGRFAMS",reference:"TIGR02729"},{database_symbol:"JCVI_TIGRFAMS",reference:"TIGR03594"},{database_symbol:"JCVI_TIGRFAMS",reference:"TIGR03596"},{database_symbol:"JCVI_TIGRFAMS",reference:"TIGR03597"},{database_symbol:"JCVI_TIGRFAMS",reference:"TIGR03598"}]};
db.go.save(term);
term={_id:"GO:0044183",name:"protein binding involved in protein folding",synonyms:["chaperone activity"],definition:"Interacting selectively and non-covalently with any protein or protein complex (a complex of two or more proteins that may include other nonprotein molecules) that contributes to the process of protein folding.",is_a:[{'$ref':'go','$id':'GO:0005515'}]};
db.go.save(term);
term={_id:"GO:0051082",name:"unfolded protein binding",synonyms:["binding unfolded ER proteins","chaperone activity","fimbrium-specific chaperone activity","glycoprotein-specific chaperone activity","histone-specific chaperone activity","ribosomal chaperone activity","tubulin-specific chaperone activity"],definition:"Interacting selectively and non-covalently with an unfolded protein.",is_a:[{'$ref':'go','$id':'GO:0005515'}],dbxrefs:[{database_symbol:"InterPro",reference:"IPR000397"},{database_symbol:"InterPro",reference:"IPR001305"},{database_symbol:"InterPro",reference:"IPR001404"},{database_symbol:"InterPro",reference:"IPR002194"},{database_symbol:"InterPro",reference:"IPR002777"},{database_symbol:"InterPro",reference:"IPR002939"},{database_symbol:"InterPro",reference:"IPR003095"},{database_symbol:"InterPro",reference:"IPR003708"},{database_symbol:"InterPro",reference:"IPR004127"},{database_symbol:"InterPro",reference:"IPR004226"},{database_symbol:"InterPro",reference:"IPR004487"},{database_symbol:"InterPro",reference:"IPR004961"},{database_symbol:"InterPro",reference:"IPR008971"},{database_symbol:"InterPro",reference:"IPR009033"},{database_symbol:"InterPro",reference:"IPR009169"},{database_symbol:"InterPro",reference:"IPR010236"},{database_symbol:"InterPro",reference:"IPR011599"},{database_symbol:"InterPro",reference:"IPR012713"},{database_symbol:"InterPro",reference:"IPR012714"},{database_symbol:"InterPro",reference:"IPR012715"},{database_symbol:"InterPro",reference:"IPR012716"},{database_symbol:"InterPro",reference:"IPR012717"},{database_symbol:"InterPro",reference:"IPR012718"},{database_symbol:"InterPro",reference:"IPR012719"},{database_symbol:"InterPro",reference:"IPR012720"},{database_symbol:"InterPro",reference:"IPR012721"},{database_symbol:"InterPro",reference:"IPR012722"},{database_symbol:"InterPro",reference:"IPR012724"},{database_symbol:"InterPro",reference:"IPR012725"},{database_symbol:"InterPro",reference:"IPR016153"},{database_symbol:"InterPro",reference:"IPR016154"},{database_symbol:"InterPro",reference:"IPR019805"},{database_symbol:"HAMAP",reference:"MF_00117"},{database_symbol:"PROSITE",reference:"MF_00117"},{database_symbol:"HAMAP",reference:"MF_00175"},{database_symbol:"PROSITE",reference:"MF_00175"},{database_symbol:"HAMAP",reference:"MF_00307"},{database_symbol:"PROSITE",reference:"MF_00307"},{database_symbol:"HAMAP",reference:"MF_00308"},{database_symbol:"PROSITE",reference:"MF_00308"},{database_symbol:"PROSITE",reference:"MF_00332"},{database_symbol:"HAMAP",reference:"MF_00505"},{database_symbol:"PROSITE",reference:"MF_00505"},{database_symbol:"HAMAP",reference:"MF_00600"},{database_symbol:"PROSITE",reference:"MF_00679"},{database_symbol:"HAMAP",reference:"MF_00790"},{database_symbol:"PROSITE",reference:"MF_00821"},{database_symbol:"HAMAP",reference:"MF_00822"},{database_symbol:"HAMAP",reference:"MF_01046"},{database_symbol:"HAMAP",reference:"MF_01152"},{database_symbol:"PROSITE",reference:"MF_01152"},{database_symbol:"HAMAP",reference:"MF_01183"},{database_symbol:"ProDom",reference:"PD010430"},{database_symbol:"Pfam",reference:"PF00684"},{database_symbol:"Pfam",reference:"PF01430"},{database_symbol:"Pfam",reference:"PF01556"},{database_symbol:"Pfam",reference:"PF01920"},{database_symbol:"Pfam",reference:"PF02556"},{database_symbol:"Pfam",reference:"PF02970"},{database_symbol:"Pfam",reference:"PF02996"},{database_symbol:"Pfam",reference:"PF03280"},{database_symbol:"PIRSF",reference:"PIRSF002356"},{database_symbol:"PIRSF",reference:"PIRSF002583"},{database_symbol:"PIRSF",reference:"PIRSF005261"},{database_symbol:"PRINTS",reference:"PR00625"},{database_symbol:"PRINTS",reference:"PR01594"},{database_symbol:"PROSITE",reference:"PS00298"},{database_symbol:"PROSITE",reference:"PS00750"},{database_symbol:"PROSITE",reference:"PS00751"},{database_symbol:"PROSITE",reference:"PS00995"},{database_symbol:"PROSITE",reference:"PS51188"},{database_symbol:"JCVI_TIGRFAMS",reference:"TIGR00074"},{database_symbol:"JCVI_TIGRFAMS",reference:"TIGR00115"},{database_symbol:"JCVI_TIGRFAMS",reference:"TIGR00382"},{database_symbol:"JCVI_TIGRFAMS",reference:"TIGR00809"},{database_symbol:"JCVI_TIGRFAMS",reference:"TIGR02350"},{database_symbol:"JCVI_TIGRFAMS",reference:"TIGR03142"}]};
db.go.save(term);
term={_id:"GO:0000006",name:"high affinity zinc uptake transmembrane transporter activity",definition:"Catalysis of the transfer of a solute or solutes from one side of a membrane to the other according to the reaction: Zn2+(out) = Zn2+(in), probably powered by proton motive force. In high affinity transport the transporter is able to bind the solute even if it is only present at very low concentrations.",is_a:[{'$ref':'go','$id':'GO:0005385'}]};
db.go.save(term);
(...)
Here the notation {'$ref':'go','$id':'GO:0048308'} is a special object interpreted by mongo as a "Database Reference", a kind of foreign-key/link/pointer to another document, with a special method named 'fetch' that retrieves the linked document.

Load 'input.js' into mongo
mongo mygodatabase input.js

Playing with the GeneOntology Tree

I'm going to check whether one GO term is a descendant of another. First, let's define two useful JavaScript functions that recursively look for the parent(s) of a given node through the property is_a.
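// Recursively walk up the is_a links from childNode and
// return true if parentId is reached.
var goNodeIsA = function (childNode, parentId) {
    if (childNode == null) {
        return false;
    }
    if (childNode._id == parentId) {
        return true;
    }
    if (!childNode.is_a) {
        return false; // no is_a link left to follow
    }
    for (var i = 0; i < childNode.is_a.length; ++i) {
        // fetch() resolves the DBRef to the parent document
        if (goNodeIsA(childNode.is_a[i].fetch(), parentId)) {
            return true;
        }
    }
    return false;
};

// Same test, starting from the identifiers of the two terms.
var goIsA = function (childId, parentId) {
    return goNodeIsA(db.go.findOne({_id: childId}), parentId);
};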

Now, is GO:0003723 (RNA binding) a descendant of GO:0005488 (binding)?
> goIsA("GO:0003723","GO:0005488");
true

And is GO:0003723 (RNA binding) a descendant of GO:0050355 (triphosphatase activity)?
> goIsA("GO:0003723","GO:0050355");
false

Loop over all the GO terms and find the descendants of GO:0005488 (binding):
> db.go.find({},{name:1,is_a:1}).forEach(function(term) { if(goIsA(term._id,'GO:0005488')) printjson(term); })

(...)
{
"_id" : "GO:0080084",
"name" : "5S rDNA binding",
"is_a" : [
{
"$ref" : "go",
"$id" : "GO:0000182"
}
]
}
{
"_id" : "GO:0080087",
"name" : "callose binding",
"is_a" : [
{
"$ref" : "go",
"$id" : "GO:0030247"
}
]
}
{
"_id" : "GO:0080115",
"name" : "myosin XI tail binding",
"is_a" : [
{
"$ref" : "go",
"$id" : "GO:0032029"
}
]
}
{
"_id" : "GO:0090079",
"name" : "translation regulator activity, nucleic acid binding",
"is_a" : [
{
"$ref" : "go",
"$id" : "GO:0003676"
},
{
"$ref" : "go",
"$id" : "GO:0045182"
}
]
}
(...)
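The loop above fetches the same parent documents again and again. One possible alternative, sketched here in plain JavaScript outside of mongo, is to load the is_a links once and memoize the ancestors of each term. Everything in it is an assumption for illustration: the input is a plain array of {_id, is_a:[parentId,...]} objects (not the DBRef form stored in the database), and the three toy terms and their edges are only there to make the example run.

```javascript
// Sketch: memoized ancestor lookup over an in-memory copy of the is_a links.
// Assumption: terms is an array of {_id, is_a:[parentId,...]} plain objects.
function buildAncestorLookup(terms) {
  const parents = new Map();
  for (const t of terms) {
    parents.set(t._id, t.is_a || []);
  }
  const cache = new Map();
  function ancestors(id) {
    if (cache.has(id)) return cache.get(id);
    const found = new Set();
    cache.set(id, found);  // cached early so malformed (cyclic) input cannot recurse forever
    for (const p of parents.get(id) || []) {
      found.add(p);
      for (const a of ancestors(p)) found.add(a);
    }
    return found;
  }
  // Like goIsA: an unknown child is never a descendant; a term matches itself.
  return function (childId, parentId) {
    return parents.has(childId) &&
      (childId === parentId || ancestors(childId).has(parentId));
  };
}

// Toy subset; the edges are illustrative.
const toyTerms = [
  { _id: "GO:0005488" },
  { _id: "GO:0003676", is_a: ["GO:0005488"] },
  { _id: "GO:0003723", is_a: ["GO:0003676"] }
];
const isA = buildAncestorLookup(toyTerms);
console.log(isA("GO:0003723", "GO:0005488"));
console.log(isA("GO:0003723", "GO:0050355"));
```

The trade-off is memory: the whole set of is_a links has to fit in the client, which should be fine for an ontology of this size.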




That's it

Pierre

29 May 2010

From mRNA to GO

In this post, I've just copied the solutions I gave on http://biostar.stackexchange.com/ for the following problem: "How do I do simple GO term lookup given a gene (or mRNA) identifier?".
The best answer was given by Neil, who suggested using the biomaRt package.

On my side, I submitted two solutions:

1st solution: use a recursive XSLT stylesheet

The following XSLT stylesheet gets the GeneID from the mRNA, downloads the XML document describing this gene and extracts the identifiers for Gene Ontology:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xsl:stylesheet version="1.0" xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>

<xsl:output method="text" encoding="ISO-8859-1"/>

<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>

<xsl:template match="Dbtag[Dbtag_db='GeneID']">
<xsl:for-each select="Dbtag_tag/Object-id/Object-id_id">
<xsl:variable name="genid" select="."/>
<xsl:variable name="url" select="concat('http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&amp;retmode=xml&amp;id=',$genid)"/>
<xsl:apply-templates select="document($url,//Other-source)" mode="go"/>
</xsl:for-each>
</xsl:template>

<xsl:template match="Other-source" mode="go">
<xsl:if test="Other-source_src/Dbtag/Dbtag_db='GO'">
<xsl:value-of select="concat('GO:',Other-source_src/Dbtag/Dbtag_tag/Object-id/Object-id_id)"/>
<xsl:text> </xsl:text>
<xsl:value-of select="Other-source_anchor"/>
<xsl:text>
</xsl:text>
</xsl:if>
</xsl:template>

<xsl:template match="text()">
<xsl:apply-templates/>
</xsl:template>

<xsl:template match="text()" mode="go">
</xsl:template>

</xsl:stylesheet>

Running the stylesheet with xsltproc

Result:

GO:5524 ATP binding
GO:8026 ATP-dependent helicase activity
GO:3725 double-stranded RNA binding
GO:4519 endonuclease activity
GO:4386 helicase activity
GO:16787 hydrolase activity
GO:46872 metal ion binding
GO:166 nucleotide binding
GO:5515 protein binding
GO:4525 ribonuclease III activity
GO:6396 RNA processing
GO:1525 angiogenesis
GO:48754 branching morphogenesis of a tube
GO:35116 embryonic hindlimb morphogenesis
GO:31047 gene silencing by RNA
GO:30324 lung development
GO:31054 pre-microRNA processing
GO:30422 production of siRNA involved in RNA interference
GO:19827 stem cell maintenance
GO:30423 targeting of mRNA for destruction involved in RNA interference
GO:16442 RNA-induced silencing complex
GO:5737 cytoplasm
GO:5622 intracellular
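The identifiers printed above have lost their leading zeros, because the stylesheet simply concatenates 'GO:' with the numeric id found in the XML, whereas GO accessions are written with seven digits (GO:0005524). A tiny JavaScript helper can restore the usual form; normalizeGoId is a name I made up for this sketch.

```javascript
// Sketch: restore the seven-digit form of a GO accession.
function normalizeGoId(id) {
  const match = /^GO:(\d{1,7})$/.exec(id);
  if (!match) {
    throw new Error("not a GO identifier: " + id);
  }
  return "GO:" + match[1].padStart(7, "0");
}

console.log(normalizeGoId("GO:5524"));
console.log(normalizeGoId("GO:16787"));
```

With this normalization the list can be compared directly with accessions coming from other sources.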

2nd solution: using the ensembl.org mysql server

After a little reverse engineering of the SQL schema of ensembl.org, I wrote the following SQL query:
use homo_sapiens_core_48_36j;

select distinct
GENE_ID.stable_id as "ensembl.gene",
RNA_ID.stable_id as "ensembl.transcript",
PROT_ID.stable_id as "ensembl.translation",
GO.acc as "go.acc",
GO.name as "go.name",
GOXREF.linkage_type as "evidence"
from

ensembl_go_54.term as GO,
external_db as EXTDB0,
external_db as EXTDB1,
object_xref as OX0,
object_xref as OX1,
xref as XREF0,
xref as XREF1,
transcript as RNA,
transcript_stable_id as RNA_ID,
gene as GENE,
gene_stable_id as GENE_ID,
translation as PROT,
translation_stable_id as PROT_ID,
go_xref as GOXREF
where
XREF0.dbprimary_acc="NM_030621" and
XREF0.external_db_id=EXTDB0.external_db_id and
EXTDB0.db_name="RefSeq_dna" and
OX0.xref_id=XREF0.xref_id and
RNA.gene_id=GENE.gene_id and
GENE.gene_id= GENE_ID.gene_id and
RNA.transcript_id=OX0.ensembl_id and
RNA_ID.transcript_id=RNA.transcript_id and
PROT.transcript_id = RNA.transcript_id and
OX1.ensembl_id=PROT.translation_id and
PROT.translation_id=PROT_ID.translation_id and
OX1.ensembl_object_type='Translation' and
OX1.xref_id=XREF1.xref_id and
GOXREF.object_xref_id=OX1.object_xref_id and
XREF1.external_db_id=EXTDB1.external_db_id and
EXTDB1.db_name="GO" and
GO.acc=XREF1.dbprimary_acc

order by GO.acc;

Execute

mysql -A -h ensembldb.ensembl.org -u anonymous -P 5306 < query.sql

Result

ensembl.gene     ensembl.transcript  ensembl.translation  go.acc      go.name
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0000166  nucleotide binding
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0001525  angiogenesis
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0003676  nucleic acid binding
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0003677  DNA binding
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0003723  RNA binding
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0003725  double-stranded RNA binding
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0004386  helicase activity
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0004519  endonuclease activity
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0004525  ribonuclease III activity
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0005515  protein binding
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0005524  ATP binding
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0005622  intracellular
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0006396  RNA processing
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0008026  ATP-dependent helicase activity
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0016787  hydrolase activity
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0019827  stem cell maintenance
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0030324  lung development
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0030422  RNA interference, production of siRNA
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0030423  RNA interference, targeting of mRNA for destruction
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0031047  gene silencing by RNA
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0035116  embryonic hindlimb morphogenesis
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0035196  gene silencing by miRNA, production of miRNAs
ENSG00000100697  ENST00000393063     ENSP00000376783      GO:0048754  branching morphogenesis of a tube


Note: my solutions don't list all the GO terms visible at http://www.ensembl.org/Homo_sapiens/Transcript/(...)/ENSG00000100697 (e.g. I can see GO:0016442 on the web page, but not in my result). The ensembl schema is complex, and I might have missed some links ("left join"?) between the tables (does anybody know?). UPDATE: The results should be the same as ensembl when using "homo_sapiens_core_58_37c" (thank you @joachimbaran).

That's it

Pierre

19 September 2009

XSLT+MySQL=Append GeneOntology terms to TinySeq

In my previous post I showed how to add new functions to the Xalan XSLT engine. In this post I'll show how to connect to a mysql server via Xalan. A TinySeq XML document will be transformed with XALAN and an XSLT stylesheet querying the GeneOntology public mysql server. This stylesheet will search the GO terms for each sequence.

The TinySeq Sequences

The sequences were downloaded from the NCBI.
<TSeqSet>
<TSeq>
<TSeq_seqtype value="protein"/>
<TSeq_gi>124617</TSeq_gi>
<TSeq_accver>P01308.1</TSeq_accver>
<TSeq_taxid>9606</TSeq_taxid>
<TSeq_orgname>Homo sapiens</TSeq_orgname>
<TSeq_defline>RecName: Full=Insulin; Contains: RecName: Full=Insulin B chain; Contains: RecName: Full=Insulin A chain; Flags: Precursor</TSeq_defline>
<TSeq_length>110</TSeq_length>
<TSeq_sequence>MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN</TSeq_sequence>
</TSeq>

<TSeq>
<TSeq_seqtype value="protein"/>
<TSeq_gi>3183544</TSeq_gi>
<TSeq_accver>P11940.2</TSeq_accver>
<TSeq_taxid>9606</TSeq_taxid>
<TSeq_orgname>Homo sapiens</TSeq_orgname>
<TSeq_defline>RecName: Full=Polyadenylate-binding protein 1; Short=Poly(A)-binding protein 1; Short=PABP 1</TSeq_defline>
<TSeq_length>636</TSeq_length>
<TSeq_sequence>MNPSAPSYPMASLYVGDLHPDVTEAMLYEKFSPAGPILSIRVCRDMITRRSLGYAYVNFQQPADAERALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFIKNLDKSIDNKALYDTFSAFGNILSCKVVCDENGSKGYGFVHFETQEAAERAIEKMNGMLLNDRKVFVGRFKSRKEREAELGARAKEFTNVYIKNFGEDMDDERLKDLFGKFGPALSVKVMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRKFEQMKQDRITRYQGVNLYVKNLDDGIDDERLRKEFSPFGTITSAKVMMEGGRSKGFGFVCFSSPEEATKAVTEMNGRIVATKPLYVALAQRKEERQAHLTNQYMQRMASVRAVPNPVINPYQPAPPSGYFMAAIPQTQNRAAYYPPSQIAQLRPSPRWTAQGARPHPFQNMPGAIRPAAPRPPFSTMRPASSQVPRVMSTQRVANTSTQTMGPRPAAAAAAATPAVRTVPQYKYAAGVRNPQQHLNAQPQVTMQQPAVHVQGQEPLTASMLASAPPQEQKQMLGERLFPLIQAMHPTLAGKITGMLLEIDNSELLHMLESPESLRSKVDEAVAVLQAHQAKEAAQKAVNSATGVPTV</TSeq_sequence>
</TSeq>
</TSeqSet>

The stylesheet


In the header, the Xalan SQL extension is declared:
<xsl:stylesheet
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
version='1.0'
xmlns:sql="org.apache.xalan.lib.sql.XConnection"
extension-element-prefixes="sql"
>

A few parameters define the MySQL connection (the "query" parameter is not used by the templates below):

<xsl:param name="driver" select="'com.mysql.jdbc.Driver'"/>
<xsl:param name="datasource" select="'jdbc:mysql://mysql.ebi.ac.uk:4085/go_latest'"/>
<xsl:param name="query" select="'SELECT * FROM chromInfo limit 10'"/>
<xsl:param name="passwd" select="'amigo'"/>
<xsl:param name="username" select="'go_select'"/>

A new SQL object is created:
<xsl:variable name="db" select="sql:new()"/>

A new connection is created when the document root is matched. This connection is released at the end.
<xsl:template match="/">
<xsl:if test="not(sql:connect($db, $driver, $datasource, $username, $passwd))" >
<xsl:copy-of select="sql:getError($db)/ext-error" />
<xsl:message terminate="yes">Error Connecting to the Database</xsl:message>
</xsl:if>
<xsl:apply-templates/>
<xsl:value-of select="sql:close($db)"/>
</xsl:template>

Each time a TSeq is found, a new SQL query is built. Here $xref_key is the accession (TSeq_accver) without its version suffix. I'm not a GO specialist, so I hope the query is OK.
<xsl:variable name="sql">
select distinct
term.acc as "termAcc",
term.name as "termName",
term.term_type as "termType"
from
dbxref,
term,
association,
gene_product,
species
where
association.term_id=term.id and
gene_product.dbxref_id=dbxref.id and
gene_product.id=association.gene_product_id and
gene_product.species_id=species.id and
term.is_obsolete=0 and
dbxref.xref_key="<xsl:value-of select="$xref_key"/>" and
species.ncbi_taxa_id=<xsl:value-of select="TSeq_taxid"/>
</xsl:variable>
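
The query above splices the accession and the taxon id straight into the SQL text. As a point of comparison, here is a minimal Java sketch of the same lookup written with explicit JOINs and JDBC placeholders. The names GoQuerySketch, GO_TERMS_SQL, xrefKey and prepare are hypothetical and not part of the stylesheet; the table and column names are simply copied from the query above, and the sketch has not been run against the public server.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical helper, not part of the stylesheet. */
final class GoQuerySketch {
    /** Same tables and filters as the XSLT query, with '?' placeholders. */
    static final String GO_TERMS_SQL =
        "SELECT DISTINCT term.acc AS termAcc, term.name AS termName, term.term_type AS termType "
      + "FROM gene_product "
      + "JOIN dbxref ON gene_product.dbxref_id = dbxref.id "
      + "JOIN species ON gene_product.species_id = species.id "
      + "JOIN association ON association.gene_product_id = gene_product.id "
      + "JOIN term ON association.term_id = term.id "
      + "WHERE term.is_obsolete = 0 AND dbxref.xref_key = ? AND species.ncbi_taxa_id = ?";

    /** Drops the version suffix of an accession.version string: "P01308.1" gives "P01308". */
    static String xrefKey(String accver) {
        int dot = accver.indexOf('.');
        return dot < 0 ? accver : accver.substring(0, dot);
    }

    /** Binds the two parameters; the caller owns the connection and must close the statement. */
    static PreparedStatement prepare(Connection con, String accver, int taxId) throws SQLException {
        PreparedStatement stmt = con.prepareStatement(GO_TERMS_SQL);
        stmt.setString(1, xrefKey(accver));
        stmt.setInt(2, taxId);
        return stmt;
    }

    public static void main(String[] args) {
        // No database access here: only show the derived key and the SQL text.
        System.out.println(xrefKey("P01308.1"));
        System.out.println(GO_TERMS_SQL);
    }
}
```

With placeholders, the driver takes care of quoting, so an accession containing a quote character cannot break the statement.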

The query is sent to the MySQL server:
<xsl:variable name="table" select='sql:query($db, $sql)'/>

The SQL result is then processed with regular templates:
<xsl:apply-templates select="$table" mode="sql"/>
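
To make the shape of $table more concrete, the following Java sketch applies a trimmed copy of the two mode="sql" templates (the src attribute is left out) to a hand-written document. That input is an assumption: it only mimics the elements the templates rely on (sql, row-set, row, and col with a column-label attribute), and what Xalan really returns may carry more than that. The class name RowSetSketch is hypothetical. It runs on the XSLT processor bundled with the JDK, so no database is needed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class RowSetSketch {
    /** Hand-written stand-in for the result of sql:query (assumed shape). */
    static final String FAKE_RESULT =
        "<sql><row-set><row>"
      + "<col column-label='termAcc'>GO:0005179</col>"
      + "<col column-label='termName'>hormone activity</col>"
      + "<col column-label='termType'>molecular_function</col>"
      + "</row></row-set></sql>";

    /** The two mode="sql" templates, started from a root template. */
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'><xsl:apply-templates select='sql' mode='sql'/></xsl:template>"
      + "<xsl:template match='sql' mode='sql'>"
      + "<xsl:if test='count(row-set)&gt;0'><GeneOntology>"
      + "<xsl:apply-templates select='row-set/row' mode='sql'/>"
      + "</GeneOntology></xsl:if></xsl:template>"
      + "<xsl:template match='row' mode='sql'><GoTerm>"
      + "<acn><xsl:value-of select=\"col[@column-label='termAcc']\"/></acn>"
      + "<name><xsl:value-of select=\"col[@column-label='termName']\"/></name>"
      + "<type><xsl:value-of select=\"col[@column-label='termType']\"/></type>"
      + "</GoTerm></xsl:template>"
      + "</xsl:stylesheet>";

    /** Runs the templates over the given XML and returns the serialized output. */
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(FAKE_RESULT));
    }
}
```

Each row becomes one GoTerm element, and a result without any row-set produces no GeneOntology element at all.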

Complete source code for the stylesheet:
<xsl:stylesheet version="1.0" extension-element-prefixes="sql"
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
xmlns:sql="org.apache.xalan.lib.sql.XConnection"
>

<xsl:output method="xml" indent="yes"/>

<xsl:param name="driver" select="'com.mysql.jdbc.Driver'"/>
<xsl:param name="datasource" select="'jdbc:mysql://mysql.ebi.ac.uk:4085/go_latest'"/>
<xsl:param name="query" select="'SELECT * FROM chromInfo limit 10'"/>
<xsl:param name="passwd" select="'amigo'"/>
<xsl:param name="username" select="'go_select'"/>

<xsl:variable name="db" select="sql:new()"/>



<xsl:template match="/">
<xsl:if test="not(sql:connect($db, $driver, $datasource, $username, $passwd))">
<xsl:copy-of select="sql:getError($db)/ext-error"/>
<xsl:message terminate="yes">Error Connecting to the Database</xsl:message>
</xsl:if>
<xsl:apply-templates/>
<xsl:value-of select="sql:close($db)"/>
</xsl:template>

<xsl:template match="TSeq">
<xsl:element name="TSeq">
<xsl:apply-templates/>
<xsl:if test="TSeq_seqtype/@value='protein' and TSeq_accver and TSeq_taxid">
<xsl:variable name="xref_key">
<xsl:choose>
<xsl:when test="contains(TSeq_accver,'.')">
<xsl:value-of select="substring-before(TSeq_accver,'.')"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="TSeq_accver"/>
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<xsl:variable name="sql">
select distinct
term.acc as "termAcc",
term.name as "termName",
term.term_type as "termType"
from
dbxref,
term,
association,
gene_product,
species
where
association.term_id=term.id and
gene_product.dbxref_id=dbxref.id and
gene_product.id=association.gene_product_id and
gene_product.species_id=species.id and
term.is_obsolete=0 and
dbxref.xref_key="<xsl:value-of select="$xref_key"/>" and
species.ncbi_taxa_id=<xsl:value-of select="TSeq_taxid"/>
</xsl:variable>

<xsl:variable name="table" select="sql:query($db, $sql)"/>

<xsl:apply-templates select="$table" mode="sql"/>
</xsl:if>
</xsl:element>
</xsl:template>


<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>



<xsl:template match="sql" mode="sql">
<xsl:if test="count(row-set)>0">
<GeneOntology>
<xsl:apply-templates select="row-set/row" mode="sql"/>
</GeneOntology>
</xsl:if>
</xsl:template>

<xsl:template match="row" mode="sql">
<xsl:element name="GoTerm">
<xsl:attribute name="src">http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=<xsl:value-of select="col[@column-label='termAcc']"/></xsl:attribute>
<acn><xsl:value-of select="col[@column-label='termAcc']"/></acn>
<name><xsl:value-of select="col[@column-label='termName']"/></name>
<type><xsl:value-of select="col[@column-label='termType']"/></type>
</xsl:element>
</xsl:template>

</xsl:stylesheet>


Applying the stylesheet


The jar containing the MySQL driver is added to the CLASSPATH:
java -cp ${XALAN}/org.apache.xalan_2.7.1.v200905122109.jar:\
${XALAN}/org.apache.xml.serializer_2.7.1.v200902170519.jar:\
mysql-connector-java.jar \
org.apache.xalan.xslt.Process -IN sequences.fasta.xml -XSL tinyseq2go.xsl
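
The same transformation can also be started from Java through the standard javax.xml.transform (JAXP) API instead of org.apache.xalan.xslt.Process. A minimal sketch follows. The class name ApplyStylesheet is hypothetical, and it assumes that Xalan and the MySQL driver are on the classpath and that TransformerFactory.newInstance() picks Xalan up; with another XSLT processor the sql: extension functions are unlikely to resolve.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ApplyStylesheet {
    /** Applies the stylesheet to the input and returns the serialized result. */
    static String apply(Source xsl, Source xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer(xsl);
            StringWriter out = new StringWriter();
            transformer.transform(xml, new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    /** Convenience overload for in-memory documents. */
    static String apply(String xsl, String xml) {
        return apply(new StreamSource(new StringReader(xsl)),
                     new StreamSource(new StringReader(xml)));
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java ApplyStylesheet input.xml stylesheet.xsl");
            return;
        }
        // e.g. java ApplyStylesheet sequences.fasta.xml tinyseq2go.xsl
        System.out.print(apply(new StreamSource(new File(args[1])),
                               new StreamSource(new File(args[0]))));
    }
}
```

The string overload is handy for trying a single template, such as the identity template of the stylesheet, without touching the database.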

Result


<TSeqSet>
<TSeq>
<TSeq_seqtype value="protein"/>
<TSeq_gi>124617</TSeq_gi>
<TSeq_accver>P01308.1</TSeq_accver>
<TSeq_taxid>9606</TSeq_taxid>
<TSeq_orgname>Homo sapiens</TSeq_orgname>
<TSeq_defline>RecName: Full=Insulin; Contains: RecName: Full=Insulin B chain; Contains: RecName: Full=Insulin A chain; Flags: Precursor</TSeq_defline>
<TSeq_length>110</TSeq_length>
<TSeq_sequence>MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN</TSeq_sequence>
<GeneOntology>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045721">
<acn>GO:0045721</acn>
<name>negative regulation of gluconeogenesis</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0030307">
<acn>GO:0030307</acn>
<name>positive regulation of cell growth</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045597">
<acn>GO:0045597</acn>
<name>positive regulation of cell differentiation</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0046889">
<acn>GO:0046889</acn>
<name>positive regulation of lipid biosynthetic process</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0050995">
<acn>GO:0050995</acn>
<name>negative regulation of lipid catabolic process</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045725">
<acn>GO:0045725</acn>
<name>positive regulation of glycogen biosynthetic process</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0032148">
<acn>GO:0032148</acn>
<name>activation of protein kinase B activity</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0050731">
<acn>GO:0050731</acn>
<name>positive regulation of peptidyl-tyrosine phosphorylation</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0031954">
<acn>GO:0031954</acn>
<name>positive regulation of protein amino acid autophosphorylation</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0006469">
<acn>GO:0006469</acn>
<name>negative regulation of protein kinase activity</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0043066">
<acn>GO:0043066</acn>
<name>negative regulation of apoptosis</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0008284">
<acn>GO:0008284</acn>
<name>positive regulation of cell proliferation</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0051897">
<acn>GO:0051897</acn>
<name>positive regulation of protein kinase B signaling cascade</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0030335">
<acn>GO:0030335</acn>
<name>positive regulation of cell migration</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005615">
<acn>GO:0005615</acn>
<name>extracellular space</name>
<type>cellular_component</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045922">
<acn>GO:0045922</acn>
<name>negative regulation of fatty acid metabolic process</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0042060">
<acn>GO:0042060</acn>
<name>wound healing</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0022898">
<acn>GO:0022898</acn>
<name>regulation of transmembrane transporter activity</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0046326">
<acn>GO:0046326</acn>
<name>positive regulation of glucose import</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0007186">
<acn>GO:0007186</acn>
<name>G-protein coupled receptor protein signaling pathway</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0032583">
<acn>GO:0032583</acn>
<name>regulation of gene-specific transcription</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0060266">
<acn>GO:0060266</acn>
<name>negative regulation of respiratory burst during acute inflammatory response</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0006521">
<acn>GO:0006521</acn>
<name>regulation of cellular amino acid metabolic process</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0032270">
<acn>GO:0032270</acn>
<name>positive regulation of cellular protein metabolic process</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045861">
<acn>GO:0045861</acn>
<name>negative regulation of proteolysis</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0006953">
<acn>GO:0006953</acn>
<name>acute-phase response</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0050709">
<acn>GO:0050709</acn>
<name>negative regulation of protein secretion</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0033861">
<acn>GO:0033861</acn>
<name>negative regulation of NAD(P)H oxidase activity</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0055089">
<acn>GO:0055089</acn>
<name>fatty acid homeostasis</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005159">
<acn>GO:0005159</acn>
<name>insulin-like growth factor receptor binding</name>
<type>molecular_function</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0014068">
<acn>GO:0014068</acn>
<name>positive regulation of phosphoinositide 3-kinase cascade</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045740">
<acn>GO:0045740</acn>
<name>positive regulation of DNA replication</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045821">
<acn>GO:0045821</acn>
<name>positive regulation of glycolysis</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0046628">
<acn>GO:0046628</acn>
<name>positive regulation of insulin receptor signaling pathway</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0042593">
<acn>GO:0042593</acn>
<name>glucose homeostasis</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045818">
<acn>GO:0045818</acn>
<name>negative regulation of glycogen catabolic process</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0070201">
<acn>GO:0070201</acn>
<name>regulation of establishment of protein localization</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045840">
<acn>GO:0045840</acn>
<name>positive regulation of mitosis</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0043410">
<acn>GO:0043410</acn>
<name>positive regulation of MAPKKK cascade</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0006006">
<acn>GO:0006006</acn>
<name>glucose metabolic process</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005975">
<acn>GO:0005975</acn>
<name>carbohydrate metabolic process</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0046631">
<acn>GO:0046631</acn>
<name>alpha-beta T cell activation</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0008219">
<acn>GO:0008219</acn>
<name>cell death</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0015758">
<acn>GO:0015758</acn>
<name>glucose transport</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0050715">
<acn>GO:0050715</acn>
<name>positive regulation of cytokine secretion</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045909">
<acn>GO:0045909</acn>
<name>positive regulation of vasodilation</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045429">
<acn>GO:0045429</acn>
<name>positive regulation of nitric oxide biosynthetic process</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0051000">
<acn>GO:0051000</acn>
<name>positive regulation of nitric-oxide synthase activity</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0045908">
<acn>GO:0045908</acn>
<name>negative regulation of vasodilation</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005179">
<acn>GO:0005179</acn>
<name>hormone activity</name>
<type>molecular_function</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005158">
<acn>GO:0005158</acn>
<name>insulin receptor binding</name>
<type>molecular_function</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005576">
<acn>GO:0005576</acn>
<name>extracellular region</name>
<type>cellular_component</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005515">
<acn>GO:0005515</acn>
<name>protein binding</name>
<type>molecular_function</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005520">
<acn>GO:0005520</acn>
<name>insulin-like growth factor binding</name>
<type>molecular_function</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0007267">
<acn>GO:0007267</acn>
<name>cell-cell signaling</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0060267">
<acn>GO:0060267</acn>
<name>positive regulation of respiratory burst</name>
<type>biological_process</type>
</GoTerm>
</GeneOntology>

</TSeq>

<TSeq>
<TSeq_seqtype value="protein"/>
<TSeq_gi>3183544</TSeq_gi>
<TSeq_accver>P11940.2</TSeq_accver>
<TSeq_taxid>9606</TSeq_taxid>
<TSeq_orgname>Homo sapiens</TSeq_orgname>
<TSeq_defline>RecName: Full=Polyadenylate-binding protein 1; Short=Poly(A)-binding protein 1; Short=PABP 1</TSeq_defline>
<TSeq_length>636</TSeq_length>
<TSeq_sequence>MNPSAPSYPMASLYVGDLHPDVTEAMLYEKFSPAGPILSIRVCRDMITRRSLGYAYVNFQQPADAERALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFIKNLDKSIDNKALYDTFSAFGNILSCKVVCDENGSKGYGFVHFETQEAAERAIEKMNGMLLNDRKVFVGRFKSRKEREAELGARAKEFTNVYIKNFGEDMDDERLKDLFGKFGPALSVKVMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRKFEQMKQDRITRYQGVNLYVKNLDDGIDDERLRKEFSPFGTITSAKVMMEGGRSKGFGFVCFSSPEEATKAVTEMNGRIVATKPLYVALAQRKEERQAHLTNQYMQRMASVRAVPNPVINPYQPAPPSGYFMAAIPQTQNRAAYYPPSQIAQLRPSPRWTAQGARPHPFQNMPGAIRPAAPRPPFSTMRPASSQVPRVMSTQRVANTSTQTMGPRPAAAAAAATPAVRTVPQYKYAAGVRNPQQHLNAQPQVTMQQPAVHVQGQEPLTASMLASAPPQEQKQMLGERLFPLIQAMHPTLAGKITGMLLEIDNSELLHMLESPESLRSKVDEAVAVLQAHQAKEAAQKAVNSATGVPTV</TSeq_sequence>
<GeneOntology>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0000166">
<acn>GO:0000166</acn>
<name>nucleotide binding</name>
<type>molecular_function</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0003723">
<acn>GO:0003723</acn>
<name>RNA binding</name>
<type>molecular_function</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005829">
<acn>GO:0005829</acn>
<name>cytosol</name>
<type>cellular_component</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005634">
<acn>GO:0005634</acn>
<name>nucleus</name>
<type>cellular_component</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005681">
<acn>GO:0005681</acn>
<name>spliceosomal complex</name>
<type>cellular_component</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0008380">
<acn>GO:0008380</acn>
<name>RNA splicing</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0008143">
<acn>GO:0008143</acn>
<name>poly(A) RNA binding</name>
<type>molecular_function</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0006378">
<acn>GO:0006378</acn>
<name>mRNA polyadenylation</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0005737">
<acn>GO:0005737</acn>
<name>cytoplasm</name>
<type>cellular_component</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0048255">
<acn>GO:0048255</acn>
<name>mRNA stabilization</name>
<type>biological_process</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0008494">
<acn>GO:0008494</acn>
<name>translation activator activity</name>
<type>molecular_function</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0008022">
<acn>GO:0008022</acn>
<name>protein C-terminus binding</name>
<type>molecular_function</type>
</GoTerm>
<GoTerm src="http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0006397">
<acn>GO:0006397</acn>
<name>mRNA processing</name>
<type>biological_process</type>
</GoTerm>
</GeneOntology>
</TSeq>
</TSeqSet>
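
Once the GO terms are embedded in the TinySeq document, they are easy to consume downstream. As an illustration only, this Java sketch counts the GoTerm elements of each type for one accession, using the DOM and XPath classes of the JDK. The class GoTermCounter and its method countByType are hypothetical; the element names are the ones produced by the stylesheet, and the sample document in main is a shortened copy of the result above.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class GoTermCounter {
    /** Returns ontology type -> number of GoTerm elements for the given TSeq_accver. */
    static Map<String, Integer> countByType(String xml, String accver) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            // accver is pasted into the XPath text, so it must not contain a quote
            String expr = "//TSeq[TSeq_accver='" + accver + "']/GeneOntology/GoTerm/type";
            NodeList types = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            Map<String, Integer> counts = new TreeMap<>();
            for (int i = 0; i < types.getLength(); i++) {
                counts.merge(types.item(i).getTextContent().trim(), 1, Integer::sum);
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalStateException("cannot read the annotated TinySeq document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<TSeqSet><TSeq><TSeq_accver>P11940.2</TSeq_accver><GeneOntology>"
            + "<GoTerm><acn>GO:0003723</acn><name>RNA binding</name><type>molecular_function</type></GoTerm>"
            + "<GoTerm><acn>GO:0005829</acn><name>cytosol</name><type>cellular_component</type></GoTerm>"
            + "<GoTerm><acn>GO:0005634</acn><name>nucleus</name><type>cellular_component</type></GoTerm>"
            + "</GeneOntology></TSeq></TSeqSet>";
        // prints {cellular_component=2, molecular_function=1}
        System.out.println(countByType(xml, "P11940.2"));
    }
}
```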


Hey, I think it's cool ! :-)

That's it
Pierre