29 October 2008

EMBL/Strings: find interactors at 2 degrees of separation my notebook.

Thank (again) to the Life Scientists on FriendFeed I've discoreved the API of STRING8 ( STRING 8—a global view on proteins and their functional interactions in 630 organisms NAR 2008): STRING is a database and web resource dedicated to protein–protein interactions, including both physical and functional interactions..


I've used this API to find the partners of a protein at two degrees of separations, here is my notebook:
First download the network for each protein (Note : the database is also available for download) using their HTTP-based API: e.g.: http://string.embl.de/api/psi-mi/interactions?identifier=Roxan. The Ensembl gene ID seems to be the more efficient (non ambiguous) identifiers (e.g. http://string.embl.de/api/psi-mi/interactions?identifier=ENSP00000263243). Note that the STRING database is available for download.

I also wrote a basic XSLT stylesheet transforming the PSI/XML to graphiz-dot format. The stylesheet is available here: http://code.google.com/p/lindenb/source/browse/trunk/src/xsl/psi2dot.xslt. e.g:

xsltproc psi2dot.xslt ROXAN.xml | dot -opicture.png -Tpng



Another XSLT stylesheet (psi2sql.xslt creates the statements to insert one or more psi file into a mysql database ).
xsltproc --stringparam temporary "" psi2sql.xslt interaction1.xml | mysql -u login --password=password -D database -N
xsltproc --stringparam temporary "" psi2sql.xslt interaction2.xml | mysql -u login --password=password -D database -N
xsltproc --stringparam temporary "" psi2sql.xslt interaction3.xml | mysql -u login --password=password -D database -N

The parameter temporary is an argument for the stylesheet telling mysql not to work with temporary tables.

Two of the tables created (interactions and interactors) are described below:
mysql> desc interactor;
+------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| pk | varchar(50) | NO | UNI | NULL | |
| shortLabel | varchar(255) | YES | | NULL | |
| fullName | text | YES | | NULL | |
+------------+--------------+------+-----+---------+----------------+

mysql> desc interaction;
+----------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| interactor1_id | int(11) | NO | MUL | NULL | |
| interactor2_id | int(11) | NO | MUL | NULL | |
| unitLabel | varchar(50) | YES | | NULL | |
| unitFullName | varchar(100) | YES | | NULL | |
| confidence | float | YES | | NULL | |
| experiment_id | int(11) | NO | MUL | NULL | |
+----------------+--------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)



And here are the mysql statements finding the protein linked to EIF4G1 at two degrees of separation:
create a temporary table containing a the 2-deg interactions.
create temporary table t1
(
id1 int,
id2 int,
id3 int
);

insert into t1(id1,id2,id3)
select distinct
P1.id,P2.id,P3.id
from
interactor as P1,
interactor as P2,
interactor as P3,
interaction as I1,
interaction as I2
where
P1.shortLabel="EIF4G1" and
P3.shortLabel!="EIF4G1" and
((P1.id= I1.interactor1_id AND P2.id= I1.interactor2_id) or (P2.id= I1.interactor1_id AND P1.id= I1.interactor2_id)) and
((P2.id= I2.interactor1_id and P3.id= I2.interactor2_id) or (P3.id= I2.interactor1_id and P2.id= I2.interactor2_id))
;

Remove the simple interactions from the temporary table:
delete t1 from
t1,
interactor as P1,
interactor as P3,
interaction as I1
where
((t1.id1=P1.id and t1.id3=P3.id) or (t1.id1=P3.id and t1.id3=P1.id)) and
((P1.id= I1.interactor1_id and P3.id= I1.interactor2_id) or (P3.id= I1.interactor1_id and P1.id= I1.interactor2_id))
;


And dump the results:
select
P1.shortLabel as "Partner1",
P2.shortLabel as "Partner2",
P3.shortLabel as "Partner3"
from
t1,
interactor as P1,
interactor as P2,
interactor as P3
where
t1.id1 = P1.id
and
t1.id2 = P2.id
and
t1.id3=P3.id
;


Here is the result:
Partner1 Partner2 Partner3
EIF4G1 ZC3H7B HMGB1
EIF4G1 ZC3H7B KCTD12
EIF4G1 ZC3H7B FGB
EIF4G1 ZC3H7B GLUD1
EIF4G1 ZC3H7B PDGFRA
EIF4G1 ZC3H7B PXN



That's it
Pierre

No comments: