I've used this API to find the partners of a protein at two degrees of separations, here is my notebook:
First download the network for each protein (Note : the database is also available for download) using their HTTP-based API: e.g.: http://string.embl.de/api/psi-mi/interactions?identifier=Roxan. The Ensembl gene ID seems to be the more efficient (non ambiguous) identifiers (e.g. http://string.embl.de/api/psi-mi/interactions?identifier=ENSP00000263243). Note that the STRING database is available for download.
I also wrote a basic XSLT stylesheet transforming the PSI/XML to graphiz-dot format. The stylesheet is available here: http://code.google.com/p/lindenb/source/browse/trunk/src/xsl/psi2dot.xslt. e.g:
xsltproc psi2dot.xslt ROXAN.xml | dot -opicture.png -Tpng
Another XSLT stylesheet (psi2sql.xslt creates the statements to insert one or more psi file into a mysql database ).
xsltproc --stringparam temporary "" psi2sql.xslt interaction1.xml | mysql -u login --password=password -D database -N
xsltproc --stringparam temporary "" psi2sql.xslt interaction2.xml | mysql -u login --password=password -D database -N
xsltproc --stringparam temporary "" psi2sql.xslt interaction3.xml | mysql -u login --password=password -D database -N
xsltproc --stringparam temporary "" psi2sql.xslt interaction2.xml | mysql -u login --password=password -D database -N
xsltproc --stringparam temporary "" psi2sql.xslt interaction3.xml | mysql -u login --password=password -D database -N
The parameter temporary is an argument for the stylesheet telling mysql not to work with temporary tables.
Two of the tables created (interactions and interactors) are described below:
mysql> desc interactor;
+------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| pk | varchar(50) | NO | UNI | NULL | |
| shortLabel | varchar(255) | YES | | NULL | |
| fullName | text | YES | | NULL | |
+------------+--------------+------+-----+---------+----------------+
mysql> desc interaction;
+----------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| interactor1_id | int(11) | NO | MUL | NULL | |
| interactor2_id | int(11) | NO | MUL | NULL | |
| unitLabel | varchar(50) | YES | | NULL | |
| unitFullName | varchar(100) | YES | | NULL | |
| confidence | float | YES | | NULL | |
| experiment_id | int(11) | NO | MUL | NULL | |
+----------------+--------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)
+------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| pk | varchar(50) | NO | UNI | NULL | |
| shortLabel | varchar(255) | YES | | NULL | |
| fullName | text | YES | | NULL | |
+------------+--------------+------+-----+---------+----------------+
mysql> desc interaction;
+----------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| interactor1_id | int(11) | NO | MUL | NULL | |
| interactor2_id | int(11) | NO | MUL | NULL | |
| unitLabel | varchar(50) | YES | | NULL | |
| unitFullName | varchar(100) | YES | | NULL | |
| confidence | float | YES | | NULL | |
| experiment_id | int(11) | NO | MUL | NULL | |
+----------------+--------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)
And here are the mysql statements finding the protein linked to EIF4G1 at two degrees of separation:
create a temporary table containing a the 2-deg interactions.
create temporary table t1
(
id1 int,
id2 int,
id3 int
);
insert into t1(id1,id2,id3)
select distinct
P1.id,P2.id,P3.id
from
interactor as P1,
interactor as P2,
interactor as P3,
interaction as I1,
interaction as I2
where
P1.shortLabel="EIF4G1" and
P3.shortLabel!="EIF4G1" and
((P1.id= I1.interactor1_id AND P2.id= I1.interactor2_id) or (P2.id= I1.interactor1_id AND P1.id= I1.interactor2_id)) and
((P2.id= I2.interactor1_id and P3.id= I2.interactor2_id) or (P3.id= I2.interactor1_id and P2.id= I2.interactor2_id))
;
(
id1 int,
id2 int,
id3 int
);
insert into t1(id1,id2,id3)
select distinct
P1.id,P2.id,P3.id
from
interactor as P1,
interactor as P2,
interactor as P3,
interaction as I1,
interaction as I2
where
P1.shortLabel="EIF4G1" and
P3.shortLabel!="EIF4G1" and
((P1.id= I1.interactor1_id AND P2.id= I1.interactor2_id) or (P2.id= I1.interactor1_id AND P1.id= I1.interactor2_id)) and
((P2.id= I2.interactor1_id and P3.id= I2.interactor2_id) or (P3.id= I2.interactor1_id and P2.id= I2.interactor2_id))
;
Remove the simple interactions from the temporary table:
delete t1 from
t1,
interactor as P1,
interactor as P3,
interaction as I1
where
((t1.id1=P1.id and t1.id3=P3.id) or (t1.id1=P3.id and t1.id3=P1.id)) and
((P1.id= I1.interactor1_id and P3.id= I1.interactor2_id) or (P3.id= I1.interactor1_id and P1.id= I1.interactor2_id))
;
t1,
interactor as P1,
interactor as P3,
interaction as I1
where
((t1.id1=P1.id and t1.id3=P3.id) or (t1.id1=P3.id and t1.id3=P1.id)) and
((P1.id= I1.interactor1_id and P3.id= I1.interactor2_id) or (P3.id= I1.interactor1_id and P1.id= I1.interactor2_id))
;
And dump the results:
select
P1.shortLabel as "Partner1",
P2.shortLabel as "Partner2",
P3.shortLabel as "Partner3"
from
t1,
interactor as P1,
interactor as P2,
interactor as P3
where
t1.id1 = P1.id
and
t1.id2 = P2.id
and
t1.id3=P3.id
;
P1.shortLabel as "Partner1",
P2.shortLabel as "Partner2",
P3.shortLabel as "Partner3"
from
t1,
interactor as P1,
interactor as P2,
interactor as P3
where
t1.id1 = P1.id
and
t1.id2 = P2.id
and
t1.id3=P3.id
;
Here is the result:
Partner1 Partner2 Partner3
EIF4G1 ZC3H7B HMGB1
EIF4G1 ZC3H7B KCTD12
EIF4G1 ZC3H7B FGB
EIF4G1 ZC3H7B GLUD1
EIF4G1 ZC3H7B PDGFRA
EIF4G1 ZC3H7B PXN
EIF4G1 ZC3H7B HMGB1
EIF4G1 ZC3H7B KCTD12
EIF4G1 ZC3H7B FGB
EIF4G1 ZC3H7B GLUD1
EIF4G1 ZC3H7B PDGFRA
EIF4G1 ZC3H7B PXN
That's it
Pierre
No comments:
Post a Comment