Showing posts with label diagram. Show all posts

23 February 2011

Creating a custom shape for DIA: my notebook.


In this post, I'll describe how to create a custom shape for DIA, a diagram creation program.
  • The logo of the NCBI (149x183) was downloaded from commons.wikimedia.org.
  • Inkscape was used to transform this picture into a vector drawing (SVG).
  • In ${HOME}/.dia/sheet, I wrote the following sheet describing the package Bio and containing one object named "Bio - NCBI":
    <?xml version="1.0" encoding="iso-8859-1"?>
    <sheet xmlns="http://www.lysator.liu.se/~alla/dia/dia-sheet-ns">
    <name>Bio</name>
    <description>Bio</description>
    <contents>
    <object name="Bio - NCBI">
    <description>Shape</description>
    </object>
    </contents>
    </sheet>

  • I moved the icon ncbi.png to ${HOME}/.dia/shapes/ncbi.png.
  • The object "Bio - NCBI" was defined in ${HOME}/.dia/shapes/ncbi.shape. This object contains the SVG drawing and defines 8 connection points (the four corners and the midpoint of each edge) for connecting the objects:
    <?xml version="1.0" encoding="UTF-8" ?>
    <shape xmlns="http://www.daa.com.au/~james/dia-shape-ns" xmlns:svg="http://www.w3.org/2000/svg">
    <name>Bio - NCBI</name>
    <description>NCBI</description>
    <icon>ncbi.png</icon>
    <connections>
    <point x="0" y="0"/>
    <point x="149" y="0"/>
    <point x="149" y="183"/>
    <point x="0" y="183"/>
    <point x="0" y="91"/>
    <point x="149" y="91"/>
    <point x="74" y="0"/>
    <point x="74" y="183"/>
    </connections>
    <aspectratio type="fixed"/>
    <textbox x1="0" y1="0" x2="149" y2="183"/>
    <svg:svg version="1.0" width="149" height="183" style="stroke-width:0.1;">
    <svg:rect x="0" y="0" width="149" height="183" fill="#326598" stroke="black"/>
    <svg:path fill-rule="nonzero"
    d="M 32.313683,145 L 30.030665,145 L 30.545674,138.75 C 30.828929,135.3125 31.265318,130.22805 31.515428,127.45121 L 31.970174,122.40243 L 30.428383,120.20121 L 28.886592,118 L 31.717493,118 L 34.548394,118 L 38.811071,123.75 C 41.155544,126.9125 45.126964,131.75 47.636449,134.5 L 52.19915,139.5 L 51.763672,128.75 L 51.328193,118 L 53.08395,118 L 54.839708,118 L 54.291101,127.75 C 53.989367,133.1125 53.450636,139.29908 53.093921,141.49795 L 52.445348,145.4959 L 42.972674,134.37203 L 33.5,123.24817 L 33.208078,125.37408 C 33.047521,126.54334 33.294279,131.4375 33.756429,136.25 L 34.596701,145 L 32.313683,145 z M 74.569665,144.99626 L 70.5,144.99252 L 67.366714,143.08194 C 65.643407,142.03112 63.505907,140.13267 62.616714,138.86317 L 61,136.55499 L 61,131.5 L 61,126.44501 L 62.704109,123.97251 C 63.641368,122.61263 65.103868,120.9448 65.954109,120.26622 C 66.804349,119.58765 69.23818,118.54415 71.362623,117.94735 L 75.225246,116.86226 L 78.931032,117.48835 L 82.636819,118.11444 L 82.196448,121.11487 L 81.756078,124.1153 L 79.839084,122.05765 L 77.92209,120 L 73.017749,120 L 68.113408,120 L 66.556704,122.22251 L 65,124.44501 L 65,130.08614 L 65,135.72727 L 67.636364,138.36364 L 70.272727,141 L 74.702105,141 L 79.131483,141 L 81.065741,139.96482 C 82.129584,139.39547 83,139.16704 83,139.45721 C 83,139.74738 82.018849,141.11321 80.819665,142.49239 L 78.639329,145 L 74.569665,144.99626 z M 94.5,144.99626 L 88.5,145 L 88.5,131.60088 L 88.5,118.20176 L 94.125,117.56537 L 99.75,116.92897 L 102.29025,118.08638 L 104.83049,119.2438 L 105.3639,121.28357 L 105.89732,123.32335 L 105.00326,124.99392 C 104.51152,125.91273 103.39759,127.25506 102.52786,127.97688 L 100.94653,129.28927 L 104.00685,131.15036 L 107.06716,133.01145 L 107.64143,134.50796 L 108.21569,136.00447 L 107.58101,138.53326 L 106.94633,141.06204 L 103.72316,143.02728 L 100.5,144.99252 L 94.5,144.99626 z M 98.55,142.92105 L 101.6,143 L 102.8,141.8 L 104,140.6 L 104,137.87258 L 104,135.14516 L 
101.36514,133.07258 L 98.73028,131 L 95.86514,131 L 93,131 L 93,136.41667 L 93,141.83333 L 94.25,142.33772 C 94.9375,142.61513 96.8725,142.87763 98.55,142.92105 z M 95.9433,130 L 98.88659,130 L 100.4433,127.77749 C 101.29948,126.55512 102,124.98012 102,124.27749 C 102,123.57487 101.1,122.1 100,121 L 98,119 L 95.5,119 L 93,119 L 93,124.5 L 93,130 L 95.9433,130 z M 115.9613,145.00071 L 113.42259,145 L 114.02313,138.25 L 114.62367,131.5 L 114.03912,124.75 L 113.45457,118 L 115.97729,118.00113 L 118.5,118.00225 L 118.5,131.50184 L 118.5,145.00143 L 115.9613,145.00071 z M 76.75,101.46083 L 73,102.09063 L 73,97.564366 L 73,93.038106 L 75.25,92.582276 C 76.4875,92.331566 79.975,91.875246 83,91.568236 C 86.025,91.261226 90.21672,90.528556 92.31493,89.940086 L 96.12987,88.870136 L 98.06493,86.935066 L 100,85 L 100,83.328583 L 100,81.657166 L 95.41708,77.265957 L 90.83416,72.874747 L 94.88928,74.943513 C 97.11959,76.081334 100.5319,78.406133 102.4722,80.109732 L 106,83.207184 L 106,85.669334 C 106,87.023516 105.54172,88.987776 104.98161,90.034366 L 103.96322,91.937246 L 99.73161,94.409336 L 95.5,96.881426 L 88,98.856226 C 83.875,99.942366 78.8125,101.11444 76.75,101.46083 z M 56.5,85 C 56.225,85 56,84.775 56,84.5 C 56,84.225 56.225,84 56.5,84 C 56.775,84 57,84.225 57,84.5 C 57,84.775 56.775,85 56.5,85 z M 54.885619,84 C 54.613887,84 52.271243,82.655334 49.679742,81.011853 L 44.967924,78.023706 L 43.857046,75.585595 L 42.746168,73.147483 L 43.383017,70.610074 L 44.019866,68.072665 L 48.259933,65.595618 L 52.5,63.11857 L 60,61.089291 L 67.5,59.060011 L 77,57.97532 C 82.225,57.378739 88.83491,56.235687 91.68868,55.435205 L 96.87736,53.979783 L 98.43868,52.418463 L 100,50.857143 L 100,48.773735 L 100,46.690328 L 95.75,42.646974 L 91.5,38.603621 L 96.84169,41.510641 L 102.18337,44.417662 L 104.09169,46.843689 L 106,49.269717 L 106,52.062055 L 106,54.854392 L 102.25,57.955856 L 98.5,61.05732 L 90.7182,63.522397 L 82.936408,65.987474 L 71.089745,67.528685 C 64.574081,68.376351 
57.776383,69.49054 55.983749,70.004661 C 54.191116,70.518781 51.886422,71.697799 50.862208,72.6247 L 49,74.309975 L 49,76.03633 L 49,77.762685 L 52.189838,80.881343 C 53.944249,82.596604 55.15735,84 54.885619,84 z M 89.5,73 C 89.225,73 89,72.775 89,72.5 C 89,72.225 89.225,72 89.5,72 C 89.775,72 90,72.225 90,72.5 C 90,72.775 89.775,73 89.5,73 z M 58.5,53 C 58.225,53 58,52.775 58,52.5 C 58,52.225 58.225,52 58.5,52 C 58.775,52 59,52.225 59,52.5 C 59,52.775 58.775,53 58.5,53 z M 56.929999,51.990616 C 56.693499,51.985455 54.183802,50.652796 51.352893,49.029152 L 46.205786,46.077073 L 44.102893,43.403678 L 42,40.730283 L 42,39.053703 L 42,37.377123 L 44.038155,35.18942 C 45.159141,33.986183 47.850527,32.09658 50.019013,30.990301 L 53.961714,28.978884 L 62.230857,26.996448 C 66.778886,25.906108 71.5125,25.010859 72.75,25.007006 L 75,25 L 75,29.338975 L 75,33.67795 L 65.916942,34.796298 L 56.833885,35.914646 L 53.166942,37.707323 L 49.5,39.5 L 49.195591,42.125505 L 48.891182,44.751009 L 53.12559,48.375505 C 55.454514,50.368977 57.166498,51.995777 56.929999,51.990616 z M 90.5,39 C 90.225,39 90,38.775 90,38.5 C 90,38.225 90.225,38 90.5,38 C 90.775,38 91,38.225 91,38.5 C 91,38.775 90.775,39 90.5,39 z"
    fill="white" stroke="none" />


    </svg:svg>
    </shape>
  • Run dia ...
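
The 8 connection points listed in ncbi.shape follow a simple pattern: the four corners of the 149x183 drawing, then the midpoints of its four edges. Here is a minimal Python sketch that generates the same <connections> block for any size. The function names connection_points and connections_xml are mine (they are not part of Dia), and the midpoints are rounded down, as in the shape above.

```python
def connection_points(width, height):
    """Return the 8 connection points: 4 corners, then the 4 edge midpoints."""
    # Integer midpoints, rounded down like the values used in ncbi.shape.
    mid_x, mid_y = width // 2, height // 2
    return [
        (0, 0), (width, 0), (width, height), (0, height),  # corners
        (0, mid_y), (width, mid_y),                        # left and right midpoints
        (mid_x, 0), (mid_x, height),                       # top and bottom midpoints
    ]


def connections_xml(width, height):
    """Format the points as a <connections> element for a .shape file."""
    lines = ["<connections>"]
    for x, y in connection_points(width, height):
        lines.append('<point x="%d" y="%d"/>' % (x, y))
    lines.append("</connections>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Size of the NCBI logo used in this post.
    print(connections_xml(149, 183))
```

For another picture, only the two numbers passed to connections_xml need to change. The textbox and the svg:svg width/height must still be edited by hand.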



Et voila!


That's it,

Pierre

29 October 2008

EMBL/Strings: find interactors at 2 degrees of separation: my notebook.

Thanks (again) to the Life Scientists on FriendFeed, I've discovered the API of STRING8 (STRING 8—a global view on proteins and their functional interactions in 630 organisms, NAR 2008): STRING is a database and web resource dedicated to protein–protein interactions, including both physical and functional interactions.


I've used this API to find the partners of a protein at two degrees of separation. Here is my notebook:
First, download the network for each protein using their HTTP-based API, e.g. http://string.embl.de/api/psi-mi/interactions?identifier=Roxan. An Ensembl identifier seems to be the most reliable (unambiguous) kind of identifier (e.g. http://string.embl.de/api/psi-mi/interactions?identifier=ENSP00000263243). Note that the STRING database is also available for download.
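
The download step can also be scripted. Below is a minimal Python sketch that builds the two URLs quoted above and saves each answer to a file. The helper names interactions_url and download_network and the output file names are my own choices, and the sketch assumes the endpoint still answers as it did when this post was written.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint quoted in this post; it may have moved since.
API_BASE = "http://string.embl.de/api/psi-mi/interactions"


def interactions_url(identifier):
    """Build the PSI-MI interactions URL for one protein identifier."""
    return API_BASE + "?" + urlencode({"identifier": identifier})


def download_network(identifier, filename):
    """Save the PSI-MI XML for one protein (needs network access)."""
    with urlopen(interactions_url(identifier)) as response, open(filename, "wb") as out:
        out.write(response.read())


if __name__ == "__main__":
    for name in ("Roxan", "ENSP00000263243"):
        print(interactions_url(name))
        # Uncomment to really fetch the file (hypothetical file name):
        # download_network(name, name + ".xml")
```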

I also wrote a basic XSLT stylesheet transforming the PSI/XML to the Graphviz DOT format. The stylesheet is available here: http://code.google.com/p/lindenb/source/browse/trunk/src/xsl/psi2dot.xslt. For example:

xsltproc psi2dot.xslt ROXAN.xml | dot -opicture.png -Tpng
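
To give an idea of what such a transformation does, here is a rough Python equivalent. It is not the stylesheet itself, only a sketch: it assumes the PSI-MI 2.5 layout (interactor elements carrying an id and a names/shortLabel, and binary interaction elements whose participants point to interactors through interactorRef), and the embedded sample is a hand-written toy document, not real STRING output.

```python
import xml.etree.ElementTree as ET

# Toy document following the assumed PSI-MI 2.5 layout (not real data).
SAMPLE = """<entrySet xmlns="net:sf:psidev:mi">
 <entry>
  <interactorList>
   <interactor id="1"><names><shortLabel>A</shortLabel></names></interactor>
   <interactor id="2"><names><shortLabel>B</shortLabel></names></interactor>
  </interactorList>
  <interactionList>
   <interaction>
    <participantList>
     <participant><interactorRef>1</interactorRef></participant>
     <participant><interactorRef>2</interactorRef></participant>
    </participantList>
   </interaction>
  </interactionList>
 </entry>
</entrySet>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def psi_to_dot(xml_text):
    """Turn interactors into DOT nodes and binary interactions into edges."""
    root = ET.fromstring(xml_text)
    labels, edges = {}, []
    for elem in root.iter():
        name = local(elem.tag)
        if name == "interactor":
            # Use the shortLabel when present, else fall back to the id.
            label = next((e.text for e in elem.iter()
                          if local(e.tag) == "shortLabel"), elem.get("id"))
            labels[elem.get("id")] = label
        elif name == "interaction":
            refs = [e.text.strip() for e in elem.iter()
                    if local(e.tag) == "interactorRef"]
            if len(refs) == 2:  # only binary interactions are drawn
                edges.append((refs[0], refs[1]))
    lines = ["graph G {"]
    for node_id, label in labels.items():
        lines.append('n%s [label="%s"];' % (node_id, label))
    for a, b in edges:
        lines.append("n%s -- n%s;" % (a, b))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(psi_to_dot(SAMPLE))
```

The printed text can be piped into dot exactly like the output of xsltproc. Labels containing a double quote would need escaping, which this sketch does not do.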



Another XSLT stylesheet, psi2sql.xslt, creates the statements to insert one or more PSI files into a mysql database:
xsltproc --stringparam temporary "" psi2sql.xslt interaction1.xml | mysql -u login --password=password -D database -N
xsltproc --stringparam temporary "" psi2sql.xslt interaction2.xml | mysql -u login --password=password -D database -N
xsltproc --stringparam temporary "" psi2sql.xslt interaction3.xml | mysql -u login --password=password -D database -N

The parameter temporary is an argument for the stylesheet telling mysql not to work with temporary tables.

Two of the tables created (interactor and interaction) are described below:
mysql> desc interactor;
+------------+--------------+------+-----+---------+----------------+
| Field      | Type         | Null | Key | Default | Extra          |
+------------+--------------+------+-----+---------+----------------+
| id         | int(11)      | NO   | PRI | NULL    | auto_increment |
| pk         | varchar(50)  | NO   | UNI | NULL    |                |
| shortLabel | varchar(255) | YES  |     | NULL    |                |
| fullName   | text         | YES  |     | NULL    |                |
+------------+--------------+------+-----+---------+----------------+

mysql> desc interaction;
+----------------+--------------+------+-----+---------+----------------+
| Field          | Type         | Null | Key | Default | Extra          |
+----------------+--------------+------+-----+---------+----------------+
| id             | int(11)      | NO   | PRI | NULL    | auto_increment |
| interactor1_id | int(11)      | NO   | MUL | NULL    |                |
| interactor2_id | int(11)      | NO   | MUL | NULL    |                |
| unitLabel      | varchar(50)  | YES  |     | NULL    |                |
| unitFullName   | varchar(100) | YES  |     | NULL    |                |
| confidence     | float        | YES  |     | NULL    |                |
| experiment_id  | int(11)      | NO   | MUL | NULL    |                |
+----------------+--------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)



And here are the mysql statements finding the proteins linked to EIF4G1 at two degrees of separation.
First, create a temporary table containing the 2-degree interactions:
create temporary table t1
(
id1 int,
id2 int,
id3 int
);

insert into t1(id1,id2,id3)
select distinct
P1.id,P2.id,P3.id
from
interactor as P1,
interactor as P2,
interactor as P3,
interaction as I1,
interaction as I2
where
P1.shortLabel="EIF4G1" and
P3.shortLabel!="EIF4G1" and
((P1.id= I1.interactor1_id AND P2.id= I1.interactor2_id) or (P2.id= I1.interactor1_id AND P1.id= I1.interactor2_id)) and
((P2.id= I2.interactor1_id and P3.id= I2.interactor2_id) or (P3.id= I2.interactor1_id and P2.id= I2.interactor2_id))
;

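Remove the direct (one-degree) interactions from the temporary table, so that only the partners reachable exclusively through an intermediate protein remain: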
delete t1 from
t1,
interactor as P1,
interactor as P3,
interaction as I1
where
((t1.id1=P1.id and t1.id3=P3.id) or (t1.id1=P3.id and t1.id3=P1.id)) and
((P1.id= I1.interactor1_id and P3.id= I1.interactor2_id) or (P3.id= I1.interactor1_id and P1.id= I1.interactor2_id))
;


And dump the results:
select
P1.shortLabel as "Partner1",
P2.shortLabel as "Partner2",
P3.shortLabel as "Partner3"
from
t1,
interactor as P1,
interactor as P2,
interactor as P3
where
t1.id1 = P1.id
and
t1.id2 = P2.id
and
t1.id3=P3.id
;


Here is the result:
Partner1 Partner2 Partner3
EIF4G1 ZC3H7B HMGB1
EIF4G1 ZC3H7B KCTD12
EIF4G1 ZC3H7B FGB
EIF4G1 ZC3H7B GLUD1
EIF4G1 ZC3H7B PDGFRA
EIF4G1 ZC3H7B PXN
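
The logic of the three SQL statements above (collect the paths start - middle - far, then drop the paths whose far end is already a direct partner of the start) can be checked on a small graph in memory. This is only a Python sketch of the same idea, not what I ran against the database. The function name two_degree_partners is mine and the edges use made-up node names A to E, not real proteins.

```python
from collections import defaultdict


def two_degree_partners(edges, start):
    """Return sorted (start, middle, far) paths where far is not a direct partner."""
    # Interactions are undirected, like the (interactor1_id, interactor2_id) pairs.
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    direct = neighbours[start]
    paths = set()
    for middle in direct:
        for far in neighbours[middle]:
            # Same filters as the SQL: far is not the start itself,
            # and the pair start/far has no direct interaction.
            if far != start and far not in direct:
                paths.add((start, middle, far))
    return sorted(paths)


if __name__ == "__main__":
    # Toy graph: D is reachable through B but is also a direct partner of A.
    toy_edges = [("A", "B"), ("B", "C"), ("B", "D"), ("A", "D"), ("C", "E")]
    for path in two_degree_partners(toy_edges, "A"):
        print("\t".join(path))
```

On this toy graph only the path A, B, C is kept: D is removed because it interacts directly with A, and E is three steps away.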



That's it
Pierre

21 October 2008

Javadoc is not enough: java2dot

I just wrote a tiny tool that draws a graph of a Java class hierarchy. The input of the program is a set of jar files and the names of the classes to be displayed.

The source code is available here:

The information about each class is obtained using the java.lang.reflect API and the classes are dynamically loaded using a URLClassLoader. The output is a DOT file which is then piped into Graphviz dot.

Here is the usage screen of the program, followed by the command line that was used to draw the hierarchy of com.hp.hpl.jena.rdf.model.Model:
java -jar ./java2dot.jar
Pierre Lindenbaum PhD. pindenbaum@yahoo.fr
Java2Dot : Compiled by lindenb on 2008-10-21 at 17:40:52 in /home/lindenb/src/lindenb/proj/tinytools
-h this screen
-jar <dir0:jar1:jar2:dir1:...> add a jar in the jar list. If directory, will add all the *ar files
-r add a pattern of classes to be ignored.
-i ignore interfaces
-m ignore classes iMplementing interfaces
-d ignore declared-classes (classes with $ in the name)
-o output file

class-1 class-2 ... class-n




java -jar ./java2dot.jar -jar ${JENADIR}/Jena-2.5.6/lib -d com.hp.hpl.jena.rdf.model.Model |\
dot -Tjpeg -ojenamodel.jpeg
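
For readers who don't want to open the Java source, the principle (walk up from a class to its parents and print one DOT edge per "extends" link) can be illustrated in a few lines. The sketch below is an analogy in Python, using Python's own introspection (__bases__) instead of java.lang.reflect. It is not a port of java2dot, and the classes Base, Left, Right and Leaf are hypothetical examples.

```python
def hierarchy_dot(*classes):
    """Return a DOT digraph with one 'child -> parent' edge per inheritance link."""
    edges = set()
    seen = set()
    todo = list(classes)
    while todo:
        cls = todo.pop()
        if cls in seen:
            continue
        seen.add(cls)
        for base in cls.__bases__:
            if base is object:  # skip the universal root to keep the graph small
                continue
            edges.add((cls.__qualname__, base.__qualname__))
            todo.append(base)
    lines = ["digraph G {"]
    for child, parent in sorted(edges):
        lines.append('"%s" -> "%s";' % (child, parent))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical classes, only here to have something to draw.
class Base: pass
class Left(Base): pass
class Right(Base): pass
class Leaf(Left, Right): pass


if __name__ == "__main__":
    print(hierarchy_dot(Leaf))
```

As with java2dot, the printed text can be piped into dot to get a picture.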



Update: A jar is available here http://lindenb.googlecode.com/files/java2dot.jar.

Pierre

13 October 2008

Creating DIA diagrams from mysql via XSLT

During a conversation on FriendFeed about using Inkscape and Dia, Chris Lasher asked me if I had tried to use Inkscape to create diagrams in SVG format. This gave me the idea to take a fresh look at Dia and see if I could use it for my own needs (I should soon manage a mysql database with plenty of tables, but I'm missing a diagram of its schema). Dia (http://www.gnome.org/projects/dia/) can be used to draw many different kinds of diagrams. It currently has special objects to help draw entity relationship diagrams, UML diagrams, flowcharts, network diagrams, and many other diagrams. A Dia diagram is stored as a gzipped XML file. Today I created an XSLT stylesheet transforming the XML description of a mysql table into a basic (no layout, no links) Dia diagram. This stylesheet, sql2dia, is available here:



Usage: In the following example, I ask for the structure of four tables at the UCSC. Mysql writes an XML declaration for each query, so we need to remove these headers with grep -v and surround the results with an extra root element:
(echo "<root>";
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg18 -e 'desc snp129; desc snpSeq; desc snpArrayAffy5; desc knownGene' -X |\
grep -v "<?xml" ;\
echo "</root>") > /tmp/tmp.xml
xsltproc sql2dia.xsl /tmp/tmp.xml |\
gzip -c > ~/file.dia
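
The reason for the grep and the two echo commands is that an XML document accepts only one declaration and needs a single root element. The same clean-up can be sketched in Python. The names wrap_resultsets and write_gzipped are mine, and the sample text is a simplified, hand-written imitation of what mysql -X prints (the real output carries more attributes).

```python
import gzip
import re
import xml.etree.ElementTree as ET

# Simplified imitation of 'mysql -X' output for two queries (not real UCSC data).
SAMPLE = (
    '<?xml version="1.0"?>\n\n'
    '<resultset statement="desc t1">\n'
    '  <row><field name="Field">id</field><field name="Type">int(11)</field></row>\n'
    '</resultset>\n'
    '<?xml version="1.0"?>\n\n'
    '<resultset statement="desc t2">\n'
    '  <row><field name="Field">name</field><field name="Type">text</field></row>\n'
    '</resultset>\n'
)


def wrap_resultsets(text):
    """Drop every XML declaration and wrap what is left in one <root> element."""
    body = re.sub(r"<\?xml[^>]*\?>", "", text)
    return "<root>" + body + "</root>"


def write_gzipped(data, filename):
    """Write text to a gzipped file, like 'gzip -c > file.dia'."""
    with gzip.open(filename, "wt", encoding="utf-8") as out:
        out.write(data)


if __name__ == "__main__":
    root = ET.fromstring(wrap_resultsets(SAMPLE))  # parses: the text is now well-formed
    for resultset in root:
        print(resultset.get("statement"))
```

write_gzipped is there for the last step of the pipeline: the text produced by the stylesheet would be passed to it to obtain a file that Dia can open.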


And here is a screenshot of the output.


That's it

Pierre