31 December 2010

Translating a DNA to a Protein using server-side javascript and C: my notebook

In my previous post, I used Node.js to translate a DNA to a protein on the server side, using javascript. In this post, I will again translate a DNA, but this time by calling a specialized C program on the server side.

Source code


The C program

The C program reads a DNA string from stdin and translates it using the standard genetic code.
Compilation:
gcc -o /my/bin/path/translate translate.c

The Node.js script

When the Node.js server receives a DNA parameter, it spawns a new process running the C program and writes the DNA to this process via 'stdin'.
Each time a new 'data' event (containing the protein) is received, its content is written to the http response. When the process ends, we close the response by calling 'end()'.
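
The spawn-and-pipe pattern described above can be sketched as follows. This is only an illustration written against a current Node.js API (not node v0.2.5) and it is not the original script: the names pipeThroughProgram and handleRequest are hypothetical, and the binary path is the one used in the compilation command.

```javascript
const { spawn } = require("child_process");

// Path taken from the gcc command above.
const TRANSLATE_BIN = "/my/bin/path/translate";

// Hypothetical helper: start `command`, send `input` to its stdin,
// forward each stdout chunk to onData, then call onDone exactly once.
function pipeThroughProgram(command, args, input, onData, onDone) {
  const child = spawn(command, args);
  let done = false;
  const finish = (err) => {
    if (!done) {
      done = true;
      onDone(err);
    }
  };
  // Each 'data' event carries a piece of the program's output.
  child.stdout.on("data", (chunk) => onData(chunk.toString()));
  // 'error' fires when the program cannot be started.
  child.on("error", finish);
  // 'close' fires once the process has exited and its streams are closed.
  child.on("close", () => finish(null));
  // Ignore write errors on stdin (e.g. the child exited early).
  child.stdin.on("error", () => {});
  // Write the input and close stdin so the child sees end-of-file.
  child.stdin.end(input);
  return child;
}

// Hypothetical request handler: stream the protein into the http response.
function handleRequest(req, res) {
  const dna = new URL(req.url, "http://localhost").searchParams.get("dna") || "";
  res.writeHead(200, { "Content-Type": "text/plain" });
  pipeThroughProgram(
    TRANSLATE_BIN,
    [],
    dna,
    (chunk) => res.write(chunk),
    () => res.end()
  );
}

// Usage (assumption: port and host as in the test below):
// require("http").createServer(handleRequest).listen(8080, "127.0.0.1");
```

Writing each chunk as it arrives means the server never has to hold the whole protein in memory, which is the main reason to stream instead of buffering the output of the child process.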

Test

> node-v0.2.5/node translate.js
Server running at http://127.0.0.1:8080

> curl -s "http://localhost:8080/?dna=ATGATGATAGATAGATATAGTAGATATGATCGTCAGCCATACG"
MMIDRYSRYDRQPY


That's it,

Pierre

Server-side javascript: translating a DNA with Node.js

(wikipedia) Node.js is an evented I/O framework for the V8 JavaScript engine on Unix-like platforms. It is intended for writing scalable (javascript-based) network programs such as web servers.

In this post, I will create a javascript server translating a DNA to a protein.

Installing Node.js

I've downloaded the sources for Node.js from http://nodejs.org/#download. It compiled (configure+make) and ran without any problem.

The script

The following script contains a class handling a GeneticCode and the server TranslateDna translating the DNA to a protein. It handles both the POST and the GET http methods. If no parameter is found, it displays a simple HTML form; otherwise the form data are decoded and the DNA is translated. The protein is returned as a JSON structure.
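
As an illustration of the behaviour described above, here is a minimal sketch of the translation and of the GET branch only. It is not the original script: it targets a current Node.js API, the names translate, handleGet and createTranslateServer are hypothetical, the POST branch is left out, and returning 'X' for a codon containing an unknown base is my assumption.

```javascript
const http = require("http");

// Standard genetic code, with codons enumerated in T, C, A, G order.
const BASES = "TCAG";
const AMINO_ACIDS =
  "FFLLSSSSYY**CC*W" +
  "LLLLPPPPHHQQRRRR" +
  "IIIMTTTTNNKKSSRR" +
  "VVVVAAAADDEEGGGG";

// Simple HTML form displayed when no 'dna' parameter is given.
const FORM =
  '<html><body><form action="/" method="GET"><h1>DNA</h1>' +
  '<textarea name="dna"></textarea><br/>' +
  '<input type="submit" value="Submit"></form></body></html>';

// Translate a DNA string codon by codon; trailing bases are ignored.
function translate(dna) {
  const seq = dna.toUpperCase().replace(/\s+/g, "");
  let protein = "";
  for (let i = 0; i + 2 < seq.length; i += 3) {
    const a = BASES.indexOf(seq[i]);
    const b = BASES.indexOf(seq[i + 1]);
    const c = BASES.indexOf(seq[i + 2]);
    // Assumption: any codon with a non-ACGT base is reported as 'X'.
    protein += a < 0 || b < 0 || c < 0 ? "X" : AMINO_ACIDS[a * 16 + b * 4 + c];
  }
  return protein;
}

// Build the response for a GET request from its URL.
function handleGet(requestUrl) {
  const dna = new URL(requestUrl, "http://localhost").searchParams.get("dna");
  if (dna === null) {
    return { type: "text/html", body: FORM };
  }
  return {
    type: "application/json",
    body: JSON.stringify({ protein: translate(dna), query: dna }),
  };
}

// Create the server without starting it.
function createTranslateServer() {
  return http.createServer((req, res) => {
    const out = handleGet(req.url);
    res.writeHead(200, { "Content-Type": out.type });
    res.end(out.body);
  });
}

// Usage: createTranslateServer().listen(8080, "127.0.0.1");
```

Storing the genetic code as a 64-character string indexed by the three bases keeps the lookup to a single arithmetic expression; a plain object keyed by codon would work just as well.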

Running the server

> node-v0.2.5/node translate.js
Server running at http://127.0.0.1:8080

Test


> curl "http://localhost:8080/"
<html><body><form action="/" method="GET"><h1>DNA</h1><textarea name="dna"></textarea><br/><input type="submit" value="Submit"></form></body></html>

> curl "http://localhost:8080/?dna=ATGAACTATCGATGCTACGACTGATCG"
{"protein":"MNYRCYD*S","query":"ATGAACTATCGATGCTACGACTGATCG"}



That's it,

Pierre

14 December 2010

Looking for an expert?

Yesterday, Andrew Su asked on Biostar: "Given a gene, what is the best automated method to identify the world experts?".

Here is my solution:

  • First, for a given gene name, we use NCBI-ESearch to find its Gene-Id in NCBI Gene.
  • The Gene record is then downloaded as XML using NCBI-EFetch.
  • XPATH is used to retrieve all the pubmed articles about this gene, identified by the XML tags <PubMedId>.
  • Each article is downloaded from pubmed. The element <Affiliation> is extracted from the record; sometimes this tag contains the main contact's e-mail. The authors are also extracted and we count the number of times each author was found. I tried to solve the problem of ambiguity for the names of the authors by looking at the name, surname and initials. If an author's name was contained in the e-mail, the e-mail was assigned to that author.
  • At the end, all the authors are sorted by the number of times they were seen and the most prolific author is printed out.
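
The steps above can be sketched in javascript as follows. The original program is written in Java, so this is only an illustration: the function names are hypothetical, the ESearch term is assumed to be the bare gene name, the name matching is cruder than the one described above, and the XML download and XPATH parsing are left out.

```javascript
// Base URL of the NCBI E-utilities.
const EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils";

// ESearch URL to look for a gene name in NCBI Gene.
// Assumption: the search term is just the gene name.
function esearchGeneUrl(geneName) {
  const params = new URLSearchParams({ db: "gene", term: geneName });
  return `${EUTILS}/esearch.fcgi?${params}`;
}

// EFetch URL to download a record as XML (db is "gene" or "pubmed").
function efetchUrl(db, id) {
  const params = new URLSearchParams({ db, id: String(id), retmode: "xml" });
  return `${EUTILS}/efetch.fcgi?${params}`;
}

// Find the first e-mail address in an <Affiliation> text, or null.
function extractEmail(affiliation) {
  const match = /[\w.+-]+@[\w-]+(\.[\w-]+)+/.exec(affiliation);
  return match ? match[0] : null;
}

// True if the author's last name appears in the local part of the e-mail.
function emailMatchesAuthor(email, lastName) {
  const local = email.split("@")[0].toLowerCase();
  return local.includes(lastName.toLowerCase());
}

// Count how many times each author was seen and sort by decreasing count.
// `articles` is an array of
//   { pmid, affiliation, authors: [{ firstName, lastName }] }
// already extracted from the pubmed records.
function rankAuthors(articles) {
  const tally = new Map();
  for (const article of articles) {
    const email = extractEmail(article.affiliation || "");
    for (const author of article.authors) {
      // Simplification: authors are merged on last name + first name only.
      const key = `${author.lastName}|${author.firstName}`.toLowerCase();
      const entry = tally.get(key) || {
        firstName: author.firstName,
        lastName: author.lastName,
        pmids: [],
        mail: null,
      };
      entry.pmids.push(article.pmid);
      if (email !== null && emailMatchesAuthor(email, author.lastName)) {
        entry.mail = email;
      }
      tally.set(key, entry);
    }
  }
  return [...tally.values()].sort((x, y) => y.pmids.length - x.pmids.length);
}
```

The first element of the array returned by rankAuthors is the most prolific author. Ranking by raw article count is what makes the approach sensitive to authors of large-scale studies that mention many genes.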


Source code


Compilation

javac BioStar4296.java

Test

java BioStar4296 ZC3H7B eif4G1 PRNP

<?xml version="1.0" encoding="UTF-8"?>
<experts>
<gene name="ZC3H7B" geneId="23264" count-pmids="13">
<Person>
<firstName>Sumio</firstName>
<lastName>Sugano</lastName>
<pmid>8125298</pmid>
<pmid>9373149</pmid>
<pmid>14702039</pmid>
<affilitation>International and Interdisciplinary Studies, The University of Tokyo, Japan.</affilitation>
<affilitation>Institute of Medical Science, University of Tokyo, Japan.</affilitation>
<affilitation>Helix Research Institute, 1532-3 Yana, Kisarazu, Chiba 292-0812, Japan.</affilitation>
</Person>
</gene>
<gene name="eif4G1" geneId="1981" count-pmids="106">
<Person>
<firstName>Nahum</firstName>
<lastName>Sonenberg</lastName>
<pmid>7651417</pmid>
<pmid>7935836</pmid>
<pmid>8449919</pmid>
(...)
<affilitation>Department of Biochemistry and McGill Cancer Center, McGill University, Montreal, H3G 1Y6, Quebec, Canada.</affilitation>
<affilitation>Department of Biochemistry, McGill University, Montreal, Quebec, Canada.</affilitation>
<affilitation>Laboratories of Molecular Biophysics, The Rockefeller University, New York, New York 10021, USA.</affilitation>
(...)
</Person>
</gene>
<gene name="PRNP" geneId="5621" count-pmids="429">
<Person>
<firstName>John</firstName>
<lastName>Collinge</lastName>
<pmid>1352724</pmid>
<pmid>1677164</pmid>
<pmid>2159587</pmid>
<pmid>20583301</pmid>
(...)
<mail>j.collinge@ic.ac.uk</mail>
<affilitation>Krebs Institute for Biomolecular Research, Department of Molecular Biology and Biotechnology, University of Sheffield, Sheffield S10 2TN, UK.</affilitation>
<affilitation>MRC Prion Unit and Department of Neurogenetics, Imperial College School of Medicine at St. Mary's, London, United Kingdom. J.Collinge@ic.ac.uk</affilitation>
<affilitation>Division of Neuroscience (Neurophysiology), Medical School, University of Birmingham, Edgbaston, Birmingham, UK. sratte@pitt.edu</affilitation>
(...)
</Person>
</gene>
</experts>

About this result


  • ZC3H7B: the result is wrong. In Dr Sugano's three articles, ZC3H7B was only one gene among a large set of other genes used in his studies. The expert would be Dr D. Poncet, my former thesis advisor, but he 'only' wrote two articles about this protein.
  • Eif4G1: I know Dr Sonenberg is the expert. His e-mail wasn't found.
  • PRNP: Dr Collinge seems to be the expert. His e-mail was detected.


That's it,

Pierre

13 December 2010

A new journal: BMC Open Research Computation #OpenResComp


Citing "Aims & scope": Open Research Computation publishes peer reviewed articles that describe the development, capacities, and uses of software designed for use by researchers in any field.

Submissions relating to software for use in any area of research are welcome as are articles dealing with algorithms, useful code snippets, as well as large applications or web services, and libraries.

Open Research Computation differs from other journals with a software focus in its requirement for the software source code to be made available under an Open Source Initiative compliant license, and in its assessment of the quality of documentation and testing of the software.

In addition to articles describing software Open Research Computation also welcomes submissions that review or describe developments relating to software based tools for research. These include, but are not limited to, reviews or proposals for standards, discussion of best practice in research software development, educational and support resources and tools for researchers that develop or use software based tools.


See also the insights from Cameron Neylon, Jan Aerts, Neil 10K Saunders ...