26 November 2009

Playing with Erlang (I)

I'm currently reading Joe Armstrong's "Programming Erlang". Here are a few notes about Erlang.

Starting and stopping the Erlang shell


:~> erl
Erlang R13B01 (erts-5.7.2) [source] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.7.2 (abort with ^G)
1> halt().
:~>

Simple Math

Input:
2*(3+4).
PI=3.14159.
R=2.
SURFACE=PI*R*R.
R=3.
Output:
1> 14
2> 3.14159
3> 2
4> 12.56636
## Variables in Erlang are single-assignment: R is already bound to 2 and cannot be rebound to 3
5> ** exception error: no match of right hand side value 3
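
The error above comes from the fact that '=' is a pattern match, not an assignment. Here is a minimal sketch of that behaviour, written as a small module so it can be compiled with c(match_demo). The module name match_demo and the function bind_again are made up for this note; they are not from the book.

```erlang
-module(match_demo).
-export([bind_again/1]).

%% R is bound to 2, then matched against New.
%% The match succeeds only if New is exactly 2; otherwise Erlang
%% raises a badmatch error, which is caught and returned as a tuple.
bind_again(New) ->
    R = 2,
    try
        R = New,
        {ok, R}
    catch
        error:{badmatch, Value} -> {no_match, Value}
    end.
```

With this sketch, match_demo:bind_again(2) returns {ok,2} and match_demo:bind_again(3) returns {no_match,3}, which is the same failure as 'R=3.' in the shell.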

Three structures describing two SNPs and a gene are created as tuples:
RS94={snp,
{chrom,chr6},
{chromStart,98339675},
{chromEnd,98339676},
{name,"rs94"},
{strand,plus},
{func,unknown},
{avHet,0}
}.
RS47={snp,
{chrom,chr7},
{chromStart,11547645},
{chromEnd,11547646},
{name,"rs47"},
{strand,minus},
{func,missence},
{avHet,0.246}
}.
NP_056019={gene,
{chrom,chr7},
{txStart,11380695},
{txEnd,11838349},
{cdsStart,11381945},
{csdEnd,11838097},
{name,"NP_056019"},
{strand,minus},
{exonCount,27}
}.

Values are extracted by pattern matching, using '_' as a placeholder for the unwanted values.
{_,{_,_},{_,_},{_,_},{_,NAME_OF_RS94},{_,_},{_,_},{_,_}}=RS94.
NAME_OF_RS94.
"rs94"
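
Writing one '_' per field gets tedious quickly. As an alternative, here is a hedged sketch that looks a field up by its tag with lists:keyfind/3. The module name snp_fields and the function field/2 are my own invention, and returning 'undefined' for a missing key is an arbitrary choice.

```erlang
-module(snp_fields).
-export([field/2]).

%% Return the value stored under Key in a tagged tuple such as
%% {snp, {chrom,chr6}, ..., {name,"rs94"}, ...}.
%% Returns the atom 'undefined' when the key is absent (an assumption
%% made for this sketch).
field(Key, Object) ->
    %% Drop the leading tag (snp or gene) and keep the {Key,Value} pairs.
    [_Tag | Pairs] = tuple_to_list(Object),
    case lists:keyfind(Key, 1, Pairs) of
        {Key, Value} -> Value;
        false -> undefined
    end.
```

With the RS94 tuple defined above, snp_fields:field(name, RS94) returns "rs94" and snp_fields:field(chrom, RS94) returns chr6.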

Create a list of SNPs:
LIST_OF_SNP1=[RS94,RS47].
Add NP_056019 to the head of this list to create a list of genomic objects:
LIST2=[NP_056019|LIST_OF_SNP1].
Extract the first and second element of LIST2, put the remaining list in LIST3:

[ITEM1,ITEM2|LIST3]=LIST2.
ITEM1.
{gene,{chrom,chr7},
{txStart,11380695},
{txEnd,11838349},
{cdsStart,11381945},
{csdEnd,11838097},
{name,"NP_056019"},
{strand,minus},
{exonCount,27}}

ITEM2.
{snp,{chrom,chr6},
{chromStart,98339675},
{chromEnd,98339676},
{name,"rs94"},
{strand,plus},
{func,unknown},
{avHet,0}}

LIST3.
[{snp,{chrom,chr7},
{chromStart,11547645},
{chromEnd,11547646},
{name,"rs47"},
{strand,minus},
{func,missence},
{avHet,0.246}}]
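
The same [Head|Tail] pattern can drive a recursive function over the whole list. The sketch below collects the names of the SNPs in a list of genomic objects and skips everything else. The module snp_list and the function snp_names/1 are hypothetical names used only for this illustration.

```erlang
-module(snp_list).
-export([snp_names/1]).

%% Empty list: nothing left to collect.
snp_names([]) ->
    [];
%% Head is an snp tuple: keep its name, then process the tail.
snp_names([{snp, _, _, _, {name, Name}, _, _, _} | Rest]) ->
    [Name | snp_names(Rest)];
%% Head is anything else (for example a gene): skip it.
snp_names([_Other | Rest]) ->
    snp_names(Rest).
```

With LIST2 as defined above, snp_list:snp_names(LIST2) returns ["rs94","rs47"], because the gene at the head of the list is ignored.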

A string is just a list of integers:
ALPHABET=[65,66,67,68,69,70].
"ABCDEF"

The dollar ($) notation gives the integer code of a character.

HELLO=[$H,$e,$l,$l,$o].
"Hello"
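
Because a string is an ordinary list, the usual list tools work on it directly. As a small illustration, here is a sketch that reverse-complements a DNA string using the $ notation and a list comprehension. The module name dna and both function names are assumptions of mine, and only the four upper-case bases are handled: any other character makes complement/1 fail.

```erlang
-module(dna).
-export([reverse_complement/1]).

%% Complement of a single upper-case base, written with the $ notation.
complement($A) -> $T;
complement($T) -> $A;
complement($C) -> $G;
complement($G) -> $C.

%% Complement every base of the string, then reverse the list.
reverse_complement(Seq) ->
    lists:reverse([complement(Base) || Base <- Seq]).
```

For example, dna:reverse_complement("GCATG") returns "CATGC".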

Methods & Functions


A file named "bio.erl" is created. This file is an Erlang module that exports a function, distance/1 (the '/1' is its arity, i.e. its number of arguments), returning the length of a given object. The function has one clause per kind of object, selected by pattern matching, which makes it behave like a polymorphic function: if the tuple is tagged with the atom snp, the value "chromEnd-chromStart" is returned; if it is tagged with the atom gene, the value "txEnd-txStart" is returned.
-module(bio).
-export([distance/1]).
distance({snp,
{chrom,_},
{chromStart,START},
{chromEnd,END},
{name,_},
{strand,_},
{func,_},
{avHet,_}}
)-> END - START;
distance({gene,
{chrom,_},
{txStart,START},
{txEnd,END},
{cdsStart,_},
{csdEnd,_},
{name,_},
{strand,_},
{exonCount,_}
})-> END - START.

We now use this module:
c(bio).
RS94={snp,
{chrom,chr6},
{chromStart,98339675},
{chromEnd,98339676},
{name,"rs94"},
{strand,plus},
{func,unknown},
{avHet,0}
}.


RS47={snp,
{chrom,chr7},
{chromStart,11547645},
{chromEnd,11547646},
{name,"rs47"},
{strand,minus},
{func,missence},
{avHet,0.246}
}.

NP_056019={gene,
{chrom,chr7},
{txStart,11380695},
{txEnd,11838349},
{cdsStart,11381945},
{csdEnd,11838097},
{name,"NP_056019"},
{strand,minus},
{exonCount,27}
}.

bio:distance(RS94).
1
bio:distance(RS47).
1
bio:distance(NP_056019).
457654

We now want to calculate the GC percent of a DNA sequence. The bio.erl file is modified as follows:
-module(bio).
-export([gcPercent/1]).
-export([distance/1]).
(...)

gc($A) -> 0.0;
gc($T) -> 0.0;
gc($C) -> 1.0;
gc($G) -> 1.0;
gc([])->0;
gc([BASE|REMAIN])->gc(BASE)+gc(REMAIN).

gcPercent(ADN)->100.0*(gc(ADN)/erlang:length(ADN)).

Here the function gc returns 1.0 if its argument is a G or a C and 0.0 if it is an A or a T; it returns 0 if the list is empty; otherwise it returns gc(first character of the string) plus gc(remaining string). The function gcPercent divides this sum by the length of the string and multiplies the result by 100.
c(bio).
bio:gcPercent("GCATG").
60.0
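
The version above recurses on both characters and lists through the same gc function, and it divides by zero on an empty string. Here is an alternative sketch that counts G and C with an accumulator and a guard. It is not the code from bio.erl: the module name gc_acc and its function names are invented for this note, returning 0.0 for an empty sequence is an arbitrary choice, and any character other than upper-case G or C is simply counted as non-GC.

```erlang
-module(gc_acc).
-export([gc_percent/1]).

%% Assumption: an empty sequence is reported as 0.0 instead of
%% failing with a division by zero.
gc_percent([]) ->
    0.0;
gc_percent(Seq) ->
    100.0 * count_gc(Seq, 0) / length(Seq).

%% Tail-recursive count of the G and C bases; Count is the accumulator.
count_gc([], Count) ->
    Count;
count_gc([Base | Rest], Count) when Base =:= $G; Base =:= $C ->
    count_gc(Rest, Count + 1);
count_gc([_Other | Rest], Count) ->
    count_gc(Rest, Count).
```

As with the original function, gc_acc:gc_percent("GCATG") returns 60.0.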



That's it.
Pierre

2 comments:

Setar said...

Pierre,

thanks for that. Started tinkering with Erlang a while ago but went back to Python as I was simply more productive in it, not least due to the extensive library support.

Did you spot any scientific libraries yet? Graph support, sequence parsers, anything?

-- Oliver

Pierre Lindenbaum said...

Hi Setar,
No, that was my very first test with Erlang. I don't know anything else about this language.