Starting and stopping the Erlang shell
:~> erl
Erlang R13B01 (erts-5.7.2) [source] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.7.2 (abort with ^G)
1> halt().
:~>
Simple Math
Input:
2*(3+4).
PI=3.14159.
R=2.
SURFACE=PI*R*R.
R=3.
Output:
1> 14
2> 3.14159
3> 2
4> 12.56636
## Variables in Erlang are immutable: R is already bound to 2 and cannot be changed to 3
5> ** exception error: no match of right hand side value 3
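The error above arises because `=` in Erlang is a pattern match rather than an assignment. A minimal sketch of that behaviour, meant to be typed into a fresh shell; the use of `catch` is only an illustrative way to inspect the failure without an error printout:

```erlang
R = 2.
%% Matching R against the value it already holds succeeds.
R = 2.
%% Matching against a different value fails with a badmatch error;
%% 'catch' turns the exception into an ordinary term that can be matched.
{'EXIT', {{badmatch, 3}, _}} = (catch R = 3).
%% R is unchanged after the failed match.
2 = R.
```

In the shell only, the command `f(R).` forgets the binding so that R can be bound again.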
Two structures describing a SNP and a gene are created as tuples:
RS94={snp,
{chrom,chr6},
{chromStart,98339675},
{chromEnd,98339676},
{name,"rs94"},
{strand,plus},
{func,unknown},
{avHet,0}
}.
RS47={snp,
{chrom,chr7},
{chromStart,11547645},
{chromEnd,11547646},
{name,"rs47"},
{strand,minus},
{func,missence},
{avHet,0.246}
}.
NP_056019={gene,
{chrom,chr7},
{txStart,11380695},
{txEnd,11838349},
{cdsStart,11381945},
{csdEnd,11838097},
{name,"NP_056019"},
{strand,minus},
{exonCount,27}
}.
Values are extracted using '_' as a placeholder for the unwanted variables.
{_,{_,_},{_,_},{_,_},{_,NAME_OF_RS94},{_,_},{_,_},{_,_}}=RS94.
NAME_OF_RS94.
"rs94"
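Writing out every `_` placeholder becomes tedious for large tuples. As a hedged alternative (not the approach used in this post), the same value could be fetched by position with `element/2` or by key with `lists:keyfind/3`. The variable names `Name1`, `Name2` and `Fields` are introduced here only for illustration:

```erlang
RS94 = {snp,
        {chrom, chr6},
        {chromStart, 98339675},
        {chromEnd, 98339676},
        {name, "rs94"},
        {strand, plus},
        {func, unknown},
        {avHet, 0}}.
%% element/2 is 1-based: position 1 is the tag snp, position 5 is {name, "rs94"}.
{name, Name1} = element(5, RS94).
%% Alternatively, drop the leading tag and search the {Key, Value} pairs by key.
[snp | Fields] = tuple_to_list(RS94).
{name, Name2} = lists:keyfind(name, 1, Fields).
%% Both approaches yield the same string.
"rs94" = Name1 = Name2.
```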
Create a list of SNPs:
LIST_OF_SNP1=[RS94,RS47].
Add NP_056019 to this list and create a list of genomic objects:
LIST2=[NP_056019|LIST_OF_SNP1].
Extract the first and second elements of LIST2, and put the remaining list in LIST3:
[ITEM1,ITEM2|LIST3]=LIST2.
ITEM1.
{gene,{chrom,chr7},
{txStart,11380695},
{txEnd,11838349},
{cdsStart,11381945},
{csdEnd,11838097},
{name,"NP_056019"},
{strand,minus},
{exonCount,27}}
ITEM2.
{snp,{chrom,chr6},
{chromStart,98339675},
{chromEnd,98339676},
{name,"rs94"},
{strand,plus},
{func,unknown},
{avHet,0}}
LIST3.
[{snp,{chrom,chr7},
{chromStart,11547645},
{chromEnd,11547646},
{name,"rs47"},
{strand,minus},
{func,missence},
{avHet,0.246}}]
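The `[Head|Tail]` notation used above both builds and takes apart lists. A small sketch of the same idea, using shortened, hypothetical `{Tag, Name}` tuples in place of the full objects from this post:

```erlang
%% Shortened, hypothetical stand-ins for the tuples used in this post.
List2 = [{gene, "NP_056019"}, {snp, "rs94"}, {snp, "rs47"}].
%% [First | Rest] splits off the first element and keeps the remainder.
[First | Rest] = List2.
{gene, "NP_056019"} = First.
2 = length(Rest).
%% A list comprehension keeps only the elements that match the pattern.
SnpNames = [Name || {snp, Name} <- List2].
["rs94", "rs47"] = SnpNames.
```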
A string is simply a list of character codes:
ALPHABET=[65,66,67,68,69,70].
"ABCDEF"
The dollar ($) notation gives the integer code of a character.
HELLO=[$H,$e,$l,$l,$o].
"Hello"
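Because a string is just a list of integers, ordinary list operations apply to it directly. A few shell expressions that illustrate this (each line is a match that succeeds):

```erlang
%% A double-quoted string is a list of character codes.
true = ("ABC" =:= [65, 66, 67]).
%% $A is the integer code of the character A.
65 = $A.
%% Ordinary list functions therefore work on strings.
5 = length("Hello").
"olleH" = lists:reverse("Hello").
%% Adding 32 to an uppercase ASCII code gives the lowercase letter.
"abc" = [C + 32 || C <- "ABC"].
```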
Methods & Functions
A file named "bio.erl" is created. This file is an Erlang module containing a kind of polymorphic function, distance, which returns the length of a given object (the '/1' in the export is its number of arguments). If the object is tagged with the atom snp, the value chromEnd-chromStart is returned; if it is tagged with the atom gene, the value txEnd-txStart is returned.
-module(bio).
-export([distance/1]).
distance({snp,
{chrom,_},
{chromStart,START},
{chromEnd,END},
{name,_},
{strand,_},
{func,_},
{avHet,_}}
)-> END - START;
distance({gene,
{chrom,_},
{txStart,START},
{txEnd,END},
{cdsStart,_},
{csdEnd,_},
{name,_},
{strand,_},
{exonCount,_}
})-> END - START.
We now use this module:
c(bio).
RS94={snp,
{chrom,chr6},
{chromStart,98339675},
{chromEnd,98339676},
{name,"rs94"},
{strand,plus},
{func,unknown},
{avHet,0}
}.
RS47={snp,
{chrom,chr7},
{chromStart,11547645},
{chromEnd,11547646},
{name,"rs47"},
{strand,minus},
{func,missence},
{avHet,0.246}
}.
NP_056019={gene,
{chrom,chr7},
{txStart,11380695},
{txEnd,11838349},
{cdsStart,11381945},
{csdEnd,11838097},
{name,"NP_056019"},
{strand,minus},
{exonCount,27}
}.
bio:distance(RS94).
1
bio:distance(RS47).
1
bio:distance(NP_056019).
457654
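The clause of distance that runs is chosen by matching the leading atom of the tuple. The sketch below shows the same mechanism with an anonymous function in the shell, so that it does not depend on bio.erl; the shortened `{Tag, Start, End}` tuples and the name `Length` are assumptions made for illustration:

```erlang
%% Hypothetical, shortened objects of the form {Tag, Start, End}.
%% A fun with several clauses is selected by pattern, like distance/1 in bio.erl.
Length = fun({snp, Start, End})  -> End - Start;
            ({gene, Start, End}) -> End - Start
         end.
Objects = [{gene, 11380695, 11838349}, {snp, 98339675, 98339676}].
%% Apply the fun to every element of the list.
[457654, 1] = lists:map(Length, Objects).
%% No clause matches an unknown tag, so the call fails with function_clause.
{'EXIT', {function_clause, _}} = (catch Length({exon, 1, 10})).
```

With the compiled module loaded, the same mapping over the real objects would presumably be written as lists:map(fun bio:distance/1, LIST2).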
We now want to calculate the GC percentage of a DNA sequence. The bio.erl file is modified as follows:
-module(bio).
-export([gcPercent/1]).
-export([distance/1]).
(...)
gc($A) -> 0.0;
gc($T) -> 0.0;
gc($C) -> 1.0;
gc($G) -> 1.0;
gc([])->0;
gc([BASE|REMAIN])->gc(BASE)+gc(REMAIN).
gcPercent(ADN)->100.0*(gc(ADN)/erlang:length(ADN)).
Here the function gc returns 1.0 or 0.0 if the argument is a single base, returns 0 if the list is empty, and otherwise returns gc(first character of the string) plus gc(remaining string). The function gcPercent divides this sum by the length of the string and multiplies the result by 100.
c(bio).
bio:gcPercent("GCATG").
60.0
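Note that gcPercent as written fails on an empty string (division by zero) and on any character other than A, T, C or G (no matching clause). The following is only a sketch of one possible variant, in a separate hypothetical module gc_sketch, that counts bases with lists:foldl/3. Returning 0.0 for an empty sequence and treating every other character as non-GC are assumptions, not part of the original code:

```erlang
-module(gc_sketch).
-export([gc_percent/1]).

%% Count the G and C bases in one pass, carrying the count in an accumulator.
gc_count(Dna) ->
    lists:foldl(fun(Base, Acc) when Base =:= $G; Base =:= $C -> Acc + 1;
                   (_Other, Acc) -> Acc
                end, 0, Dna).

%% Assumption: an empty sequence is reported as 0.0 instead of raising badarith.
gc_percent([]) -> 0.0;
gc_percent(Dna) -> 100.0 * gc_count(Dna) / length(Dna).
```

After c(gc_sketch). in the shell, gc_sketch:gc_percent("GCATG"). gives the same 60.0 as the version above.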
That's it.
Pierre
Pierre,
thanks for that. I started tinkering with Erlang a while ago but went back to Python as I was simply more productive in it, not least due to the extensive library support.
Did you spot any scientific libraries yet? Graph support, sequence parsers, anything?
-- Oliver
Hi Setar,
No, that was my very first test with Erlang. I don't know anything else about this language.