YOKOFAKUN: erlang


17 December 2009

Playing with Erlang (3): concurrency

In this post I'll show how to write a concurrent Erlang program where each child process checks whether a given number is prime (not rocket science: a brute-force algorithm is used).

First, this Erlang module is named 'prime':

-module(prime).

The function named 'test', which takes no arguments, is exported (public):

-export([test/0]).

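The function named 'start' spawns a new process running the function 'loop' and returns the ID (Pid) of this new process. Erlang processes are lightweight and managed by the Erlang runtime; they are not operating-system threads.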

start()-> spawn(fun loop/0).

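The 'loop' function runs in the child process and waits for a message. When it receives a message matching the pattern '{ _From, {prime,Number},Start }' it invokes the recursive function is_prime and prints the result, the process ID and the elapsed time in seconds. Despite its name, it handles a single message and then the process exits: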

loop()->
receive
{ _From, {prime,Number},Start }->
io:format("result=~p\n Process-ID=~p\n Duration=~p ~n",[
is_prime(2,Number),
self(),
timer:now_diff(now(),Start)/1.0e6
])
;
{_From,_Other} ->
io:format("Booomm~n",[])
end.

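The function 'is_prime' is a simple-and-stupid recursive function: starting from Div=2, it tries every divisor 'Div' lower than 'Number' and checks whether 'Number rem Div' is 0. It reaches Div=Number only if no divisor was found, in which case the number is prime. It assumes Number is at least 2 (for example, is_prime(2,1) would recurse forever):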

is_prime(Div,Number)->
if
Div =:= Number -> {result,{number,Number},{prime,true}};
Number rem Div =:= 0 -> {result,{number,Number},{prime,false}};
true-> is_prime(Div+1,Number)
end.
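
The brute-force search can be shortened a lot by testing only odd divisors up to the square root of the number. Here is a minimal sketch of that idea; the module name prime_sqrt and the helper check/2 are made up for this example and are not part of the original code:

```erlang
-module(prime_sqrt).
-export([is_prime/1]).

%% Numbers below 2 are not prime; this clause also guarantees termination.
is_prime(N) when is_integer(N), N < 2 -> false;
is_prime(2) -> true;
%% Any other even number is not prime.
is_prime(N) when is_integer(N), N rem 2 =:= 0 -> false;
is_prime(N) when is_integer(N) -> check(N, 3).

%% Try odd divisors only, stopping once Div*Div exceeds N.
check(N, Div) when Div * Div > N -> true;
check(N, Div) when N rem Div =:= 0 -> false;
check(N, Div) -> check(N, Div + 2).
```

It returns a plain true or false instead of the {result,...} tuple, which is a simplification chosen for the sketch.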

The function named 'rpc' sends a message to the child process containing the ID of the current process, the request (here the number to test) and the current time:

rpc(Pid,Request)->
Pid ! { self(), Request , now()}
.
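
In recent Erlang/OTP releases now() is deprecated. A minimal sketch of the same kind of measurement with erlang:monotonic_time/1 could look like this; the module name timing_sketch and the function timed/1 are assumptions of mine, not part of the original program:

```erlang
-module(timing_sketch).
-export([timed/1]).

%% Run Fun and return {Seconds, Result}, measured with the monotonic clock.
timed(Fun) ->
    Start = erlang:monotonic_time(microsecond),
    Result = Fun(),
    Stop = erlang:monotonic_time(microsecond),
    {(Stop - Start) / 1.0e6, Result}.
```

For example, timing_sketch:timed(fun() -> lists:seq(1,1000) end) returns the elapsed seconds together with the list.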

The 'test' function creates a list 'NUMS' of numbers. For each number N in this list, the function start is called and returns the ID of a new process. This process ID is passed with '{prime,N}' to the function rpc.

test()->
NUMS=[111, 197951153,197951154,102950143,65537,3,7,8,12],
lists:foreach( fun(N)-> Pid= start(), rpc(Pid,{prime,N}) end, NUMS )
.
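
The program above prints each result from the child process and never uses '_From'. As a variation, here is a minimal sketch where each child sends its answer back to the parent, which collects the replies in a list. The module name prime_collect and the function check_all/1 are hypothetical, and is_prime returns a plain boolean here to keep things short:

```erlang
-module(prime_collect).
-export([test/0, check_all/1]).

%% Same brute-force test as in the post, returning a plain boolean.
%% Like the original, it assumes Number >= 2.
is_prime(Div, Number) when Div =:= Number -> true;
is_prime(Div, Number) when Number rem Div =:= 0 -> false;
is_prime(Div, Number) -> is_prime(Div + 1, Number).

%% Spawn one worker per number; each worker sends {Pid, Number, Result}
%% back to the parent instead of printing.
check_all(Nums) ->
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {self(), N, is_prime(2, N)} end) || N <- Nums],
    %% Pid is already bound, so each receive waits for that worker's reply.
    %% The results therefore come back in the order of Nums.
    [receive {Pid, N, Result} -> {N, Result} end || Pid <- Pids].

test() ->
    check_all([111, 65537, 3, 7, 8, 12]).
```

The workers still run concurrently; only the collection step is ordered.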

All in one, here is the full code:

-module(prime).
-export([test/0]).

start()-> spawn(fun loop/0).

rpc(Pid,Request)->
Pid ! { self(), Request , now()}
.

loop()->
receive
{ _From, {prime,Number},Start }->
io:format("result=~p\n Process-ID=~p\n Duration=~p ~n",[
is_prime(2,Number),
self(),
timer:now_diff(now(),Start)/1.0e6
])
;
{_From,_Other} ->
io:format("Booomm~n",[])
end.

is_prime(Div,Number)->
if
Div =:= Number -> {result,{number,Number},{prime,true}};
Number rem Div =:= 0 -> {result,{number,Number},{prime,false}};
true-> is_prime(Div+1,Number)
end.

test()->
NUMS=[111, 197951153,197951154,102950143,65537,3,7,8,12],
lists:foreach( fun(N)-> Pid= start(), rpc(Pid,{prime,N}) end, NUMS )
.

Compiling:

erlc prime.erl

Running:

erl -noshell -s prime test

Result (results are printed as each process finishes, so the biggest prime appears at the end of the list because its calculation took the longest):

result={result,{number,65537},{prime,true}}
Process-ID=<0.34.0>
Duration=0.006522
result={result,{number,111},{prime,false}}
Process-ID=<0.30.0>
Duration=7.6e-5
result={result,{number,197951154},{prime,false}}
Process-ID=<0.32.0>
Duration=4.57e-4
result={result,{number,3},{prime,true}}
Process-ID=<0.35.0>
Duration=7.25e-4
result={result,{number,7},{prime,true}}
Process-ID=<0.36.0>
Duration=7.23e-4
result={result,{number,8},{prime,false}}
Process-ID=<0.37.0>
Duration=7.22e-4
result={result,{number,12},{prime,false}}
Process-ID=<0.38.0>
Duration=7.2e-4
result={result,{number,102950143},{prime,true}}
Process-ID=<0.33.0>
Duration=8.856508
result={result,{number,197951153},{prime,true}}
Process-ID=<0.31.0>
Duration=49.51542

That's it.
Pierre.

30 November 2009

Playing with Erlang (II)

Transcribing a DNA sequence to an RNA sequence with an anonymous function

> lists:map(fun($T)->$U;(OTHER)->OTHER end,"AATAGCTGATCGACAATGTTAGCTAGGC").
"AAUAGCUGAUCGACAAUGUUAGCUAGGC"

Testing whether a symbol is a nucleic acid base

> ACIDNUCLEICS="ATGCatgc".
"ATGCatgc"
> lists:member($A,ACIDNUCLEICS).
true
> lists:member($P,ACIDNUCLEICS).
false

Filtering a sequence with lists:filter

> IsAcidNucleic=fun(BASE)->lists:member(BASE,ACIDNUCLEICS) end.
#Fun<erl_eval.6.13229925>
> lists:filter( IsAcidNucleic, "1 ACGT ACGT ACGT ACGT ACGT ACGT 36").
"ACGTACGTACGTACGTACGTACGT"
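
The same filtering can be written as a list comprehension with a condition. A minimal sketch, where the module name seq_filter and the function clean/1 are made up for the example:

```erlang
-module(seq_filter).
-export([clean/1]).

%% Keep only the characters that are nucleic acid symbols.
clean(Seq) ->
    [Base || Base <- Seq, lists:member(Base, "ATGCatgc")].
```

Calling seq_filter:clean/1 on the string used above should give the same result as lists:filter.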

Generating a list of numbers

> lists:seq(1,10).
[1,2,3,4,5,6,7,8,9,10]
> lists:seq(65,65+24).
"ABCDEFGHIJKLMNOPQRSTUVWXY"

The following list comprehension says: add 32 to each item 'X' of the sequence 65 to 89.

> [X+32 || X<-lists:seq(65,65+24)].
"abcdefghijklmnopqrstuvwxy"

Create a list of tuples {number,X} for each item 'X' of the sequence 1 to 5.

> [{number,X}|| X<-lists:seq(1,5)].
[{number,1},{number,2},{number,3},{number,4},{number,5}]

Create a list of tuples {number,X} for each item 'X' of the sequence 1 to 100 where 'X rem 15' is 0.

> [{number,X}|| X<-lists:seq(1,100), X rem 15 =:= 0].
[{number,15},
{number,30},
{number,45},
{number,60},
{number,75},
{number,90}]

Create a list of all the pairs {pair,X,Y} for X and Y between 1 and 4:

> [{pair,X,Y}|| X<-lists:seq(1,4),Y<-lists:seq(1,4)].
[{pair,1,1},
{pair,1,2},
{pair,1,3},
{pair,1,4},
{pair,2,1},
{pair,2,2},
{pair,2,3},
{pair,2,4},
{pair,3,1},
{pair,3,2},
{pair,3,3},
{pair,3,4},
{pair,4,1},
{pair,4,2},
{pair,4,3},
{pair,4,4}]

Create a list of all the pairs {pair,X,Y} for X and Y between 1 and 4 where X equals Y:

> [{pair,X,Y}|| X<-lists:seq(1,4),Y<-lists:seq(1,4),X=:=Y].
[{pair,1,1},{pair,2,2},{pair,3,3},{pair,4,4}]

A few is_XXXX functions

> is_integer("AAA").
false
> is_integer($A).
true
> is_list(1).
false
> is_list("AZAZAZ").
true
> is_atom(1).
false
> is_atom(hello).
true
> is_tuple(1).
false
> is_tuple({person,{name,"Pierre"}}).
true

A few functions:

% get the head (first element) of a list
> hd([100,99,98,97]).
100
% get the 3rd element of a list
> lists:nth(3,["EcoRI","HindIII","BamHI","PstI"]).
"BamHI"
% get the number of elements of a tuple
> size({test,{name,"Hello"},{age,3}}).
3
> element(2,{test,{name,"Hello"},{age,3}}).
{name,"Hello"}
> abs(-99).
99

Translating a DNA sequence to a protein.
The file bio.erl:

-module(bio).
-export([translate/1]).

%The standard genetic code table
geneticCode()->"FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG".

%convert a base to an index in the table
base2index($T)-> 0;
base2index($C)-> 1;
base2index($A)-> 2;
base2index($G)-> 3.

%convert a codon to an index, lookup the table and return the amino acid
codon2AminoAcid(A1,A2,A3) -> lists:nth(
(base2index(A1)*16 + base2index(A2)*4 + base2index(A3))+1,
geneticCode())
.

%return an empty list if the argument is an empty sequence or a sequence with fewer than 3 bases
translate([])->[];
translate([_])->[];
translate([_,_])->[];
%translate the 3 first bases and append the remaining translation.
translate([A1,A2,A3|REMAIN])->[codon2AminoAcid(A1,A2,A3)|translate(REMAIN)].

Invoking bio:translate:

c(bio).

bio:translate("ATGGAGAGGCAGAAACGGAAGGCGGACATCGAGAAG").
"MERQKRKADIEK"
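
The index arithmetic in codon2AminoAcid works because the table lists the codons in T, C, A, G order. To make that visible, here is a minimal sketch that builds an explicit list of {Codon, AminoAcid} pairs with a list comprehension. The module name codon_table, the functions table/0 and lookup/1, and the $X fallback for unknown codons are assumptions of mine:

```erlang
-module(codon_table).
-export([table/0, lookup/1]).

%% Pair every codon with its amino acid.
%% The comprehension enumerates the 64 codons in T, C, A, G order,
%% which matches the layout of the genetic code string used in bio.erl.
table() ->
    Bases = "TCAG",
    Codons = [[B1, B2, B3] || B1 <- Bases, B2 <- Bases, B3 <- Bases],
    AminoAcids = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    lists:zip(Codons, AminoAcids).

%% Return the amino acid for a codon, or $X when the codon is unknown
%% (the $X fallback is a choice made for this sketch).
lookup(Codon) ->
    case lists:keyfind(Codon, 1, table()) of
        {Codon, AminoAcid} -> AminoAcid;
        false -> $X
    end.
```

Rebuilding the table on every lookup is slow; it is only meant to show the correspondence between codons and the string.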

Records

A record definition describes the structure of an object. Example: the following header file snp.hrl defines the record 'snp' (something like a class) with default values for its fields:

-record(snp,{
name="rs",
chrom="?",
chromStart=0,
chromEnd=0,
avHet= -1
}).

We can now use this record in an Erlang shell:

%load the structure of a SNP
rr("snp.hrl").
[snp]
%create an empty instance of a snp
EMPTY_SNP= #snp{}.
#snp{name = "rs",chrom = "?",chromStart = 0,chromEnd = 0,
avHet = -1}
%create and fill a new instance of a snp
RS84=#snp{name="rs84",chrom="chr7",chromStart=25669317,chromEnd=25669318}.
#snp{name = "rs84",chrom = "chr7",chromStart = 25669317,
chromEnd = 25669318,avHet = -1}
%re-use the content of RS84 and fill the value of 'avHet'
RS84_2=RS84#snp{avHet=0.475045}.
#snp{name = "rs84",chrom = "chr7",chromStart = 25669317,
chromEnd = 25669318,avHet = 0.475045}
%extract the name and the avHet for RS84_2
#snp{name=NAME_RS84,avHet=AVHET_RS84}=RS84_2.
NAME_RS84.
"rs84"
AVHET_RS84.
0.475045
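
Records can also be used inside a module, where they are handy for pattern matching in function heads. A minimal sketch, assuming a hypothetical module snp_rec with made-up functions new/4, length_bp/1 and describe/1; the record definition is repeated instead of including snp.hrl so that the example is self-contained:

```erlang
-module(snp_rec).
-export([new/4, length_bp/1, describe/1]).

%% Same record as snp.hrl, repeated here so the sketch is self-contained.
-record(snp, {name = "rs", chrom = "?", chromStart = 0, chromEnd = 0, avHet = -1}).

%% Build a record; avHet keeps its default value.
new(Name, Chrom, Start, End) ->
    #snp{name = Name, chrom = Chrom, chromStart = Start, chromEnd = End}.

%% Pattern match on the record fields in the function head.
length_bp(#snp{chromStart = Start, chromEnd = End}) ->
    End - Start.

%% Field access with the Var#record.field syntax.
describe(Snp) ->
    Snp#snp.name ++ "@" ++ Snp#snp.chrom.
```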

That's it
Pierre

26 November 2009

Playing with Erlang (I)

I'm currently reading Joe Armstrong's "Programming Erlang". Here are a couple of notes about Erlang.

Starting and stopping the Erlang shell

:~> erl
Erlang R13B01 (erts-5.7.2) [source] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.7.2 (abort with ^G)
1> halt().
:~>

Simple Math

Input:

2*(3+4).
PI=3.14159.
R=2.
SURFACE=PI*R*R.
R=3.

Output:

1> 14
2> 3.14159
3> 2
4> 12.56636
## Variables in Erlang are single-assignment: R is already bound to 2 and cannot be rebound to 3
5> ** exception error: no match of right hand side value 3

Three structures describing two SNPs and a gene are created with tuples:

RS94={snp,
{chrom,chr6},
{chromStart,98339675},
{chromEnd,98339676},
{name,"rs94"},
{strand,plus},
{func,unknown},
{avHet,0}
}.
RS47={snp,
{chrom,chr7},
{chromStart,11547645},
{chromEnd,11547646},
{name,"rs47"},
{strand,minus},
{func,missence},
{avHet,0.246}
}.
NP_056019={gene,
{chrom,chr7},
{txStart,11380695},
{txEnd,11838349},
{cdsStart,11381945},
{csdEnd,11838097},
{name,"NP_056019"},
{strand,minus},
{exonCount,27}
}.

Values are extracted by pattern matching, using '_' as a placeholder for the values we don't need.

{_,{_,_},{_,_},{_,_},{_,NAME_OF_RS94},{_,_},{_,_},{_,_}}=RS94.
NAME_OF_RS94.
"rs94"
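
Writing the whole pattern gets tedious when the tuple has many fields. Since every field here is a {Key,Value} pair, a small helper based on lists:keyfind can do the lookup. This is only a sketch; the module name snp_fields and the function field/2 are hypothetical:

```erlang
-module(snp_fields).
-export([field/2]).

%% Look up a {Key, Value} pair inside a tagged tuple such as
%% {snp, {chrom, chr6}, ..., {name, "rs94"}, ...}.
field(Key, Tuple) ->
    %% Drop the leading tag (snp, gene, ...) and keep the pairs.
    [_Tag | Fields] = tuple_to_list(Tuple),
    case lists:keyfind(Key, 1, Fields) of
        {Key, Value} -> {ok, Value};
        false -> error
    end.
```

With the RS94 tuple defined above, snp_fields:field(name, RS94) would return {ok,"rs94"}.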

Create a list of SNP:

LIST_OF_SNP1=[RS94,RS47].

Add NP_056019 to the head of this list and create a list of genomic objects:

LIST2=[NP_056019|LIST_OF_SNP1].

Extract the first and second element of LIST2, put the remaining list in LIST3:

[ITEM1,ITEM2|LIST3]=LIST2.
ITEM1.
{gene,{chrom,chr7},
{txStart,11380695},
{txEnd,11838349},
{cdsStart,11381945},
{csdEnd,11838097},
{name,"NP_056019"},
{strand,minus},
{exonCount,27}}
ITEM2.
{snp,{chrom,chr6},
{chromStart,98339675},
{chromEnd,98339676},
{name,"rs94"},
{strand,plus},
{func,unknown},
{avHet,0}}
LIST3.
[{snp,{chrom,chr7},
{chromStart,11547645},
{chromEnd,11547646},
{name,"rs47"},
{strand,minus},
{func,missence},
{avHet,0.246}}]
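
The same [Head|Tail] pattern is the usual way to walk a list recursively. A minimal sketch that counts the objects carrying a given tag (snp or gene) in such a list; the module name genomic_list and the function count/2 are made up for the example:

```erlang
-module(genomic_list).
-export([count/2]).

%% Walk the list with the [Head | Tail] pattern and count the tuples
%% whose first element (the tag) equals Tag.
count(_Tag, []) -> 0;
count(Tag, [Item | Rest]) when element(1, Item) =:= Tag -> 1 + count(Tag, Rest);
%% Items with another tag (or that are not tuples) are skipped.
count(Tag, [_ | Rest]) -> count(Tag, Rest).
```

With the lists above, genomic_list:count(snp, LIST2) would count RS94 and RS47 and skip the gene.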

A string is just a list of integers:

ALPHABET=[65,66,67,68,69,70].
"ABCDEF"

The dollar ( $ ) notation is used to get the integer value of a character.

HELLO=[$H,$e,$l,$l,$o].
"Hello"

Methods & Functions

A file named "bio.erl" is created. This file is an Erlang module containing a function distance/1 (the '/1' is its number of arguments) that behaves somewhat like a polymorphic method: Erlang selects the clause whose pattern matches the argument. If the tuple is tagged with the atom snp, the value "chromEnd-chromStart" is returned. If it is tagged with the atom gene, the value "txEnd-txStart" is returned.

-module(bio).
-export([distance/1]).
distance({snp,
{chrom,_},
{chromStart,START},
{chromEnd,END},
{name,_},
{strand,_},
{func,_},
{avHet,_}}
)-> END - START;
distance({gene,
{chrom,_},
{txStart,START},
{txEnd,END},
{cdsStart,_},
{csdEnd,_},
{name,_},
{strand,_},
{exonCount,_}
})-> END - START.

We now use this module:

c(bio).
RS94={snp,
{chrom,chr6},
{chromStart,98339675},
{chromEnd,98339676},
{name,"rs94"},
{strand,plus},
{func,unknown},
{avHet,0}
}.

RS47={snp,
{chrom,chr7},
{chromStart,11547645},
{chromEnd,11547646},
{name,"rs47"},
{strand,minus},
{func,missence},
{avHet,0.246}
}.

NP_056019={gene,
{chrom,chr7},
{txStart,11380695},
{txEnd,11838349},
{cdsStart,11381945},
{csdEnd,11838097},
{name,"NP_056019"},
{strand,minus},
{exonCount,27}
}.

bio:distance(RS94).
1
bio:distance(RS47).
1
bio:distance(NP_056019).
457654

We now want to calculate the GC percent of a DNA sequence: the bio.erl file is modified as follows:

-module(bio).
-export([gcPercent/1]).
-export([distance/1]).
(...)

gc($A) -> 0.0;
gc($T) -> 0.0;
gc($C) -> 1.0;
gc($G) -> 1.0;
gc([])->0;
gc([BASE|REMAIN])->gc(BASE)+gc(REMAIN).

gcPercent(ADN)->100.0*(gc(ADN)/erlang:length(ADN)).

Here the function gc returns 1.0 (for G or C) or 0.0 (for A or T) if the argument is a single base, returns 0 if the list is empty, and otherwise returns gc(first character of the string) plus gc(remaining string). The function gcPercent divides the sum returned by gc by the length of the string and multiplies it by 100.

c(bio).
bio:gcPercent("GCATG").
60.0
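
The version above fails on an empty sequence (division by zero) and on any character other than A, T, G or C in upper case. Here is a minimal sketch of a more tolerant variant; the module name gc_sketch, the function names and the choice of returning 0.0 for an empty sequence are assumptions of mine:

```erlang
-module(gc_sketch).
-export([gc_percent/1]).

%% True for G and C in either case; any other character is not counted.
is_gc(Base) -> lists:member(Base, "GCgc").

%% An empty sequence yields 0.0 instead of a division by zero
%% (that return value is an assumption of this sketch).
gc_percent([]) -> 0.0;
gc_percent(Seq) ->
    Count = length([Base || Base <- Seq, is_gc(Base)]),
    100.0 * Count / length(Seq).
```

Note that unknown characters such as N still count in the length of the sequence, so they lower the percentage.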

That's it.
Pierre
