26 March 2010

'R' = dna.translate("AGG") . A custom C function for R, My notebook.


In the following post, I will show how I've implemented a custom C function for R. This C function will translate a DNA to a protein. I'm very new to 'R' so feel free to make any comment about the code.

C code


The data in 'R' are stored in an opaque structure named 'SEXP'. A custom C function receives some SEXP arguments and returns another SEXP. So, the declaration of my function translate_dna is:
SEXP translate_dna(SEXP sequences)
{
(...)
}

We check if the argument 'sequences' has a type='character'.
if(!isString(sequences))
error("argument is not a string[]");
The input consists of one or more DNA, so our return value will be an SEXP array of type 'character' (STRSXP) with a size equals to the number of DNAs. This array is allocated:
SEXP array_of_peptides = allocVector(STRSXP,length(sequences));
We loop over each DNA sequence
for(in i=0;i<length(sequences);++i) ...
and we allocate some memory for the peptide:
const char *dna=CHAR(STRING_ELT(sequences, i));
int dna_length=strlen(dna);
char* peptide =malloc(dna_length/3+1);
The peptide is then filled with its amino acids:
for(j=0;j+2 < dna_length;j+=3)
{
peptide[n++]=_translate(dna[j],dna[j+1],dna[j+2]);
}
peptide[n]=0;
And we put this new string/peptide in the returned value
SET_STRING_ELT(array_of_peptides,i,Rf_mkChar(peptide));
Here, I am not sure how 'R' handles its memory. Should I care about the way R runs its garbage manager ? how should I use the macros PROTECT and UNPROTECT ?

Compilation


The code 'translate.c' is compiled as a dynamic library ' libtranslate.so':

gcc -fPIC -I -g -c -Wall -I ${R_HOME}/include translate.c
gcc -shared -Wl,-soname,libtranslate.so.1 -o libtranslate.so translate.o

R code

on the R side , the previous dynamic library is loaded:
dyn.load(paste("libtranslate", .Platform$dynlib.ext, sep=""))
A function dna.translate is declared: it forces the array to be an array of string and it invokes the C function 'translate_dna'
storage.mode(dna) <- "character"
.Call("translate_dna", dna)
We can now invoke the R function dna.translate:
peptides <- dna.translate(
c( "ATGGAGAGGCAGAAACGGAAGGCGGACATCGAGAAAGGGCTGCAGTTCATTCAGTCGACACTAC",
NULL,
"CCCAAAAGCAAGAAGAATATGAGGCCTTTCTGCT",
"CAAACTGGTGCAGAATCTGTTTGCTGAGGGCAATGA",
NULL,
"GGCAGATCAGGGAACTTCTAATGGATTGGGGTCC",
"GGATAACTGCACCTTCGCCTACCATCAGGAGGA",
"GGGTCCCAGGCAGCGCTGCCTGGGGGCTGGGGAG",
"TTGCGACAGGCTCCAGAAGGGCAAAGCCTGCCCAGAT",
"GCACCCCCCTCCTCTCCACCCTACCTTCCATCAACCA",
"AGG"
))

print(peptides)

Result:
[1] "MERQKRKADIEKGLQFIQSTL" "PKSKKNMRPFC" "QTGAESVC*GQ*"
[4] "GRSGNF*WIGV" "G*LHLRLPSGG" "GSQAALPGGWG"
[7] "LRQAPEGQSLPR" "APPSSPPYLPST" "R"


Full source code

:
C code:
#include <stdio.h>
#include <ctype.h>
#include <R.h>
#include <Rinternals.h>
/* the genetic code */
static const char* STANDARD_GENETIC_CODE="FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

static char _translate(char a,char b,char c);


/** translates DNA to protein
* @arg sequences one or more DNA sequence
* @return an array of peptides
*/
SEXP translate_dna(SEXP sequences)
{
int i;
SEXP array_of_peptides;
//check input has type= characters
if(!isString(sequences))
error("argument is not a string[]");
//prepare a new list for the translated sequence
array_of_peptides = allocVector(STRSXP,length(sequences));

//loop over the input
for(i=0;i< length(sequences);++i)
{
int j=0,n=0;
//transform the sequence into a ptr* of char
const char *dna=CHAR(STRING_ELT(sequences, i));
//get the length of this sequence
int dna_length=strlen(dna);
//alloc the protein sequence
char* peptide =malloc(dna_length/3+1);
if(peptide==NULL) error("out of memory");
//loop over the codons
for(j=0;j+2 < dna_length;j+=3)
{
//set the amino acid at 'n'
peptide[n++]=_translate(dna[j],dna[j+1],dna[j+2]);
}
//add EOS
peptide[n]=0;
//put a copy of this peptide in the vector
SET_STRING_ELT(array_of_peptides,i,Rf_mkChar(peptide));
//free our copy
free(peptide);
}
//return the array of petptide
return array_of_peptides;
}


static int base2index(char c)
{
switch(tolower(c))
{
case 't': return 0;
case 'c': return 1;
case 'a': return 2;
case 'g': return 3;
default: return -1;
}
}

static char _translate(char a,char b,char c)
{
int base1= base2index(a);
int base2= base2index(b);
int base3= base2index(c);
if(base1==-1 || base2==-1 || base3==-1)
{
return '?';
}
else
{
return STANDARD_GENETIC_CODE[base1*16+base2*4+base3];
}

}

Makefile:
run:translate.so
${R_HOME}/bin/R --no-save < translate.R

translate.so:translate.c
gcc -fPIC -I -g -c -Wall -I ${R_HOME}/include translate.c
gcc -shared -Wl,-soname,libtranslate.so.1 -o libtranslate.so translate.o

R code:
#load the dynamic library translate
dyn.load(paste("libtranslate", .Platform$dynlib.ext, sep=""))
#declare the function dna.translate
dna.translate <- function(dna)
{
#force dna to be type=character
storage.mode(dna) <- "character"
#call the C function
.Call("translate_dna", dna)
}

peptides <- dna.translate(
c( "ATGGAGAGGCAGAAACGGAAGGCGGACATCGAGAAAGGGCTGCAGTTCATTCAGTCGACACTAC",
NULL,
"CCCAAAAGCAAGAAGAATATGAGGCCTTTCTGCT",
"CAAACTGGTGCAGAATCTGTTTGCTGAGGGCAATGA",
NULL,
"GGCAGATCAGGGAACTTCTAATGGATTGGGGTCC",
"GGATAACTGCACCTTCGCCTACCATCAGGAGGA",
"GGGTCCCAGGCAGCGCTGCCTGGGGGCTGGGGAG",
"TTGCGACAGGCTCCAGAAGGGCAAAGCCTGCCCAGAT",
"GCACCCCCCTCCTCTCCACCCTACCTTCCATCAACCA",
"AGG"
))

print(peptides)


That's it

Pierre

25 March 2010

Playing with Biomart. My notebook.


Thanks to Neil, I've discovered that Biomart was not just a web interace, it also contains a powerful API: Citing http://www.biomart.org/martservice.html:
To submit a query using our webservices generate an XML document conforming to our Query XML syntax. This can be achieved simply by building up your query using MartView and hitting the XML button . This XML should be posted to http://(...)/martservice attached to a single parameter of query.

The base URL for Biomart is http://www.biomart.org/biomart.

Retrieving the Registry information for this Biomart installation: http://www.biomart.org/biomart/martservice?type=registry
<MartRegistry>
<MartURLLocation database="ensembl_mart_57" default="1" displayName="ENSEMBL GENES 57 (SANGER UK)" host="www.biomart.org" includeDatasets="" martUser="" name="ensembl" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="1" />
<MartURLLocation database="hapmart_27_36" default="1" displayName="HAPMAP 27 (NCBI US)" host="hapmap.ncbi.nlm.nih.gov" includeDatasets="" name="HapMap_rel27" path="/biomart/martservice" port="80" redirect="1" serverVirtualSchema="rel27_NCBI_Build36" visible="1" />
<MartURLLocation database="vega_mart_57" default="0" displayName="VEGA 37 (SANGER UK)" host="www.biomart.org" includeDatasets="" martUser="" name="vega" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="1" />
<MartURLLocation database="genomic_features_mart_57" default="0" displayName="ENSEMBL GENOMIC FEATURES 57 (SANGER UK)" host="www.biomart.org" includeDatasets="" martUser="" name="genomic_features" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="0" />
(...)
</MartRegistry>


Retrieving the datasets available for the registry HapMap_rel27: http://www.biomart.org/biomart/martservice?type=datasets&mart=HapMap_rel27 :
TableSet hm27_variation_yri HapMap Population YRI - Yoruba in Ibadan, Nigeria (West Africa) 1 20050000 default 2009-10-28 11:19:52
TableSet hm27_variation_tsi HapMap Population TSI - Toscans in Italy 1 200 50000 default 2009-10-28 11:13:07
TableSet hm27_gene hm27_gene 0 200 50000 default 2009-09-17 12:04:20
TableSet hm27_variation_chd HapMap Population CHD - Chinese in Metropolitan Denver, Colorado 1 200 50000 default 2009-10-28 11:07:01
TableSet hm27_variation All Populations 1 200 50000 default 2009-10-28 11:22:21
TableSet hm27_variation_chb HapMap Population CHB - Han Chinese in Beijing, China 1 200 50000 default 2009-10-28 11:05:57
TableSet hm27_variation_asw HapMap Population ASW - African ancestry in Southwest USA 1 20050000 default 2009-10-28 08:53:43
TableSet hm27_variation_ceu HapMap Population CEU - Utah residents with Northern and Western European ancestry from the CEPH collection 1 200 50000 default 2009-10-28 11:04:39
TableSet hm27_variation_mkk HapMap Population MKK - Maasai in Kinyawa, Kenya 1 200 50000 default 2009-10-28 11:12:12
(...)

Where can I find a description of those columns ?


Retrieving the attributes for the dataset hm27_variation_ceu: http://www.biomart.org/biomart/martservice?type=attributes&dataset=hm27_variation_ceu :
chrom chromosome naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__main chrom
start position naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__main start
strand strand naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__main strand
marker1 marker id naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__main marker1
alleles alleles naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__main alleles
refallele reference allele naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__main refallele
center genotyping center naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__maincenter
platform genotyping platform naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__main platform
ceu_id CEU genotype naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__dm genotype
ref_allele reference allele naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__main ref_allele
ref_allele_freq reference allele frequency naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__main ref_allele_freq
ref_allele_count reference allele count naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__main ref_allele_count
other_allele other allele naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__mainother_allele
other_allele_freq other allele frequency naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__main other_allele_freq
other_allele_count other allele count naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__main other_allele_count
total_allele_count total allele count naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__main total_allele_count
refhom_gt reference homozygote genotype naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__main refhom_gt
het_gt heterozygote genotype naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__mainhet_gt
otherhom_gt other homozygote genotype naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__main otherhom_gt
refhom_gtfreq reference homozygote genotype frequency naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__main refhom_gtfreq
het_gtfreq heterozygote genotype frequency naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__main het_gtfreq
otherhom_gtfreq other homozygote genotype frequency naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__main otherhom_gtfreq
refhom_gtcount reference homozygote genotype count naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__main refhom_gtcount
het_gtcount heterozygote genotype count naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__main het_gtcount
otherhom_gtcount other homozygote genotype count naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__main otherhom_gtcount
total_gtcount total genotype count naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__variation__main total_gtcount
prot_lsid protocol LSID naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__assay__dm prot_lsid
assay_id assay LSID naive_attributes html,txt,csv,tsv,xls hm27_variation_ceu__assay__dm assay_lsid

(Where can I find a description of those columns ?). Each attribute is a specific column in the final result. Example of attributes available for the dataset hm27_variation_ceu:
  • chrom
  • start
  • marker1
  • alleles
  • (...)


Retrieving the filters for the dataset hm27_variation_ceu: http://www.biomart.org/biomart/martservice?type=filters&dataset=hm27_variation_ceu :
allele_freq Minor Allele Frequency [>=] [0.0,0.01,0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5] naive_filters list >= hm27_variation_ceu__variation__main avg_dprime
center Genotyping Center [affymetrix,bcm,illumina,imsut-riken,mcgill-gqic,perlegen,ucsf-wu,broad,sanger] naive_filters list = hm27_variation_ceu__variation__main center
platform Genotyping Platform [AFFY_100K,AFFY_500K,BeadArray,FP-TDI,Infinium_H1,Infinium_H300,Invader,MIP,Perlegen,AFFY_6.0,Illumina_1M] naive_filters list = hm27_variation_ceu__variation__main platform
monomorphic Monomorphic SNPs [,0] naive_filters boolean_num =only,excluded hm27_variation_ceu__variation__main is_reference_bool
marker_name List of HapMap SNPs [] naive_filters = hm27_variation_ceu__variation__mainmarker1
utr3 3' UTR [] naive_filters boolean_list only,excluded hm27_variation_ceu__variation__main is_utr_bool
utr5 5' UTR [] naive_filters boolean_list only,excluded hm27_variation_ceu__variation__main is_5utr_bool
coding_synon Exons - synonymous coding SNPs [] naive_filters boolean_list only,excluded hm27_variation_ceu__variation__main is_coding_synon_bool
coding_nonsynon Exons - non-synonymous coding SNPs [] naive_filters boolean_list only,excluded hm27_variation_ceu__variation__main is_coding_nonsynon_bool
intronic introns [] naive_filters boolean_list only,excluded hm27_variation_ceu__variation__mainis_intronic_bool
chrom chromosome [chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX,chrY] naive_filters list = hm27_variation_ceu__variation__main chrom
start From position [] naive_filters text >= hm27_variation_ceu__variation__main start
stop To position [] naive_filters text <= hm27_variation_ceu__variation__main stop
gene_list List of Genes [] naive_filters =,in pointer dataset gene_primary_symbol
encode_region ENCODE region [ENm010: 7:26924045..27424045,ENm013: 7:89621624..90736048,ENm014: 7:125865891..127029088,ENr112: 2:51512208..52012208,ENr113: 4:118466103..118966103,ENr123: 12:38626476:39126476,ENr131: 2:234156563..234656627,ENr213: 18:23719231..24219231,ENr232: 9:130725122..131225122,ENr321: 8:118882220..119382220] naive_filters list = pointer dataset glook_encode_region
(Where can I find a description of those columns ?) Here we can find some filters for hm27_variation_ceu (e.g.: allele_freq)

Retrieving the configurations for the dataset hm27_variation_ceu: http://www.biomart.org/biomart/martservice?type=configuration&dataset=hm27_variation_ceu :
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DatasetConfig>
<DatasetConfig dataset="hm27_variation_ceu" datasetID="2" displayName="HapMap Population CEU - Utah residents with Northern and Western European ancestry from the CEPH collection" interfaces="default" internalName="default" martUsers="default" modified="2009-10-28 11:04:39" softwareVersion="0.6" template="hm27_variation_ceu" type="TableSet" visible="1">
<MainTable>hm27_variation_ceu__variation__main</MainTable>
<Key>marker_id_key</Key>
<Importable filters="chrom,start,stop" internalName="encode_filter" linkName="encode_filter" name="encode_filter" type="link"/>
<Importable displayName="gene_filter_35" filters="chrom,start,stop" internalName="gene_filter_35" linkName="gene_filter_35" name="gene_filter_35" type="link"/>
<FilterPage displayName="FILTERS" internalName="naive_filters">
<FilterGroup displayName="FILTERS" internalName="filters">
<FilterCollection displayName="MINOR ALLELE FREQUENCY [&gt;=]" internalName="pop_filter">
<FilterDescription displayName="Minor Allele Frequency [&gt;=]" displayType="list" field="avg_dprime" internalName="allele_freq" key="marker_id_key" legal_qualifiers="&gt;=" qualifier="&gt;=" style="menu" tableConstraint="main" type="list">
<Option displayName="0.0" internalName="0.0" isSelectable="true" value="0.0"/>
<Option displayName="0.01" internalName="0.01" isSelectable="true" value="0.01"/>
<Option displayName="0.05" internalName="0.05" isSelectable="true" value="0.05"/>
<Option displayName="0.1" internalName="0.1" isSelectable="true" value="0.1"/>
<Option displayName="0.15" internalName="0.15" isSelectable="true" value="0.15"/>
<Option displayName="0.2" internalName="0.2" isSelectable="true" value="0.2"/>
<Option displayName="0.25" internalName="0.25" isSelectable="true" value="0.25"/>
<Option displayName="0.3" internalName="0.3" isSelectable="true" value="0.3"/>
<Option displayName="0.35" internalName="0.35" isSelectable="true" value="0.35"/>
(...)
As far as I understand , this file is used to 'group' the filters.

At the end, here is Query sent to Biomart:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="0" count="" datasetConfigVersion="0.6">
<Dataset name="hm27_variation_ceu" interface="default">
<Filter name="stop" value="1000000"/>
<Filter name="allele_freq" value="0.3"/>
<Filter name="chrom" value="chr1"/>
<Filter name="start" value="0"/>
<Attribute name="chrom"/>
<Attribute name="start"/>
<Attribute name="strand"/>
<Attribute name="marker1"/>
<Attribute name="alleles"/>
<Attribute name="ceu_id"/>
</Dataset>
</Query>
and we can send it with curl:
curl --data 'query=<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query><Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="0" count="" datasetConfigVersion="0.6" ><Dataset name="hm27_variation_ceu" interface="default" ><Filter name="stop" value="1000000"/><Filter name="allele_freq" value="0.3"/><Filter name="chrom" value="chr1"/><Filter name="start" value="0"/><Attribute name="chrom" /><Attribute name="start" /><Attribute name="strand" /><Attribute name="marker1" /><Attribute name="alleles" /><Attribute name="ceu_id" /></Dataset></Query>' http://www.biomart.org/biomart/martservice


Response from the Biomart server:
chr1 843817 + rs1806509 A/C AC AA AC AC AC CC AC CC AC CC AC NN AC CC AA NN AC AC NN CC AC CC AC AC AA AC AC AC CC AA AC AC AC AC AA CC AC AC AA AA AA AA NN AA AA AA AC AA NN AA NN AC AC CC AA AA AC AC AC AC AA AA AA AC AA AA CC CC AC AA AC AC AC AA CC AA AC CC AA NN CC CC AA AC AA AC AA AA AC AC AC AC AA AA NN AA AA CC AC AC AA AA AA AC AC AC AA AC AA AC AC AA AA AC AA CC AC AC AC AC AA AA AA CC AC CC AC NN AC AC AA AA CC AA AC AC AC AA AC AA AC AC AC AA AC CC AA AA CC AA AA AC AC AC AC AA CC AA AC CC CC AC AC AA CC AA AC AC AC AA AA CC AA AA
chr1 881808 + rs13303106 A/G AG GG AG GG GG AG AG AG GG GG GG NN AA AG GG NN GG GG NN GG AG AG GG AG GG GG AG GG AA GG AG GG AG AA GG AG GG AG GG GG GG GG NN GG GG GG AA AG NN GG NN AG AG AA GG AG AG AA AG AG GG GG GG AG GG GG AG AA GG GG AG GG AG GG AG GG AG AG GG NN AA GG GG AG GG AG AG AG AG AG GG AG GG GG NN GG GG AG AG AG AG GG AG AG GG GG GG AG GG AG AG GG AG AG AG AA GG AG AG AG GG GG GG AG AG AA AG NN AG AG GG GG AA GG GG GG AA GG GG AG AG AG AG GG AG AA GG GG AA GG GG AG AG AG GG GG AA GG GG AA AA AG AG AG AA GG AA AG AG GG AG AA GG GG
chr1 906697 + rs6694632 A/G AG AA AA AG AA AG AG AG AA AA AG NN GG AA GG NN AA AA NN AA AG AG AA AG AA AA AG AA GG GG GG AG AA GG AA AA AA AG AG AG AG AA NN AA AA AA AG AG NN AA NN AG AG AG AG AG AG GG AG AG AA AA AA AG AG AG AA GG AA AA AA GG GG AA AG GG AA AG AA NN AG AA AG AG AA AG GG GG AA GG AG AG AG AG NN AA AG AG AG AG AG AA AA GG AA AA AA AA AA AG AG AA AG AG AG AA AG AA AG AG AA AA GG AG AG GG AG NN AG AG GG AG AA AG AG AA AG AA AA AA AG AA AG AA AA GG AA AA GG AA AA AG AG AG AA AA GG AA AA GG GG AG AG AG AG AG GG AA AG AG AG AG AG AG
chr1 908247 + rs13303118 G/T GT TT TT GT TT GT GT GT TT TT GT NN GG TT GG NN TT TT NN TT GT GT TT GT TT TT GT TT GG GG GG GT TT GG TT TT TT GT GT GT GT TT NN TT TT TT GT GT NN TT NN NN GT GT GT GT GT GG GT GT TT TT TT GT GT GT TT GG TT TT TT GG GG TT GT GG TT GT TT NN GT TT GT GT TT GT GG GG TT GG GT GT GT GT NN TT GT GT GT GT GT TT TT GG TT TT TT TT TT GT GT TT GT GT GT TT GT TT GT GT TT TT GG GT GT GG GT NN GT GT GG GT TT GT GT TT GT TT TT TT GT TT GT TT TT GG TT TT GG TT TT GT GT GT TT TT GG TT TT GG GG GT GT GT GT GT GG TT GT GT GT GT GT GT
chr1 908436 + rs2341354 A/G AG GG GG AG GG AG AG AG GG GG AG NN AA GG AA NN GG GG NN GG AG AG GG AG GG GG AG GG AA AA AA AG GG AA GG GG GG AG AG AG AG GG NN GG GG GG AG AG NN GG NN AG AG AG AG AG AG AA AG AG GG GG GG AG AG AG AG AA GG GG GG AA AA GG AG AA GG AG GG NN AG GG AG AG GG AG AA AA GG AA AG AG AG AG NN GG AG AG AG AG AG GG GG AA GG GG GG GG GG AG AG GG AG AG AG GG AG GG AG AG GG GG AA AG AG AA AG NN AG AG AA AG GG AG AG GG AG GG GG GG AG GG AG GG GG AA GG GG AA GG GG AG AG AG GG GG AA GG GG AA AA AG AG AG AG AG AA GG AG AG AG AG AG AG
chr1 934427 + rs3128117 C/T TT TT CT CC TT CT CC CT TT TT CT NN CT CT CC NN TT TT NN TT TT CT TT CC TT TT CT TT CC CC TT TT TT CT TT TT CT CC CT CT CT TT NN TT CT TT CT CT NN TT NN CT CC TT CT CT CT CC CT CT TT CT TT CT CT CT TT CT TT TT CT CC CT TT CT CC CT CT TT NN CT TT CT CT CT CT CC CC TT CC CT TT TT CC NN TT TT CT CC CT TT TT TT CT CT TT TT TT TT TT TT TT CT CT CT CT CT CT CC CT CT TT CC CT CT CT CT NN CC CC CC CT CT CT CC TT CT TT TT TT CT TT CC CT TT CC CT TT CC TT TT CT CC CT TT TT CT TT TT CT CC TT CT TT CT CT CT CT TT CT TT TT CT CT
chr1 940106 + rs1891906 A/C AA AA AC CC AA AC AC AC AA AA AC NN AC AC CC NN AA AA NN AA AA AC AA CC AA AA AC AA CC CC AA AA AA AC AA AA AC CC AC AC AC AA NN AA AC AA AC AC NN AA NN AC CC AA AC AC AC CC AC AC AA AC AA AC AC AC AA AC AA AA AC CC AC AA AC AC AC AC AA NN AC AA AC AC AC AC CC CC AA NN AC AA AA CC NN AA AA AC CC AC AA AA AA AC AC AA AA AA AA AA AA AA AC AC AC AC AC AC CC AC AC AA CC AC AC AC AC NN CC CC AC AC AC AC CC AA AC AA AA AA AC AA CC AC AA CC AC AA CC AA AA AC CC AC AA AA AC AA AA AC CC AA AC AA AC AC AC AC AA AC AA AA AC AC
chr1 949705 + rs2710888 C/T CT CC CT CT CC CT CT CT CC CC CT NN CT CT TT NN CC CC NN CC CC CC NN CT CC CC CT CC TT CT CC CC CT CC CC CC CT TT CT CT CT CC NN CC CC CT TT CT NN CC NN CT TT CC CT CC CT TT CT CT CT CC CC CT CC CT CC CT CC CC CT CT CT CT CT CC CT CT CC NN CT CC CT CT CT CT CT TT CT TT CC CC CC TT NN CC CC CT TT CC CC CC CT CT CT CC CC CT CC CC CT CC CT CT CT TT CC CT TT CT CT CC CT CT CT CT CC NN CC TT CT CT CT CT CT CC CT CC CC CC CT CT TT CT TT TT CT CC TT CC CC CT TT CT CC CC CT CC CC CT TT CC CT CC CT CT CT CT CT CC CT CT CC CT
chr1 952073 + rs3128126 A/G NN AA NN NN AA AG AG NN NN AA NN AG AG AG NN AG NN NN AG NN AG AA AG NN NN AA NN AA NN AG AA AA NN NN AA AG NN NN NN AG AG NN AA NN NN AA AG AG AA AA AG AG GG NN NN AA AG GG AA AG AA NN AG AG NN NN NN NN NN NN NN NN NN NN AG AG AG AG AA AG GG AA AG AG NN AG AG GG AG GG AA AA AA GG GG AA AG AG GG NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN GG NN AG GG NN NN AG NN NN AG AG AA NN AA AA AA AG NN NN NN NN NN NN AA GG AA AA AG GG NN NN NN NN NN NN NN NN NN GG AA AG GG AG AG NN AA NN NN AA AG
chr1 952469 + rs4970393 A/G AG AA AG GG AA AG AG AG AA AA AG NN AG AG AG NN AA AA NN AA AG AA AG AG AA AA AG AA GG GG AA AA AA GG AA AA AG GG AG AG AG AA NN AA AG AA AG AG NN AA NN AG GG AG AG AG AG GG AA AG AA AG AA AG AG AG AA GG AA AA AG AG AG AG AG AG AG AG AA NN AG AA AG AG AG AG GG NN AG GG AG AA AA GG NN AA AA AG GG AG AG AA AA AG AG AA AA AG AA AA AG AA AG AG AG AG AG AG GG AG AG AG NN AG AG AG AA NN AG NN GG AG AG AG GG AA AG AA AA AA AG AG GG AG GG GG AG AA GG AA AA AG GG AG AA AA AG AA AA GG AG AG AG AG AG AG NN AG AA AG AA AA AG AG



Note: I've also tried to use the SOAP/WSDL Biomart API, but as it was discussed on Biostar, the API does not work.


That's it

Pierre

24 March 2010

Learning DesignPatterns: The Factory Method Pattern

"The Factory method pattern defines a separate method for creating the objects, which subclasses can then override to specify the derived type of product that will be created"

Example

/** Abstract class for melting temperature calculation */
abstract class TmCalculator
{
public abstract double calcTm(CharSequence s);
}

/** the factory creates two kinds of TmCalculator
1) 2AT4GC
2) Nearest-Neighbor
*/
public class TmCalculatorFactory
{
public TmCalculator create2AT4GC()
{
return new TmCalculator()
{
@Override
public double calcTm(CharSequence s)
{
(...implements ... )
}
};
}

public TmCalculator createNearestNeighbor()
{
return new TmCalculator()
{
@Override
public double calcTm(CharSequence s)
{
(...implements ... )
}
};
}
}


That's it

Pierre

Biostar.com: A StackExchange engine for Bioinformatics

Just In Case You Don't Know:



Biostar is a site running the StackExchange engine (the same engine used by http://stackoverflow.com). The site's focus is bioinformatics, computational genomics and biological data analysis(...)No question is too trivial or too "newbie". Please look around to see if your question has already been asked (and maybe even answered!) before you ask. But if you end up asking a question that has been asked before, that is OK and deliberately allowed. Other users will hopefully edit in links to related or similar questions to help future visitors find their way..

Learning DesignPatterns: The Builder Pattern

largely inspired by Wikipedia:"Builder Pattern is a software design pattern. The intention is to abstract steps of construction of objects so that different implementations of these steps can construct different representations of objects."

Example


/**
* Taxon, the class to be constructed
*/
class Taxon
{
private int id=-1;
private int parent_id=-1;
private String name=null;

public void setId(int id) { this.id=id; }
public void setParentId(int parent_id) { this.parent_id=parent_id; }
public void setName(String name) { this.name=name; }
public int getId() { return this.id;}
public int getParentId() { return this.parent_id;}
public String getName() { return this.name;}
}

/** Abstract interface for creating Taxons */
abstract class TaxonBuilder
{
protected Taxon taxon=new Taxon();

public void createNewTaxon()
{
this.taxon = new Taxon();
}

public Taxon getTaxon()
{
return this.taxon;
}
public abstract void buildId();
public abstract void buildParentId();
public abstract void buildName();
}
/** concrete TaxonBuilder builder for HomoSapiens */
class HomoSapiensBuilder extends TaxonBuilder
{
public void buildId() { super.taxon.setId(9606);}
public void buildParentId() { super.taxon.setParentId(9605);}
public void buildName() { super.taxon.setName("Homo Sapiens");}
}

/** concrete TaxonBuilder builder for Platypus */
class PlatypusBuilder extends TaxonBuilder
{
public void buildId() { super.taxon.setId(9258);}
public void buildParentId() { super.taxon.setParentId(9257);}
public void buildName() { super.taxon.setName("Platypus");}
}

/** constructs a Taxon object by calling its TaxonBuilder */
class TaxonAssembler
{
private TaxonBuilder builder;

public void setTaxonBuilder(TaxonBuilder builder)
{
this.builder=builder;
}

public void assemble()
{
this.builder.createNewTaxon();
this.builder.buildId();
this.builder.buildParentId();
this.builder.buildName();
}
}

public class Test
{
public static void main(String[] args) {
{
TaxonBuilder builder=new PlatypusBuilder();
TaxonAssembler assembler=new TaxonAssembler();
assembler.setBuilder(builder);
assembler.assemble();
Taxon taxon=builder.getTaxon();
}
}

That's it.

Pierre

23 March 2010

Learning DesignPatterns: AbstractFactoryPattern

In the very next posts, I'll post my notes about the "Design Patterns". A Design Pattern is a "general reusable solution to a commonly occurring problem in software design". Here, I'll follow the patterns described at http://c2.com/cgi/wiki?DesignPatternsBook and will try to apply them to Bioinformatics. Today I'm starting with the AbstractFactoryPattern.

via Wikipedia:AbstractFactoryPattern provides a way to encapsulate a group of individual factories that have a common theme. In normal usage, the client software creates a concrete implementation of the abstract factory and then uses the generic interfaces to create the concrete objects that are part of the theme.
Use of this pattern makes it possible to interchange concrete classes without changing the code that uses them, even at runtime.


Example


/** the result of a pairwise alignment */
interface Alignment
{
public void print(PrintWriter out);
public double getScore();
}

/** interface for a pairwise alignment */
interface PairwiseAlignment
{
public Alignment align(CharSequence seq1,CharSequence seq2);
}

/** implementation of a global PairwiseAlignment */
class NeedlemanWunsch
implements PairwiseAlignment
{
public Alignment align(CharSequence seq1,CharSequence seq2)
{
(...)
}
}

/** implementation of a local PairwiseAlignment */
class SmithWaterman
implements PairwiseAlignment
{
public Alignment align(CharSequence seq1,CharSequence seq2)
{
(...)
}
}

/** abstract PairwiseAlignmentFactory
* contains an abstract method 'createPairwiseAligner'
* the static function 'newInstance' returns a new 'PairwiseAlignmentFactoryImpl'
*/
abstract class PairwiseAlignmentFactory
{
protected boolean global=true;

protected PairwiseAlignmentFactory()
{
}

public void setGlobal(boolean global)
{
this.global=global;
}

public boolean isGlobal()
{
return this.global;
}

/** abstract. to be implemented */
public abstract PairwiseAlignment createPairwiseAligner();

/** creates a new concrete instance of PairwiseAlignmentFactory */
static public PairwiseAlignmentFactory newInstance()
{
//described later...
return new PairwiseAlignmentFactoryImpl();
}
}

/** a concrete PairwiseAlignmentFactory
* implements createPairwiseAligner()
* instance created by PairwiseAlignment.newInstance()
*/
class PairwiseAlignmentFactoryImpl
extends
{
@Override
public PairwiseAlignment createPairwiseAligner()
{
return isGlobal() ?
new NeedlemanWunsch():
: new SmithWaterman();
}
}


That's it

Pierre