Showing posts with label samtools. Show all posts
Showing posts with label samtools. Show all posts

17 May 2016

finding new intron-exon junctions using the public Encode RNASeq data

I've been asked to look for some new / suspected / previously uncharacterized intron-exon junctions in public RNASeq data.
I've used the BAMs under http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCaltechRnaSeq/.

The following command is used to build the list of BAMs:

curl -s  "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCaltechRnaSeq/" |\
tr ' <>"' "\n" | grep -F .bam | grep -v bai | sort | uniq | sed 's/.bam$//' | sed 's/$/ \\/' 

wgEncodeCaltechRnaSeqGm12878R1x75dAlignsRep1V2 \
wgEncodeCaltechRnaSeqGm12878R1x75dAlignsRep2V2 \
wgEncodeCaltechRnaSeqGm12878R1x75dSplicesRep1V2 \
wgEncodeCaltechRnaSeqGm12878R1x75dSplicesRep2V2 \
wgEncodeCaltechRnaSeqGm12878R2x75Il200AlignsRep1V2 \
wgEncodeCaltechRnaSeqGm12878R2x75Il200AlignsRep2V2 \
wgEncodeCaltechRnaSeqGm12878R2x75Il200SplicesRep1V2 \
wgEncodeCaltechRnaSeqGm12878R2x75Il200SplicesRep2V2 \
wgEncodeCaltechRnaSeqGm12878R2x75Il400AlignsRep2V2 \
wgEncodeCaltechRnaSeqGm12878R2x75Il400SplicesRep2V2 \
(...)

This list is inserted as a list named SAMPLES a Makefile.

For each BAM, we use samtools to retrieve the reads in the region(s)) of interest. The reads are then filtered with samjs (https://github.com/lindenb/jvarkit/wiki/SamJS) to only keep the reads carrying an intron-exon junction at the desired location(s). Basically, the javascript-based filter loops over the CIGAR string of the read, computes the genomic interval skipped when the cigar operator is a deletion or a skipped region/intron. The read is printed if it describes the new intron-exon junction.

All in one:




That's it,

Pierre



29 June 2015

A BLAST to SAM converter.

Some times ago, I've received a set of Ion-Torrent /mate-reads with a poor quality. I wasn't able to align much things using bwa. I've always wondered if I could get better alignments using NCBI-BLASTN (short answer: no) . That's why I asked guyduche, my intership student to write a C program to convert the output of blastn to SAM. His code is available on github at :

The input for blast2sam is
  • the XML output of NCBI blastn (or stdin)
  • The single or pair of fastq file(s)
  • The reference sequence indexed with picard
.

Example:

fastq2fasta in.R1.fq.gz in.R2.fq.gz |\
blastn -db REFERENCE   -outfmt 5 | \
blast2bam -o result.bam -W 40 -R '@RG   ID:foo  SM:sample' - REFERENCE.dict  in.R1.fq.gz in.R2.fq.gz

Output:
@SQ SN:gi|9629357|ref|NC_001802.1|  LN:9181 
@RG ID:foo  SM:sample
@PG ID:Blast2Bam    PN:Blast2Bam    VN:0.1  CL:../../bin/blast2bam -o results.sam -W 40 -R @RG  ID:foo  SM:sample - db.dict test_1.fastq.gz test_2.fastq.gz
(...)
ERR656485.2 83  gi|9629357|ref|NC_001802.1| 715 60  180S7=1X8=1X11=1X2=2X4=1X14=1X8=1X33=1X4=1X2=1X5=1X2=1X6=1S =   715 -119    CCTAGTGTTGCTTGCTTTTCTTCTTTTTTTTTTCAAGCAGAAGACGGCATACGAGATCCTCTATCGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCTAAGATAGAGGAAGAACAAAACAAATGTCAGCAAAGTCAGCAAAAGACACAGCAGGAAAAAGGGGCTGACGGGAAGGTCAGTCAAAATTATCCTATAGTGCAAAATCTCCAAGGGCAAATGGTACACCAGGCCATGTCACCTAGAACTTTAAATGCATGGGTAAAAGTAATAGAGGAAAAGGCCTTTAGCCCAN    (),.((((,(((((,((.((.-(>69>20E>6/=>5EC@9-52?BEE::2951.)74B64=B==FFAF=A??59:>FFFDF:55GGFGF?DFGGFE868>GGGFGGGGED;FGFFGGGGGGGGGGGEFFGE9GGGGFGGGGGGGGDGECGGFGGGGGGGGGGFGGGGGEGGFGGGGGGFFGGGGGFF?EGGFFFEGGGGGGGGFEGGGEGGGFEGGGGGGGGGGDGFFCEGFGGGGGGGGGGGFFECFGGGGFGGGGGGGGGGGFCGGGGGGGGGGGGGGGGGGFGGGGGGGGF@CCA8!    NM:i:13 RG:Z:foo    AS:i:80 XB:f:148.852    XE:Z:4.07e-39
ERR656485.2 163 gi|9629357|ref|NC_001802.1| 715 60  73S7=1X8=1X11=1X2=2X4=1X14=1X8=1X33=1X4=1X2=1X5=1X2=1X8=106S    =   715 119 NAGATAGAGGAAGAACAAAACAAATGTCAGCAAAGTCAGCAAAAGACACAGCAGGAAAAAGGGGCTGACGGGAAGGTCAGTCAAAATTATCCTATAGTGCAAAATCTCCAAGGGCAAATGGTACACCAGGCCATGTCACCTAGAACTTTAAATGCATGGGTAAAAGTAATAGAGGAAAAGGCCTTTAGCCCAGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTCTCATTACAAAAAAAACATACACAATAAATGATATAAGCGGAATCAACAGCATGA    !8A@CGGEFGFGCDFGGGGGGGGGGGGGGFGGGGGFGFGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGFGGFGGGGGGGGGEGFGFGGGFFGGGGGGGGFGGGGGGGGGGGGFFFFGGGGGG=FFGGFFDGGGGGGGG8FGFGGGGGGGGGFGGGGGGGGGGFDGGFGGFGGGFFFGFF8DFDFDFFFFFFFFFBCDB<@EAFB@ABAC@CDFF?4>EEFE<*>BDAFB@FFBFF>((6<5CC.;C;=D9106(.))).)-46<<))))))))))((,(-)))()((()))    NM:i:13 RG:Z:foo    AS:i:82 XB:f:152.546    XE:Z:3.15e-40
(...)
Now, I would be interested in finding another dataset where this tool could be successfully used.
That's it,
Pierre

04 December 2014

XML+XSLT = #Makefile -based #workflows for #bioinformatics

I've recently read some conversations on Twitter about Makefile-based bioinformatics workflows. I've suggested on biostars.org (Standard simple format to describe a bioinformatics analysis pipeline) that a XML file could be used to describe a model of data and XSLT could transform this model to a Makefile-based workflow. I've already explored this idea in a previous post (Generating a pipeline of analysis (Makefile) for NGS with xslt. ) and in our lab, we use JSON and jsvelocity to generate our workflows (e.g: https://gist.github.com/lindenb/3c07ca722f793cc5dd60).
In the current post, I'll describe my github repository containing a complete XML+XSLT example: https://github.com/lindenb/ngsxml.

Download the data

git clone "https://github.com/lindenb/ngsxml.git"


The model
The XML model is self-explanatory:
<?xml version="1.0" encoding="UTF-8"?>
<model name="myProject" description="my project" directory="OUT">
  <project name="Proj1">
    <sample name="Sample1">
      <fastq>
        <for>test/fastq/sample_1_01_R1.fastq.gz</for>
        <rev>test/fastq/sample_1_01_R2.fastq.gz</rev>
      </fastq>
      <fastq id="groupid2" lane="2" library="lib1" platform="ILMN" median-size="98">
        <for>test/fastq/sample_1_02_R1.fastq.gz</for>
        <rev>test/fastq/sample_1_02_R2.fastq.gz</rev>
      </fastq>
    </sample>
    <sample name="Sample2">
      <fastq>
        <for>test/fastq/sample_2_01_R1.fastq.gz</for>
        <rev>test/fastq/sample_2_01_R2.fastq.gz</rev>
      </fastq>
    </sample>
  </project>
</model>


Validating the model:
This XML document is possibly validated with a XML schema:
$ xmllint  --schema xsd/schema01.xsd --noout test/model01.xml
test/model01.xml validates


Generating the Makefile:
The XML document is transformed into a Makefile using the following XSLT stylesheet: https://github.com/lindenb/ngsxml/blob/master/stylesheets/model2make.xsl
xsltproc --output Makefile stylesheets/model2make.xsl test/model01.xml
The Makefile:
# Name
#  myProject
# Description:
#  my project
# 
include config.mk
OUTDIR=OUT
BINDIR=$(abspath ${OUTDIR})/bin


# if tools are undefined
bwa.exe ?=${BINDIR}/bwa-0.7.10/bwa
samtools.exe ?=${BINDIR}/samtools-0.1.19/samtools
bcftools.exe ?=${BINDIR}/samtools-0.1.19/bcftools/bcftools
tabix.exe ?=${BINDIR}/tabix-0.2.6/tabix
bgzip.exe ?=${BINDIR}/tabix-0.2.6/bgzip

.PHONY=all clean all_bams all_vcfs

all: all_vcfs


all_vcfs:  \
 $(OUTDIR)/Projects/Proj1/VCF/Proj1.vcf.gz


all_bams:  \
 $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/Proj1_Sample1.bam \
 $(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/Proj1_Sample2.bam

#
# VCF for project 'Proj1'
# 
$(OUTDIR)/Projects/Proj1/VCF/Proj1.vcf.gz : $(addsuffix .bai, $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/Proj1_Sample1.bam $(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/Proj1_Sample2.bam) \
 $(addsuffix .fai,${REFERENCE}) ${samtools.exe}  ${bgzip.exe} ${tabix.exe} ${bcftools.exe}
 mkdir -p $(dir $@) && \
 ${samtools.exe} mpileup -uf ${REFERENCE} $(basename $(filter %.bai,$^)) | \
 ${bcftools.exe} view -vcg - > $(basename $@)  && \
 ${bgzip.exe} -f $(basename $@) && \
 ${tabix.exe} -f -p vcf $@
 
    



#
# index final BAM for Sample 'Sample1'
# 
$(addsuffix .bai, $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/Proj1_Sample1.bam): $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/Proj1_Sample1.bam ${samtools.exe}
 mkdir -p $(dir $@) && \
 ${samtools.exe} index  $<
#
# prepare final BAM for Sample 'Sample1'
# 
$(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/Proj1_Sample1.bam : $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}rmdup.bam
 mkdir -p $(dir $@) && \
 cp  $< $@


$(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}rmdup.bam : $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}merged.bam ${samtools.exe}
 mkdir -p $(dir $@) && \
 ${samtools.exe} rmdup  $<  $@



#
# merge BAMs 
#
$(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}merged.bam :  \
  $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}1_sorted.bam \
  $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}2_sorted.bam ${samtools.exe}
 mkdir -p $(dir $@) && \
  ${samtools.exe} merge -f $@ $(filter %.bam,$^)
  



 
#
# Index BAM $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}1_sorted.bam
#
$(addsuffix .bai,$(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}1_sorted.bam ): $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}1_sorted.bam ${samtools.exe}
 ${samtools} index $<

#
# Align test/fastq/sample_1_01_R1.fastq.gz and test/fastq/sample_1_01_R2.fastq.gz
#
$(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}1_sorted.bam : \
 test/fastq/sample_1_01_R1.fastq.gz  \
 test/fastq/sample_1_01_R2.fastq.gz \
 $(addsuffix .bwt,${REFERENCE}) \
 ${bwa.exe} ${samtools.exe}
 mkdir -p $(dir $@) && \
 ${bwa.exe} mem -R '@RG\tID:idp20678948\tSM:Sample1\tLB:Sample1\tPL:ILLUMINA\tPU:1' \
  ${REFERENCE} \
  test/fastq/sample_1_01_R1.fastq.gz \
  test/fastq/sample_1_01_R2.fastq.gz |\
 ${samtools.exe} view -uS - |\
 ${samtools.exe} sort - $(basename $@) 





 
#
# Index BAM $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}2_sorted.bam
#
$(addsuffix .bai,$(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}2_sorted.bam ): $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}2_sorted.bam ${samtools.exe}
 ${samtools} index $<

#
# Align test/fastq/sample_1_02_R1.fastq.gz and test/fastq/sample_1_02_R2.fastq.gz
#
$(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}2_sorted.bam : \
 test/fastq/sample_1_02_R1.fastq.gz  \
 test/fastq/sample_1_02_R2.fastq.gz \
 $(addsuffix .bwt,${REFERENCE}) \
 ${bwa.exe} ${samtools.exe}
 mkdir -p $(dir $@) && \
 ${bwa.exe} mem -R '@RG\tID:groupid2\tSM:Sample1\tLB:lib1\tPL:ILMN\tPU:2\tPI:98' \
  ${REFERENCE} \
  test/fastq/sample_1_02_R1.fastq.gz \
  test/fastq/sample_1_02_R2.fastq.gz |\
 ${samtools.exe} view -uS - |\
 ${samtools.exe} sort - $(basename $@) 




#
# index final BAM for Sample 'Sample2'
# 
$(addsuffix .bai, $(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/Proj1_Sample2.bam): $(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/Proj1_Sample2.bam ${samtools.exe}
 mkdir -p $(dir $@) && \
 ${samtools.exe} index  $<
#
# prepare final BAM for Sample 'Sample2'
# 
$(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/Proj1_Sample2.bam : $(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/${tmp.prefix}rmdup.bam
 mkdir -p $(dir $@) && \
 cp  $< $@


$(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/${tmp.prefix}rmdup.bam : $(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/${tmp.prefix}1_sorted.bam ${samtools.exe}
 mkdir -p $(dir $@) && \
 ${samtools.exe} rmdup  $<  $@





 
#
# Index BAM $(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/${tmp.prefix}1_sorted.bam
#
$(addsuffix .bai,$(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/${tmp.prefix}1_sorted.bam ): $(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/${tmp.prefix}1_sorted.bam ${samtools.exe}
 ${samtools} index $<

#
# Align test/fastq/sample_2_01_R1.fastq.gz and test/fastq/sample_2_01_R2.fastq.gz
#
$(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/${tmp.prefix}1_sorted.bam : \
 test/fastq/sample_2_01_R1.fastq.gz  \
 test/fastq/sample_2_01_R2.fastq.gz \
 $(addsuffix .bwt,${REFERENCE}) \
 ${bwa.exe} ${samtools.exe}
 mkdir -p $(dir $@) && \
 ${bwa.exe} mem -R '@RG\tID:idp20681172\tSM:Sample2\tLB:Sample2\tPL:ILLUMINA\tPU:1' \
  ${REFERENCE} \
  test/fastq/sample_2_01_R1.fastq.gz \
  test/fastq/sample_2_01_R2.fastq.gz |\
 ${samtools.exe} view -uS - |\
 ${samtools.exe} sort - $(basename $@) 




$(addsuffix .fai,${REFERENCE}): ${REFERENCE} ${samtools.exe}
 ${samtools.exe} faidx $<

$(addsuffix .bwt,${REFERENCE}): ${REFERENCE} ${bwa.exe}
 ${bwa.exe} index $<


${BINDIR}/bwa-0.7.10/bwa :
 rm -rf $(BINDIR)/bwa-0.7.10/ && \
 mkdir -p $(BINDIR) && \
 curl -o $(BINDIR)/bwa-0.7.10.tar.bz2 -L "http://sourceforge.net/projects/bio-bwa/files/bwa-0.7.10.tar.bz2/download?use_mirror=freefr" && \
 tar xvfj $(BINDIR)/bwa-0.7.10.tar.bz2 -C $(OUTDIR)/bin  && \
 rm $(BINDIR)/bwa-0.7.10.tar.bz2 && \
 make -C $(dir $@)

${BINDIR}/samtools-0.1.19/bcftools/bcftools: ${BINDIR}/samtools-0.1.19/samtools

${BINDIR}/samtools-0.1.19/samtools  : 
 rm -rf $(BINDIR)/samtools-0.1.19/ && \
 mkdir -p $(BINDIR) && \
 curl -o $(BINDIR)/samtools-0.1.19.tar.bz2 -L "http://sourceforge.net/projects/samtools/files/samtools-0.1.19.tar.bz2/download?use_mirror=freefr" && \
 tar xvfj $(BINDIR)/samtools-0.1.19.tar.bz2 -C $(OUTDIR)/bin  && \
 rm $(BINDIR)/samtools-0.1.19.tar.bz2 && \
 make -C $(dir $@)


${BINDIR}/tabix-0.2.6/bgzip : ${BINDIR}/tabix-0.2.6/tabix


${BINDIR}/tabix-0.2.6/tabix  : 
 rm -rf $(BINDIR)/tabix-0.2.6/ && \
 mkdir -p $(BINDIR) && \
 curl -o $(BINDIR)/tabix-0.2.6.tar.bz2 -L "http://sourceforge.net/projects/samtools/files/tabix-0.2.6.tar.bz2/download?use_mirror=freefr" && \
 tar xvfj $(BINDIR)/tabix-0.2.6.tar.bz2 -C $(OUTDIR)/bin  && \
 rm $(BINDIR)/tabix-0.2.6.tar.bz2 && \
 make -C $(dir $@) tabix bgzip

clean:
 rm -rf ${BINDIR}

Drawing the Workflow:
The workflow is drawn with https://github.com/lindenb/makefile2graph.


Running Make:
And here is the output of make:
rm -rf /home/lindenb/src/ngsxml/OUT/bin/bwa-0.7.10/ && \
 mkdir -p /home/lindenb/src/ngsxml/OUT/bin && \
 curl -o /home/lindenb/src/ngsxml/OUT/bin/bwa-0.7.10.tar.bz2 -L "http://sourceforge.net/projects/bio-bwa/files/bwa-0.7.10.tar.bz2/download?use_mirror=freefr" && \
 tar xvfj /home/lindenb/src/ngsxml/OUT/bin/bwa-0.7.10.tar.bz2 -C OUT/bin  && \
 rm /home/lindenb/src/ngsxml/OUT/bin/bwa-0.7.10.tar.bz2 && \
 make -C /home/lindenb/src/ngsxml/OUT/bin/bwa-0.7.10/
bwa-0.7.10/
bwa-0.7.10/bamlite.c
(...)
gcc -g -Wall -Wno-unused-function -O2 -msse -msse2 -msse3 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS QSufSort.o bwt_gen.o bwase.o bwaseqio.o bwtgap.o bwtaln.o bamlite.o is.o bwtindex.o bwape.o kopen.o pemerge.o bwtsw2_core.o bwtsw2_main.o bwtsw2_aux.o bwt_lite.o bwtsw2_chain.o fastmap.o bwtsw2_pair.o main.o -o bwa -L. -lbwa -lm -lz -lpthread
make[1]: Entering directory `/home/lindenb/src/ngsxml'
/home/lindenb/src/ngsxml/OUT/bin/bwa-0.7.10/bwa index test/ref/ref.fa
rm -rf /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/ && \
 mkdir -p /home/lindenb/src/ngsxml/OUT/bin && \
 curl -o /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19.tar.bz2 -L "http://sourceforge.net/projects/samtools/files/samtools-0.1.19.tar.bz2/download?use_mirror=freefr" && \
 tar xvfj /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19.tar.bz2 -C OUT/bin  && \
 rm /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19.tar.bz2 && \
 make -C /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/
samtools-0.1.19/
samtools-0.1.19/.gitignore
samtools-0.1.19/AUTHORS
(...)
gcc -g -Wall -O2 -o bamcheck bamcheck.o -L.. -lm -lbam -lpthread -lz
make[3]: Leaving directory `/home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/misc'
make[2]: Leaving directory `/home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19'
mkdir -p OUT/Projects/Proj1/Samples/Sample1/BAM/ && \
 /home/lindenb/src/ngsxml/OUT/bin/bwa-0.7.10/bwa mem -R '@RG\tID:idp11671724\tSM:Sample1\tLB:Sample1\tPL:ILLUMINA\tPU:1' \
  test/ref/ref.fa \
  test/fastq/sample_1_01_R1.fastq.gz \
  test/fastq/sample_1_01_R2.fastq.gz |\
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools view -uS - |\
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools sort - OUT/Projects/Proj1/Samples/Sample1/BAM/__DELETE__1_sorted 
mkdir -p OUT/Projects/Proj1/Samples/Sample1/BAM/ && \
 /home/lindenb/src/ngsxml/OUT/bin/bwa-0.7.10/bwa mem -R '@RG\tID:groupid2\tSM:Sample1\tLB:lib1\tPL:ILMN\tPU:2\tPI:98' \
  test/ref/ref.fa \
  test/fastq/sample_1_02_R1.fastq.gz \
  test/fastq/sample_1_02_R2.fastq.gz |\
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools view -uS - |\
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools sort - OUT/Projects/Proj1/Samples/Sample1/BAM/__DELETE__2_sorted 
mkdir -p OUT/Projects/Proj1/Samples/Sample1/BAM/ && \
  /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools merge -f OUT/Projects/Proj1/Samples/Sample1/BAM/__DELETE__merged.bam OUT/Projects/Proj1/Samples/Sample1/BAM/__DELETE__1_sorted.bam OUT/Projects/Proj1/Samples/Sample1/BAM/__DELETE__2_sorted.bam
mkdir -p OUT/Projects/Proj1/Samples/Sample1/BAM/ && \
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools rmdup  OUT/Projects/Proj1/Samples/Sample1/BAM/__DELETE__merged.bam  OUT/Projects/Proj1/Samples/Sample1/BAM/__DELETE__rmdup.bam
mkdir -p OUT/Projects/Proj1/Samples/Sample1/BAM/ && \
 cp  OUT/Projects/Proj1/Samples/Sample1/BAM/__DELETE__rmdup.bam OUT/Projects/Proj1/Samples/Sample1/BAM/Proj1_Sample1.bam
mkdir -p OUT/Projects/Proj1/Samples/Sample1/BAM/ && \
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools index  OUT/Projects/Proj1/Samples/Sample1/BAM/Proj1_Sample1.bam
mkdir -p OUT/Projects/Proj1/Samples/Sample2/BAM/ && \
 /home/lindenb/src/ngsxml/OUT/bin/bwa-0.7.10/bwa mem -R '@RG\tID:idp11673828\tSM:Sample2\tLB:Sample2\tPL:ILLUMINA\tPU:1' \
  test/ref/ref.fa \
  test/fastq/sample_2_01_R1.fastq.gz \
  test/fastq/sample_2_01_R2.fastq.gz |\
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools view -uS - |\
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools sort - OUT/Projects/Proj1/Samples/Sample2/BAM/__DELETE__1_sorted 
mkdir -p OUT/Projects/Proj1/Samples/Sample2/BAM/ && \
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools rmdup  OUT/Projects/Proj1/Samples/Sample2/BAM/__DELETE__1_sorted.bam  OUT/Projects/Proj1/Samples/Sample2/BAM/__DELETE__rmdup.bam
mkdir -p OUT/Projects/Proj1/Samples/Sample2/BAM/ && \
 cp  OUT/Projects/Proj1/Samples/Sample2/BAM/__DELETE__rmdup.bam OUT/Projects/Proj1/Samples/Sample2/BAM/Proj1_Sample2.bam
mkdir -p OUT/Projects/Proj1/Samples/Sample2/BAM/ && \
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools index  OUT/Projects/Proj1/Samples/Sample2/BAM/Proj1_Sample2.bam
/home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools faidx test/ref/ref.fa
rm -rf /home/lindenb/src/ngsxml/OUT/bin/tabix-0.2.6/ && \
 mkdir -p /home/lindenb/src/ngsxml/OUT/bin && \
 curl -o /home/lindenb/src/ngsxml/OUT/bin/tabix-0.2.6.tar.bz2 -L "http://sourceforge.net/projects/samtools/files/tabix-0.2.6.tar.bz2/download?use_mirror=freefr" && \
 tar xvfj /home/lindenb/src/ngsxml/OUT/bin/tabix-0.2.6.tar.bz2 -C OUT/bin  && \
 rm /home/lindenb/src/ngsxml/OUT/bin/tabix-0.2.6.tar.bz2 && \
 make -C /home/lindenb/src/ngsxml/OUT/bin/tabix-0.2.6/ tabix bgzip
tabix-0.2.6/
tabix-0.2.6/ChangeLog
tabix-0.2.6/Makefile
(...)
make[2]: Leaving directory `/home/lindenb/src/ngsxml/OUT/bin/tabix-0.2.6'
mkdir -p OUT/Projects/Proj1/VCF/ && \
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools mpileup -uf test/ref/ref.fa OUT/Projects/Proj1/Samples/Sample1/BAM/Proj1_Sample1.bam OUT/Projects/Proj1/Samples/Sample2/BAM/Proj1_Sample2.bam | \
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/bcftools/bcftools view -vcg - > OUT/Projects/Proj1/VCF/Proj1.vcf  && \
 /home/lindenb/src/ngsxml/OUT/bin/tabix-0.2.6/bgzip -f OUT/Projects/Proj1/VCF/Proj1.vcf && \
 /home/lindenb/src/ngsxml/OUT/bin/tabix-0.2.6/tabix -f -p vcf OUT/Projects/Proj1/VCF/Proj1.vcf.gz
make[1]: Leaving directory `/home/lindenb/src/ngsxml'
At the end, a VCF is generated
##fileformat=VCFv4.1
##samtoolsVersion=0.1.19-44428cd
(...)
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2
chr4_gl000194_random 1973 . C G 14.4 . DP=3;VDB=2.063840e-02;AF1=1;AC1=4;DP4=0,0,3,0;MQ=60;FQ=-32.3 GT:PL:GQ 1/1:31,6,0:11 1/1:17,3,0:8
chr4_gl000194_random 2462 . A T 14.4 . DP=3;VDB=2.063840e-02;AF1=1;AC1=4;DP4=0,0,0,3;MQ=60;FQ=-32.3 GT:PL:GQ 1/1:31,6,0:11 1/1:17,3,0:8
chr4_gl000194_random 2492 . G C 14.4 . DP=3;VDB=2.063840e-02;AF1=1;AC1=4;DP4=0,0,0,3;MQ=60;FQ=-32.3 GT:PL:GQ 1/1:31,6,0:11 1/1:17,3,0:8
chr4_gl000194_random 2504 . A T 14.4 . DP=3;VDB=2.063840e-02;AF1=1;AC1=4;DP4=0,0,0,3;MQ=60;FQ=-32.3 GT:PL:GQ 1/1:31,6,0:11 1/1:17,3,0:8
That's it,

Pierre

30 October 2013

GNU Make: saving the versions of the tools using 'order-only-prerequisites' : my notebook

Rule 3 of "Ten Simple Rules for Reproducible Computational Research". is
:

Archive the Exact Versions of All External Programs Used
.
I work with Makefile-based workflows: how can I save the version of each software used when I invoke 'make', whatever the target is ? A naive solution is to add a dependency to each target. For example, the following makefile takes a simple SAM file, convert it to BAM, sort and index. For each target, I've added a dependency named "dump_params" that append the version of samtools to a file "config.txt".

But that solution doesn't work because make re-builds all targets even if the top target already exists.
$ make

date << config.txt && \
 echo -n "Samtools " << config.txt && \
 samtools  2<&1 | grep Version << config.txt
samtools view -Sb samtools-0.1.18/examples/toy.sam < unsorted.bam
[samopen] SAM header is present: 2 sequences.
samtools sort unsorted.bam sorted
samtools index sorted.bam


$ make

date << config.txt && \
 echo -n "Samtools " << config.txt && \
 samtools  2<&1 | grep Version << config.txt
samtools view -Sb samtools-0.1.18/examples/toy.sam < unsorted.bam
[samopen] SAM header is present: 2 sequences.
samtools sort unsorted.bam sorted
samtools index sorted.bam

The solution I got via Stackoverflow is to use a order-only-prerequisites: "Order-only prerequisites can be specified by placing a pipe symbol (|) in the prerequisites list: any prerequisites to the left of the pipe symbol are normal; any prerequisites to the right are order-only... (...) Note that if you declare the same file to be both a normal and an order-only prerequisite, the normal prerequisite takes precedence (...)". The makefile with the 'order-only-prerequisites' is now:




And that works ! the final target is generated only once, but the file 'config.txt' is always generated.
$ make
date << config.txt && \
 echo -n "Samtools " << config.txt && \
 samtools  2<&1 | grep Version << config.txt
samtools view -Sb samtools-0.1.18/examples/toy.sam < unsorted.bam
[samopen] SAM header is present: 2 sequences.
samtools sort unsorted.bam sorted
samtools index sorted.bam

$ make
date << config.txt && \
 echo -n "Samtools " << config.txt && \
 samtools  2<&1 | grep Version << config.txt

$ make
date << config.txt && \
 echo -n "Samtools " << config.txt && \
 samtools  2<&1 | grep Version << config.txt
That's it,
Pierre

Update :another solution

Citing MadScientist's answer on stackoverflow : Another option is to use immediately expanded shell functions, like:
__dummy := $(shell echo "Makefile was run." >> config.txt)
Since it's immediately expanded the shell script will be invoked once, as the makefile is read in. There's no need to define a dump_params target or include it as a prerequisite. This is more old-school, but has the advantage that it will run for every invocation of make, without having to go through and ensure every target has the proper order-only prerequisite defined.




07 October 2013

Software Environment Management with Modules: my notebook


The following question was recently asked on Biostar "Bioinformatics: how to version control small scripts located all over the server". I suggested to put the scripts in a central repository (under git or whatever ) and to use symbolic links in the workspaces to manage the files. On the other hand, Alex Reynolds suggested to use a Module to deploy versions of a given package.

http://modules.sourceforge.net/: " The Environment Modules package provides for the dynamic modification of a user's environment via modulefiles. Each modulefile contains the information needed to configure the shell for an application. Once the Modules package is initialized, the environment can be modified on a per-module basis using the module command which interprets modulefiles. Typically modulefiles instruct the module command to alter or set shell environment variables such as PATH, MANPATH, etc. modulefiles may be shared by many users on a system and users may have their own collection to supplement or replace the shared modulefiles."




This tool was also suggested on cited by Lex Nederbragt on twitter:



I've just played with Modules, here is my (short) experience.

I've got two versions of samtools on my server : 0.1.18 and 0.1.19, I want to easily switch from one version to another.

I created a hidden directory in my home:
mkdir ~/.modules
it contains a directory "samtools" that will contain the two 'modules files' for the two versions of samtools:
mkdir ~/.modules/samtools

The directory '~/.modules' is added to the variable ${MODULESPATH} ( The path that the module command searches when looking for modulefiles )
$ export MODULEPATH=${MODULEPATH}:${HOME}/.modules

Create the Module File '${HOME}/.modules/samtools/0.1.18' for samtools 0.1.18: This module file add the PATH to samtools0.1.18 and bcftools0.1.18
#%Module1.0

proc ModulesHelp { } {
global dotversion
puts stderr "\tSamtools"
}

module-whatis "Samtools 0.1.18" 
prepend-path PATH /commun/data/packages/samtools-0.1.18/
prepend-path PATH /commun/data/packages/samtools-0.1.18/bcftools

Create the Module File '${HOME}/.modules/samtools/0.1.19' for samtools 0.1.19: This module file add the PATH to samtools0.1.19 and bcftools0.1.19
#%Module1.0

proc ModulesHelp { } {
global dotversion
puts stderr "\tSamtools"
}

module-whatis "Samtools 0.1.19" 
prepend-path PATH /commun/data/packages/samtools-0.1.19/
prepend-path PATH /commun/data/packages/samtools-0.1.19/bcftools

We also create a file '${HOME}/.modules/samtools/.version' to define the default version of samtools:
#%Module1.0
set ModulesVersion "0.1.19"

On startup the shell doesn't know samtools or bcftools:
$ which samtools
/usr/bin/which: no samtools in (/commun/sge_root/bin/lx24-amd64:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/lindenb/bin)

$ which bcftools
/usr/bin/which: no bcftools in (/commun/sge_root/bin/lx24-amd64:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/lindenb/bin)
List the available modules :
$ module avail                                                                                                

------------------------------------------------ /usr/share/Modules/modulefiles ------------------------------------------------
dot           module-cvs    module-info   modules       mpich2-x86_64 null          use.own

---------------------------------------------------- /home/lindenb/.modules ----------------------------------------------------
samtools/0.1.18          samtools/0.1.19(default)
Let's load the default configuration for samtools:
$ module load samtools

now, the shell know the PATH to samtools:

$ which samtools
/commun/data/packages/samtools-0.1.19/samtools
$ which bcftools
/commun/data/packages/samtools-0.1.19/bcftools/bcftools

Unload the module for samtools:
$ module unload samtools
$ which samtools
/usr/bin/which: no samtools in (/commun/sge_root/bin/lx24-amd64:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/lindenb/bin)

$ which bcftools
/usr/bin/which: no bcftools in (/commun/sge_root/bin/lx24-amd64:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/lindenb/bin)

Now load the (old) version 0.1.18 of samtools
$ module load samtools/0.1.18

The old versions of samtools and bcftools are now in the $PATH:
$ which samtools
/commun/data/packages/samtools-0.1.18/samtools

$ which bcftools
/commun/data/packages/samtools-0.1.18/bcftools/bcftools
That's it,

Pierre

23 April 2013

playing with BWA-MEM : my notebook

BWA-MEM, is a new tool that is part of the latest version of BWA. As said Heng Li on Biostar bwa-mem can be used to identify the viral integration sites in the human genome. Here I've used various short reads to explore how bwa maps the pairs. The sequences I used below are:

  • NOTCH2a and NOTCH2b two sequences on the chromosomes 1, on the same strand
  • NOTCH2del : same as NOTCH2a but with a deletion
  • ROXAN1a and ROXAN1b: two sequences on the chromosome 22, on the same strand
  • NOTCH2ins is NOTCH2a with and insertion of ROXAN1a
The following results were generated with the makefile below. Each pair of fastq was submitted twice (1_* and 2_*). I also added some pairs of reads from another source of FASTQs to let bwa infer the size of the fragments (see biostar: http://www.biostars.org/p/69694).

NOTCH2a vs NOTCH2b BWA-MEM options: none

NOTCH2a vs NOTCH2b
BWA-MEM options: none
NOT Properly mapped reads because they're on the same strand.
ReadFlagChromPosqualCigarExtra
1_Notch2p1=651120572000950MNM:i:0 AS:i:50 XS:i:45
1_Notch2p2=1291120572200950MNM:i:0 AS:i:50 XS:i:45
2_Notch2p11120572000950MNM:i:0 AS:i:50 XS:i:45
2_Notch2p21120572200950MNM:i:0 AS:i:50 XS:i:45

NOTCH2a vs revcomp(NOTCH2b) BWA-MEM options: none

NOTCH2a vs revcomp(NOTCH2b)
BWA-MEM options: none
Properly mapped reads.
ReadFlagChromPosqualCigarExtra
1_Notch2pPR1=9911205720003950MNM:i:0 AS:i:50 XS:i:45
1_Notch2pPr2=14711205722003950MNM:i:0 AS:i:50 XS:i:45
2_Notch2pPR111205720003950MNM:i:0 AS:i:50 XS:i:45
2_Notch2pPr211205722003950MNM:i:0 AS:i:50 XS:i:45

NOTCH2del vs revcomp(NOTCH2b) BWA-MEM options: none

NOTCH2del vs revcomp(NOTCH2b)
BWA-MEM options: none
Deletion in Read1
ReadFlagChromPosqualCigarExtra
1_Notch2delpPR1=9911205720003937M20D31MNM:i:20 AS:i:42 XS:i:37
1_Notch2delpPr2=14711205722003950MNM:i:0 AS:i:50 XS:i:45
2_Notch2delpPR111205720003937M20D31MNM:i:20 AS:i:42 XS:i:37
2_Notch2delpPr211205722003950MNM:i:0 AS:i:50 XS:i:45

NOTCH2ins vs revcomp(NOTCH2b) BWA-MEM options:

NOTCH2ins vs revcomp(NOTCH2b)
BWA-MEM options: non
Soft clipping of chr1: two hits on different chromosomes.
ReadFlagChromPosqualCigarExtra
1_NOTCH2insRoxanpR1=9722417215666023S50M27SNM:i:0 AS:i:50 XS:i:0
1_NOTCH2insRoxanpr2=1451120572200950MNM:i:0 AS:i:50 XS:i:45
2_NOTCH2insRoxanpR122417215666023S50M27SNM:i:0 AS:i:50 XS:i:0
2_NOTCH2insRoxanpr21120572200950MNM:i:0 AS:i:50 XS:i:45


NOTCH2a+ROXAN1a vs revcomp(NOTCH2b) BWA-MEM options: none

NOTCH2a+ROXAN1a vs revcomp(NOTCH2b)
BWA-MEM options: none
two hits on chromosome chr1 and chr22 for the 5' seq
nevertheless, properly paired was set.
ReadFlagChromPosqualCigarExtra
1_Notch2RoxanpPR1=991120572000950M50SNM:i:0 AS:i:50 XS:i:45 XP:Z:22,+41721566,50S50M,9,0;
1_Notch2RoxanpPR1=992241721566950S50MNM:i:0 AS:i:50 XS:i:0 XP:Z:1,+120572000,50M50S,9,0;
1_Notch2RoxanpPr2=1471120572200950MNM:i:0 AS:i:50 XS:i:45
2_Notch2RoxanpPR11120572000950M50SNM:i:0 AS:i:50 XS:i:45 XP:Z:22,+41721566,50S50M,9,0;
2_Notch2RoxanpPR12241721566950S50MNM:i:0 AS:i:50 XS:i:0 XP:Z:1,+120572000,50M50S,9,0;
2_Notch2RoxanpPr21120572200950MNM:i:0 AS:i:50 XS:i:45

NOTCH2a+ROXAN1a vs revcomp(NOTCH2b+ROXAN1b) BWA-MEM options: none

NOTCH2a+ROXAN1a vs revcomp(NOTCH2b+ROXAN1b)
BWA-MEM options: none
All fragments on chr1/chr22.
Reads are NOT properly paired.
ReadFlagChromPosqualCigarExtra
1_Notch2RoxanpR1=971120572000950M50SNM:i:0 AS:i:50 XS:i:45 XP:Z:22,+41721566,50S50M,9,0;
1_Notch2RoxanpR1=972241721566950S50MNM:i:0 AS:i:50 XS:i:0 XP:Z:1,+120572000,50M50S,9,0;
1_Notch2Roxanpr2=14522417217666050S50MNM:i:0 AS:i:50 XS:i:0 XP:Z:1,-120572200,50M50S,9,0;
1_Notch2Roxanpr2=1451120572200950M50SNM:i:0 AS:i:50 XS:i:45 XP:Z:22,-41721766,50S50M,60,0;
2_Notch2RoxanpR11120572000950M50SNM:i:0 AS:i:50 XS:i:45 XP:Z:22,+41721566,50S50M,9,0;
2_Notch2RoxanpR12241721566950S50MNM:i:0 AS:i:50 XS:i:0 XP:Z:1,+120572000,50M50S,9,0;
2_Notch2Roxanpr222417217666050S50MNM:i:0 AS:i:50 XS:i:0 XP:Z:1,-120572200,50M50S,9,0;
2_Notch2Roxanpr21120572200950M50SNM:i:0 AS:i:50 XS:i:45 XP:Z:22,-41721766,50S50M,60,0;

NOTCH2a+ROXAN1a vs revcomp(NOTCH2b) BWA-MEM options: -C

NOTCH2a+ROXAN1a vs revcomp(NOTCH2b)
BWA-MEM options: -C
I cannot see the effect of the option '-C':
"append FASTA/FASTQ comment to SAM output"
ReadFlagChromPosqualCigarExtra
1_Notch2RoxanpPR1=991120572000950M50SNM:i:0 AS:i:50 XS:i:45 XP:Z:22,+41721566,50S50M,9,0;
1_Notch2RoxanpPR1=992241721566950S50MNM:i:0 AS:i:50 XS:i:0 XP:Z:1,+120572000,50M50S,9,0;
1_Notch2RoxanpPr2=1471120572200950MNM:i:0 AS:i:50 XS:i:45
2_Notch2RoxanpPR11120572000950M50SNM:i:0 AS:i:50 XS:i:45 XP:Z:22,+41721566,50S50M,9,0;
2_Notch2RoxanpPR12241721566950S50MNM:i:0 AS:i:50 XS:i:0 XP:Z:1,+120572000,50M50S,9,0;
2_Notch2RoxanpPr21120572200950MNM:i:0 AS:i:50 XS:i:45

NOTCH2a+ROXAN1a vs revcomp(NOTCH2b) BWA-MEM options: -M

NOTCH2a+ROXAN1a vs revcomp(NOTCH2b)
BWA-MEM options: -M
Set the duplicate flags for the multiple hits.
ReadFlagChromPosqualCigarExtra
1_Notch2RoxanpPR1=991120572000950M50SNM:i:0 AS:i:50 XS:i:45 XP:Z:22,+41721566,50S50M,9,0;
1_Notch2RoxanpPR1s=3552241721566950S50MNM:i:0 AS:i:50 XS:i:0 XP:Z:1,+120572000,50M50S,9,0;
1_Notch2RoxanpPr21120572200950MNM:i:0 AS:i:50 XS:i:45
2_Notch2RoxanpPR11120572000950M50SNM:i:0 AS:i:50 XS:i:45 XP:Z:22,+41721566,50S50M,9,0;
2_Notch2RoxanpPR1s2241721566950S50MNM:i:0 AS:i:50 XS:i:0 XP:Z:1,+120572000,50M50S,9,0;
2_Notch2RoxanpPr21120572200950MNM:i:0 AS:i:50 XS:i:45

NOTCH2ins vs revcomp(ROXAN1b) BWA-MEM options: none

NOTCH2ins vs revcomp(ROXAN1b)
BWA-MEM options: none
Reads are mapped on chr22.
ReadFlagChromPosqualCigarExtra
1_NOTCH2insRoxanpPR122417215666023S50M27SNM:i:0 AS:i:50 XS:i:0
1_NOTCH2insRoxanpPr222417217666050MNM:i:0 AS:i:50 XS:i:0
2_NOTCH2insRoxanpPR122417215666023S50M27SNM:i:0 AS:i:50 XS:i:0
2_NOTCH2insRoxanpPr222417217666050MNM:i:0 AS:i:50 XS:i:0

That's it,

Pierre

The makefile

11 February 2013

Counting the reads in a BAM file using SGE and the Open-MPI library: my notebook.

In the current post, I'll describe my first Open MPI program. Open MPI is "a Message Passing Interface (MPI) library, a standardized and portable message-passing system to function on a wide variety of parallel computers". My C program takes a list of BAMs, distributes some jobs on the SGE (SUN/Oracle Grid Engine) to count the number of reads and returns the results to a master process.

Downloading and installing

The library was downloaded from http://www.open-mpi.org/software/ompi/v1.6/, compiled with the option --with-sge and installed in '/commun/data/packages/openmpi'.
./configure --prefix=/commun/data/packages/openmpi --with-sge
make 
make install

Configuring SGE to run MPI-based program

As described in http://docs.oracle.com/cd/E19080-01/n1.grid.eng6/817-5677/6ml49n2c0/index.html .
$ su
$ source /commun/sge_root/bird/common/settings.csh
$ setenv ARCH lx24-amd64
$ qconf -ap orte
And added:
pe_name            orte
slots              448
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
Add orte to the parallele env list:
qconf -mattr queue pe_list "orte" all.q
Using qconf -mconf I've also set shell_start_mode to 'unix_behavior'.

Algorithm

In the 'main' method, test if this is the master(==0) or a slave process(!=0):
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0)
 {
 master(argc-1,&argv[1]);
 }
else
 {
 slave(argc-1,&argv[1]);
 }

MASTER

The master gets the number of available proc:
MPI_Comm_size(MPI_COMM_WORLD, &proc_count)
Loop over the number of available processors and the BAMS, for each BAM, create a fixed size structure containing the path to the BAM as well as the count of reads initialized to '0':
typedef struct Params_t
 {
 char filename[FILENAME_MAX];
 int is_error;
 long count_total;
 long count_mapped;
 long count_properly_paired;
 long count_dup;
 }Params;
This structure is sent to the slaves:
 MPI_Send(
 &param, /* message buffer */
 sizeof(Params), /* one data item */
 MPI_CHAR, /* data item is an array of bytes */
 rank, /* destination process rank */
 TAG_COUNT_BAM, /* user chosen message tag */
 MPI_COMM_WORLD /* default communicator */
 ); 
and we wait for the slaves to return those 'Params' with the filled values.:
MPI_Recv(&param, /* message buffer */
 sizeof(Params), /* one data item */
 MPI_CHAR, /* data item is an array of bytes */
 MPI_ANY_SOURCE, /* receive from any sender */
 MPI_ANY_TAG, /* any type of message */
 MPI_COMM_WORLD, /* default communicator */
 &status);
And the result is printed:
printf("%s\t%ld\t%ld\t%ld\t%ld\n",
 param.filename,
 param.count_total,
 param.count_mapped,
 param.count_properly_paired,
 param.count_dup
 );
At the end of the master method, when no more BAM is available, the slaves are released (tag is 'TAG_RELEASE_SLAVE') with :
MPI_Send(
 &empty, /* message buffer */
 sizeof(Params), /* one data item */
 MPI_CHAR, /* data item is an array of bytes */
 rank, /* destination process rank */
 TAG_RELEASE_SLAVE, /* SHutdown */
 MPI_COMM_WORLD
 ); /* default communicator */

SLAVE

The slave method receives some messages from the master:
MPI_Recv(
 &param,
 sizeof(Params),
 MPI_CHAR,
 0,
 MPI_ANY_TAG,
 MPI_COMM_WORLD,
 &status);
If the message was 'TAG_RELEASE_SLAVE' we exit the method. Else the BAM file is scanned using the SAmtools API for C.
count_reads(&param);
and the result is sent back to the master:
MPI_Send(
 &param,
 sizeof(Params),
 MPI_CHAR,
 0,
 0,
 MPI_COMM_WORLD
 );
.

Running the Job on SGE

I still don't understand why I need to write the following line:
# qsh -pe orte 4
Your job 84581 ("INTERACTIVE") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 84581 has been successfully scheduled.
Run the analysis:
qrsh -cwd -pe orte 100 /commun/data/packages/openmpi/bin/mpirun \
 /path/to/ompibam \
 `find /commun/data/projects/folder1234/align -name "*.bam"`
qstat -f shows the jobs running:
arch          states
   ---------------------------------------------------------------------------------
   all.q@node01                   BIP   0/15/64        2.76 lx24-amd64
      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    15
   ---------------------------------------------------------------------------------
   all.q@node02                   BIP   0/14/64        3.89 lx24-amd64
      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    14
   ---------------------------------------------------------------------------------
   all.q@node03                   BIP   0/14/64        3.23 lx24-amd64
      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    14
   ---------------------------------------------------------------------------------
   all.q@node04                   BIP   0/14/64        3.68 lx24-amd64
      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    14
   ---------------------------------------------------------------------------------
   all.q@node05                   BIP   0/15/64        2.91 lx24-amd64
      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    15
   ---------------------------------------------------------------------------------
   all.q@node06                   BIP   0/14/64        3.91 lx24-amd64
      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    14
   ---------------------------------------------------------------------------------
   all.q@node07                   BIP   0/14/64        3.79 lx24-amd64
      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    14 
output:
#filename total mapped properly_paired dup
/commun/data/projects/folder1234/11X0589_markdup.bam 1179766 973943 684794 0
/commun/data/projects/folder1234/11X0589_pair1_sorted.bam 1179766 973943 684794 0
/commun/data/projects/folder1234/11X0589_realigned.bam 1179766 973943 684794 0
/commun/data/projects/folder1234/11X0589_recal.bam 1179766 973943 684794 0
/commun/data/projects/folder1234/11X0602_pair1_sorted.bam 982006 815636 566136 0
/commun/data/projects/folder1234/11X0602_realigned.bam 982006 815636 566136 0
/commun/data/projects/folder1234/11X0602_markdup.bam 982006 815636 566136 0
/commun/data/projects/folder1234/11X0602_pair1_unsorted.bam 982006 815636 566136 0
/commun/data/projects/folder1234/11X0375_pair1_sorted.bam 1255772 1031155 735480 0
/commun/data/projects/folder1234/11X0375_realigned.bam 1255772 1031155 735480 0
/commun/data/projects/folder1234/11X0375_markdup.bam 1255772 1031155 735480 0
/commun/data/projects/folder1234/11X0367_pair1_sorted.bam 1422144 1158315 796466 0
/commun/data/projects/folder1234/11X0367_markdup.bam 1422144 1158315 796466 0
/commun/data/projects/folder1234/11X0375_pair1_unsorted.bam 1255772 1031155 735480 0
/commun/data/projects/folder1234/11X0367_realigned.bam 1422144 1158315 796466 0
/commun/data/projects/folder1234/11X0291_pair1_sorted.bam 1296294 1067233 743924 0
/commun/data/projects/folder1234/11X0291_markdup.bam 1296294 1067233 743924 0
/commun/data/projects/folder1234/11X0291_realigned.bam 1296294 1067233 743924 0
/commun/data/projects/folder1234/11X0311_markdup.bam 1353534 1117463 788474 0
/commun/data/projects/folder1234/11X0311_realigned.bam 1353534 1117463 788474 0
/commun/data/projects/folder1234/11X0375_recal.bam 1255772 1031155 735480 0
/commun/data/projects/folder1234/11X0589_pair1_unsorted.bam 1179766 973943 684794 0
/commun/data/projects/folder1234/11X0367_pair1_unsorted.bam 1422144 1158315 796466 0
/commun/data/projects/folder1234/11X0290_pair1_sorted.bam 1436970 1191044 837726 0
/commun/data/projects/folder1234/11X0290_realigned.bam 1436970 1191044 837726 0
/commun/data/projects/folder1234/11X0291_pair1_unsorted.bam 1296294 1067233 743924 0
/commun/data/projects/folder1234/11X0311_pair1_sorted.bam 1353534 1117463 788474 0
/commun/data/projects/folder1234/11X0240_pair1_sorted.bam 1489672 1229654 830042 0
/commun/data/projects/folder1234/11X0240_realigned.bam 1489672 1229654 830042 0
/commun/data/projects/folder1234/11X0240_markdup.bam 1489672 1229654 830042 0
/commun/data/projects/folder1234/XX10272_pair1_sorted.bam 1579608 1296449 876040 0
/commun/data/projects/folder1234/XX10272_markdup.bam 1579608 1296449 876040 0
/commun/data/projects/folder1234/XX10272_realigned.bam 1579608 1296449 876040 0
/commun/data/projects/folder1234/11X0602_recal.bam 982006 815636 566136 0
/commun/data/projects/folder1234/11X0290_markdup.bam 1436970 1191044 837726 0
/commun/data/projects/folder1234/11X0290_pair1_unsorted.bam 1436970 1191044 837726 0
/commun/data/projects/folder1234/11X0247_pair1_sorted.bam 1489832 1233208 862842 0
/commun/data/projects/folder1234/11X0456_pair1_sorted.bam 1486490 1220969 848738 0
/commun/data/projects/folder1234/11X0456_realigned.bam 1486490 1220969 848738 0
/commun/data/projects/folder1234/11X0247_markdup.bam 1489832 1233208 862842 0
/commun/data/projects/folder1234/11X0247_realigned.bam 1489832 1233208 862842 0
/commun/data/projects/folder1234/11X0456_markdup.bam 1486490 1220969 848738 0
/commun/data/projects/folder1234/11X0367_recal.bam 1422144 1158315 796466 0
/commun/data/projects/folder1234/11X0305_pair1_sorted.bam 1818236 1489267 1027458 0
/commun/data/projects/folder1234/11X0305_markdup.bam 1818236 1489267 1027458 0
/commun/data/projects/folder1234/11X0305_realigned.bam 1818236 1489267 1027458 0
/commun/data/projects/folder1234/11X0542_pair1_sorted.bam 1784860 1469183 1016314 0
/commun/data/projects/folder1234/XX07551_recal.bam 1778358 1456518 1018694 0
/commun/data/projects/folder1234/11X0542_realigned.bam 1784860 1469183 1016314 0
/commun/data/projects/folder1234/11X0542_markdup.bam 1784860 1469183 1016314 0
/commun/data/projects/folder1234/11X0311_pair1_unsorted.bam 1353534 1117463 788474 0
/commun/data/projects/folder1234/11X0291_recal.bam 1296294 1067233 743924 0
/commun/data/projects/folder1234/XX07551_pair1_sorted.bam 1778358 1456518 1018694 0
/commun/data/projects/folder1234/11X0240_pair1_unsorted.bam 1489672 1229654 830042 0
/commun/data/projects/folder1234/XX10272_pair1_unsorted.bam 1579608 1296449 876040 0
/commun/data/projects/folder1234/XX07551_markdup.bam 1778358 1456518 1018694 0
/commun/data/projects/folder1234/XX07551_realigned.bam 1778358 1456518 1018694 0
/commun/data/projects/folder1234/11X0305_pair1_unsorted.bam 1818236 1489267 1027458 0
/commun/data/projects/folder1234/11X0311_recal.bam 1353534 1117463 788474 0
/commun/data/projects/folder1234/11X0456_pair1_unsorted.bam 1486490 1220969 848738 0
/commun/data/projects/folder1234/11X0247_pair1_unsorted.bam 1489832 1233208 862842 0
/commun/data/projects/folder1234/11X0542_pair1_unsorted.bam 1784860 1469183 1016314 0
/commun/data/projects/folder1234/XX07551_pair1_unsorted.bam 1778358 1456518 1018694 0
/commun/data/projects/folder1234/11X0433_pair1_sorted.bam 2178798 1800943 1249738 0
/commun/data/projects/folder1234/11X0433_markdup.bam 2178798 1800943 1249738 0
/commun/data/projects/folder1234/XX10272_recal.bam 1579608 1296449 876040 0
/commun/data/projects/folder1234/11X0433_realigned.bam 2178798 1800943 1249738 0
/commun/data/projects/folder1234/11X0240_recal.bam 1489672 1229654 830042 0
/commun/data/projects/folder1234/11X0290_recal.bam 1436970 1191044 837726 0
/commun/data/projects/folder1234/11X0456_recal.bam 1486490 1220969 848738 0
/commun/data/projects/folder1234/11X0247_recal.bam 1489832 1233208 862842 0
/commun/data/projects/folder1234/11X0433_pair1_unsorted.bam 2178798 1800943 1249738 0
/commun/data/projects/folder1234/11X0305_recal.bam 1818236 1489267 1027458 0
/commun/data/projects/folder1234/11X0542_recal.bam 1784860 1469183 1016314 0
/commun/data/projects/folder1234/11X0433_recal.bam 2178798 1800943 1249738 0
However I noticed seems that running this code on the cluster is slower that running it on the master node...

The C code

Note: the C code is compiled with ortecc, not gcc.


I'm still very new to this API and to those concepts. Feel free to suggest anything :-)
That's it,
Pierre

05 February 2013

Making use of Picard Metrics files using XML and XSLT. #ngs

Many tools in the Picard package produce some "Metrics File" (described at http://picard.sourceforge.net/picard-metric-definitions.shtml). The picard API contains a java parser "MetricsFile" parsing those metrics-file:

MetricsFile<MetricBase, Comparable<?>> metricsFile=new MetricsFile<MetricBase, Comparable<?>>();
metricsFile.read(new FileReader("metrics.txt"));
In order produce some custom reports from those files, I've created a tool that dump the content of the MetricsFile as a XML file. The source code is available at: http://code.google.com/p/jvarkit/source/browse/trunk/src/main/java/fr/inserm/umr1087/jvarkit/tools/picard/metrics2xml/PicardMetricsToXML.java.

Compilation

$ mkdir tmp
$ javac -d tmp -cp  /path/to/picard.jar:/path/to/sam.jar \
     -sourcepath  src/main/java \
     src/main/java/fr/inserm/umr1087/jvarkit/tools/picard/metrics2xml/PicardMetricsToXML.java
$ jar vcf picardmetrics2xml.jar -C tmp .

Usage

Say you have used the tool 'CollectInsertSizeMetrics.jar' from picard:
$ java -jar/path/to/CollectInsertSizeMetrics.jar \
 O=out.metrics \
 I=/path/to/samtools/examples/sorted.bam \
 AS=true \
 R=/path/to/samtools/ex1.fa \
 H=chart.pdf
The file out.metrics looks like this:
## net.sf.picard.metrics.StringHeader
# net.sf.picard.analysis.CollectInsertSizeMetrics HISTOGRAM_FILE=(...)
## net.sf.picard.metrics.StringHeader
# Started on: Tue Feb 05 12:51:30 CET 2013

## METRICS CLASS net.sf.picard.analysis.InsertSizeMetrics
MEDIAN_INSERT_SIZE MEDIAN_ABSOLUTE_DEVIATION MIN_INSERT_SIZE MAX_INSERT_SIZE MEAN_INSERT_SIZE STANDARD_DEVIATION READ_PAIRS
209 10 54 243 208.857506 13.614603 4716 FR 5 9 13 17 21 25 29 35 43 

## HISTOGRAM java.lang.Integer
insert_size All_Reads.fr_count
54 3
170 3
173 9
174 3
175 3
177 6
(...)
This file can be converted to XML using the following command:
$ java -cp /path/to/picard.jar:/path/to/sam.jar:picardmetrics2xml.jar file.metrics


<?xml version="1.0" encoding="UTF-8"?><picard-metrics xmlns="http://picard.sourc
eforge.net/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><metrics-file
 file="file.metrics"><headers><header class="net.sf.picard.metrics.StringHeader"
>net.sf.picard.analysis.CollectInsertSizeMetrics HISTOGRAM_FILE=jeter2 INPUT=/ho
me/lindenb/package/samtools-0.1.18/examples/sorted.bam OUTPUT=jeter REFERENCE_SE
QUENCE=/home/lindenb/package/samtools-0.1.18/examples/ex1.fa ASSUME_SORTED=true 
   DEVIATIONS=10.0 MINIMUM_PCT=0.05 METRIC_ACCUMULATION_LEVEL=[ALL_READS] STOP_A
FTER=0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL
=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false</header><h
eader class="net.sf.picard.metrics.StringHeader">Started on: Tue Feb 05 12:51:30
 CET 2013</header></headers><metrics><thead class="net.sf.picard.analysis.Insert
SizeMetrics"><th class="double">MEDIAN_INSERT_SIZE</th><th class="double">MEDIAN
_ABSOLUTE_DEVIATION</th><th class="int">MIN_INSERT_SIZE</th><th class="int">MAX_
INSERT_SIZE</th><th class="double">MEAN_INSERT_SIZE</th><th class="double">STAND
ARD_DEVIATION</th><th class="long">READ_PAIRS</th><th class="net.sf.picard.sam.S
amPairUtil$PairOrientation">PAIR_ORIENTATION</th><th class="int">WIDTH_OF_10_PER
CENT</th><th class="int">WIDTH_OF_20_PERCENT</th><th class="int">WIDTH_OF_30_PER
CENT</th><th class="int">WIDTH_OF_40_PERCENT</th><th class="int">WIDTH_OF_50_PER
CENT</th><th class="int">WIDTH_OF_60_PERCENT</th><th class="int">WIDTH_OF_70_PER
CENT</th><th class="int">WIDTH_OF_80_PERCENT</th><th class="int">WIDTH_OF_90_PER
CENT</th><th class="int">WIDTH_OF_99_PERCENT</th><th class="java.lang.String">SA
MPLE</th><th class="java.lang.String">LIBRARY</th><th class="java.lang.String">R
EAD_GROUP</th></thead><tbody><tr><td>209.0</td><td>10.0</td><td>54</td><td>243</
td><td>208.857506</td><td>13.614603</td><td>4716</td><td>FR</td><td>5</td><td>9<
/td><td>13</td><td>17</td><td>21</td><td>25</td><td>29</td><td>35</td><td>43</td
><td>65</td><td xsi:nil="true"/><td xsi:nil="true"/><td xsi:nil="true"/></tr></t
body></metrics><histogram class="java.lang.Integer"><thead><th>insert_size</th><
th>All_Reads.fr_count</th></thead><tbody><tr><td>54</td><td>3.0</td></tr><tr><td
>170</td><td>3.0</td></tr><tr><td>173</td><td>9.0</td></tr><tr><td>174</td><td>3
.0</td></tr><tr><td>175</td><td>3.0</td></tr><tr><td>177</td><td>6.0</td></tr><t
r><td>178</td><td>6.0</td></tr><tr><td>179</td><td>9.0</td></tr><tr><td>180</td>
<td>6.0</td></tr><tr><td>181</td><td>6.0</td></tr><tr><td>182</td><td>21.0</td><
/tr><tr><td>183</td><td>9.0</td></tr><tr><td>184</td><td>15.0</td></tr><tr><td>1
85</td><td>33.0</td></tr><tr><td>186</td><td>15.0</td></tr><tr><td>187</td><td>3
(...)

Converting to JSON

Now, we can convert the XML to whatever we want using XSLT. I wrote a stylesheet picardmetrics2json.xsl converting the XML to JSON (though, I should escape the quotes in the strings ).
$ xsltproc picardmetrics2json.xsl metrics.xml


{
    "metrics.xml": {
        "headers": [
            {
                "class": "net.sf.picard.metrics.StringHeader",
                "value": "net.sf.picard.analysis.CollectInsertSizeMetrics HISTOGRAM_FILE=metrics.pdf INPUT=samtools-0.1.18/examples/sorted.bam OUTPUT=metrics.txt REFERENCE_SEQUENCE=/home/lindenb/package/samtools-0.1.18/examples/ex1.fa ASSUME_SORTED=true    DEVIATIONS=10.0 MINIMUM_PCT=0.05 METRIC_ACCUMULATION_LEVEL=[ALL_READS] STOP_AFTER=0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false"
            },
            {
                "class": "net.sf.picard.metrics.StringHeader",
                "value": "Started on: Tue Feb 05 12:51:30 CET 2013"
            }
        ],
        "metrics": [
            {
                "MEDIAN_INSERT_SIZE": 209,
                "MEDIAN_ABSOLUTE_DEVIATION": 10,
                "MIN_INSERT_SIZE": 54,
                "MAX_INSERT_SIZE": 243,
                "MEAN_INSERT_SIZE": 208.857506,
                "STANDARD_DEVIATION": 13.614603,
                "READ_PAIRS": 4716,
                "PAIR_ORIENTATION": "FR",
                "WIDTH_OF_10_PERCENT": 5,
                "WIDTH_OF_20_PERCENT": 9,
                "WIDTH_OF_30_PERCENT": 13,
                "WIDTH_OF_40_PERCENT": 17,
                "WIDTH_OF_50_PERCENT": 21,
                "WIDTH_OF_60_PERCENT": 25,
                "WIDTH_OF_70_PERCENT": 29,
                "WIDTH_OF_80_PERCENT": 35,
                "WIDTH_OF_90_PERCENT": 43,
                "WIDTH_OF_99_PERCENT": 65,
                "SAMPLE": null,
                "LIBRARY": null,
                "READ_GROUP": null
            }
        ],
        "histogram": [
            {
                "insert_size": 54,
                "All_Reads.fr_count": 3
            },
            {
                "insert_size": 170,
                "All_Reads.fr_count": 3
            },(...)

Converting to HTML

Another stylesheet convert the XML to HTML. It also produces the javascript code to display the histograms using Google chart:
$ xsltproc picardmetrics2html.xsl metrics.xml > output.html


That's it,
Pierre

25 January 2013

Samtools tview as a library to display the BAM

I've forked samtools and modified the code of tview to use it as a library to display the alignments. The original code is an inreractive interface using the ncurses library. I've modified the original code and changed the structure of the C 'struct tview' with a few callbacks to make it more object-oriented:

(...)
typedef struct AbstractTview {
 int mrow, mcol;
   (...)
    khash_t(kh_rg) *rg_hash;
    /* callbacks */
    void (*my_destroy)(struct AbstractTview* );
    void (*my_mvprintw)(struct AbstractTview* ,int,int,const char*,...);
    void (*my_mvaddch)(struct AbstractTview*,int,int,int);
    void (*my_attron)(struct AbstractTview*,int);
    void (*my_attroff)(struct AbstractTview*,int);
    void (*my_clear)(struct AbstractTview*);
    int (*my_colorpair)(struct AbstractTview*,int);
    int (*my_drawaln)(struct AbstractTview*,int,int);
    int (*my_loop)(struct AbstractTview*);
    int (*my_underline)(struct AbstractTview*);
} tview_t;
With those callbacks, there is a strong separation between the 'view' and the 'model'. Tview can now be used as a library and it's now easy to create any kind of view you need by extending the 'struct tview_t'. I've created two new interfaces: one for HTML:


samtools tview -d H examples/sorted.bam  -p seq1:20 
seq1:20
 21        31        41        51        61        71        81                 
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
ATGTGTGGTTTAACTCGTCCATGGCCCAGCATTAGGGAGCTGTGGACCCTGCAGCCTGGCTGTGGGGGCCGCAGTGGSTG
ATGTGTGGTTTAACTCGTACATGGCCCAGCATTAGGGAGCTGTGGACCCCG  GCCTGGCTGTGGGGGCACCAGCCGCTG
ATGTGTGGTTTAACTCGTACATGGCCCAGCATTAGGGAGCTGTGGACCCCG  GCCTGGCTGTGGGGGCACCAGCCGCTG
ATGTGTGGTTTAACTCGTACATGGCCCAGCATTAGGGAGCTGTGGACCCCG  GCCTGGCTGTGGGGGCACCAGCCGCTG
ATGTGTGGTTTAACTCGTCCATGGCCCAGCATTAGGGCGCTGTGGACCCTGC       CTGTGGGGGCCGCAGTGGCTG
ATGTGTGGTTTAACTCGT     GCCCAGCATTAGGGAGCTGTGGACCCTGCAGCCTG CTGTGGGGGCCGCAGTGGCTG
ATGTGTGGTTTAACTCGT     GCCCAGCATTAGGGAGCTGTGGACCCTGCAGCCTG CTGTGGGGGCCGCAGTGGGTG
ATGTGTGGTTTAACTCGTCC   GCCCAGCATTAGGGAGCTGTGGACCCTGCAGCCTG CTGTGGGGGCCGCAGTGGCTG
ATGTGTGGTTTAACTCGTCC         CATTAGGGAGCTGTGGACCCTGCAGCCTGGCTGTTG          GGCTG
ATGTGTGGTTTAACTCGTCC         CATTAGGGAGCTGTGGACCCTGCAGCCTGGCTGTTG          GGCTG
TTTTTTGTTTTAACTCTTCTCT       CATTAGGGAGCTGTGGACCCTGCAGCCTGGCTGTTG          GGCTG
TTTTTTGTTTTAACTCTTCTCT        ATTAGGGAGCTGTGGACCCTGCAGCCTGGCTGGGG               
TTTTTTGTTTTAACTCTTCTCT        ATTAGGGAGCTGTGGACCCTGCAGCCTGGCTGGGG               
ATGTGTGGTTTAACTCGTCCATGG      ATTAGGGAGCTGTGGACCCTGCAGCCTGGCTGGGG               
ATGTGTGGTTTAACTCGTCCATGG       TTAGGGAGCTGTGGACCCTGCAGCCTGGCTGTGGG              
ATGTGTGGTTTAACTCGTCCATGG       TTAGGGAGCTGTGGACCCTGCAGCCTGGCTGTGGG              
ATGTGTGGTTTAACTCGTCCCTGGCCCA   TTAGGGAGCTGTGGACCCTGCAGCCTGGCTGTGGG              
ATGTGTGGTTTAACTCGTCCATGGCCCAG   TAGGGAGCTGTGGACCCTGCAGCCTGGCTGTGGGGG            
ATGTGTGGTTTAACTCGTCCCTGGCCCA    TAGGGAGCTGTGGACCCTGCAGCCTGGCTGTGGGGG            
ATGTGTGGTTTAACTCGTCCATGGCCCAG   TAGGGAGCTGTGGACCCTGCAGCCTGGCTGTGGGGG            
ATGTGTGGTTTAACTCGTCCCTGGCCCA           CTGTGGACCCTGCAGCCTGGCTGTGGGGGGCGCCG      
ATGTGTGGTTTAACTCGTCCATGGCCCAG          CTGTGGACCCTGCAGCCTGGCTGTGGGGGGCGCCG      
ATGTGTGGTTTAACTCGTCCATTGCCCAGC         CTGTGGACCCTGCAGCCTGGCTGTGGGGGGCGCCG      
ATGTGTGGTTTAACTCGTCCATTGCCCAGC          TGTGGACCCTGCAGCCTGGCTGTGGGGGCCGCAGTG    
ATGTGTGGTTTAACTCGTCCATTGCCCAGC          TGTGGACCCTGCAGCCTGGCTGGGGGGGGCGCAGT     
atgtgtggtttaactcgtccatggcccagcatt       TGTGGACCCTGCAGCCTGGCTGTGGGGGCCGCAGTG    
atgtgtggtttaactcgtccatggcccagcatt       TGTGGACCCTGCAGCCTGGCTGGGGGGGGCGCAGT     
atgtgtggtttaactcgtccatggcccagcatt       TGTGGACCCTGCAGCCTGGCTGTGGGGGCCGCAGTG    
  GTGTGGTTTAACTCGTCCATGGCCCAGCATTTGGG   TGTGGACCCTGCAGCCTGGCTGGGGGGGGCGCAGT     
  GTGTGGTTTAACTCGTCCATGGCCCAGCATTAGGG    GTGGACCCTGCAGCCTGGCTGGGGGGGGCACGGGG    
  GTGTGGTTTAACTCGTCCATGGCCCAGCATTTGGG    GTGGACCCTGCAGCCTGGCTGTGGGGGCCGCAGTG    
  GTGTGGTTTAACTCGTCCATGGCCCAGCATTAGGG    GTGGACCCTGCAGCCTGGCTGGGGGGGGCACGGGG    
  GTGTGGTTTAACTCGTCCATGGCCCAGCATTTGGG    GTGGACCCTGCAGCCTGGCTGTGGGGGCCGCAGTG    
  GTGTGGTTTAACTCGTCCATGGCCCAGCATTAGGG    GTGGACCCTGCAGCCTGGCTGGGGGGGGCACGGGG    
    GTGGTTTAACTCGTCCATGGCCCAGCATTAGGGAGC GTGGACCCTGCAGCCTGGCTGTGGGGGCCGCAGTG    
    GTGGTTTAACTCGTCCATGGCCCAGCATTAGGGAGC   GGACCCTGCAGCCTGGCTGTGGGGGCCGCTGTGGG  
    GTGGTTTAACTCGTCCATGGCCCAGCATTAGGGAGC   GGACCCTGCAGCCTGGCTGTGGGGGCCGCTGTGGG  
        TTTAACTCGTCCATGGCCCAGCATTAGGGATCTGT    CCTGCAGCCTGGCTGTGGGGGCCGCAGCGGGTG
        TTTAACTCGTCCATGGCCCAGCATTAGGGATCTGT    CCTGCAGCCTGGCTGTGGGGGCCGCAGCGGGTG
        TTTAACTCGTCCATGGCCCAGCATTAGGGATCTGT    CCTGCAGCCTGGCTGTGGGGGCCGCAGCGGGTG
         TTAACTCGTCCATGGCCCAGCATTAGGGAGCTGTG      GCAGCCTGGCTGTGGGGGCCGCAGTGGCTG
         TTAACTCGTCCATGGCCCAGCATTAGGGAGCTGTG      GCAGCCTGGCTGTGGGGGCCGCAGTGGCTG
         TTAACTCGTCCATGGCCCAGCATTAGGGAGCTGTG      GCAGCCTGGCTGTGGGGGCCGCAGTGGCTG
           AACTCGTCCATGGCCCAGCATTAGGGAGCTGTGGA             CTGTGGGGGCCGCAGTGGGTG
           AACTCGTCCATGGCCCAGCATTAGGGAGCTGTGGA             CTGTGGGGGCCGCAGTGGCTG
           AACTCGTCCATGGCCCAGCATTAGGGAGCTGTGGA             CTGTGGGGGCCGCAGTGGCTG
                 TCCATGGCCCAGCATTAGGGCGCTGTGGACCCTGC       CTGTGGGGGCCGCAGTGGGTG
                 TCCATGGCCCAGCATTAGGGCGCTGTGGACCCTGC       CTGTGGGGGCCGCAGTGGCTG
                                           GGACCCTGCAGCCTGGCTGTGGGGGCCGCTGTGGG  
                                                            TGTGGGGGCCGCAGTGGCTG
                                                            TGTGGGGGCCGCAGTGGCTG
                                                            TGTGGGGGCCGCAGTGGCTG

And another one for TEXT (with colors on a terminal):

samtools tview -d t examples/sorted.bam  -p seq1:23

        31        41        51        61        71        81        91          
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
TGTGGTTTAACTCGTCCATGGCCCAGCATTAGGGAGCTGTGGACCCTGCAGCCTGGCTGTGGGGGCCGCAGTGGSTGAGG
TGTGGTTTAACTCGTACATGGCCCAGCATTAGGGAGCTGTGGACCCCG  GCCTGGCTGTGGGGGCACCAGCCGCTGCGG
TGTGGTTTAACTCGTACATGGCCCAGCATTAGGGAGCTGTGGACCCCG  GCCTGGCTGTGGGGGCACCAGCCGCTGCGG
TGTGGTTTAACTCGTACATGGCCCAGCATTAGGGAGCTGTGGACCCCG  GCCTGGCTGTGGGGGCACCAGCCGCTGCGG
TGTGGTTTAACTCGTCCATGGCCCAGCATTAGGGCGCTGTGGACCCTGC       CTGTGGGGGCCGCAGTGGCTGAGG
TGTGGTTTAACTCGT     GCCCAGCATTAGGGAGCTGTGGACCCTGCAGCCTG CTGTGGGGGCCGCAGTGGCTGAGG
TGTGGTTTAACTCGT     GCCCAGCATTAGGGAGCTGTGGACCCTGCAGCCTG CTGTGGGGGCCGCAGTGGGTGAGG
TGTGGTTTAACTCGTCC   GCCCAGCATTAGGGAGCTGTGGACCCTGCAGCCTG CTGTGGGGGCCGCAGTGGCTGAGG
TGTGGTTTAACTCGTCC         CATTAGGGAGCTGTGGACCCTGCAGCCTGGCTGTTG          GGCTGAGG
TGTGGTTTAACTCGTCC         CATTAGGGAGCTGTGGACCCTGCAGCCTGGCTGTTG          GGCTGAGG
TTTGTTTTAACTCTTCTCT       CATTAGGGAGCTGTGGACCCTGCAGCCTGGCTGTTG          GGCTGAGG
TTTGTTTTAACTCTTCTCT        ATTAGGGAGCTGTGGACCCTGCAGCCTGGCTGGGG               AGG
TTTGTTTTAACTCTTCTCT        ATTAGGGAGCTGTGGACCCTGCAGCCTGGCTGGGG               AGG
TGTGGTTTAACTCGTCCATGG      ATTAGGGAGCTGTGGACCCTGCAGCCTGGCTGGGG               AGG

The code is available on github:


That's it,

Pierre