03 February 2011

Testing if a BAM file is sorted using the samtools C API.

"Is my BAM file sorted ? is there any tool (in samtools ?) to quickly know whether a BAM file has been sorted or not ? should I run `samtools sort` ?": this is the question I asked yesterday on BioStar.
Brad Chapman suggested to use 'samtools index' as an unsorted BAM would return an error.
On my side I wrote the following 'C' program using the Samtools API:

This program opens one or more BAM file , loop over all the reads and test they are sorted on chromosome/position. The program returns 'EXIT_SUCCESS (0)' if all the files were sorted, or EXIT_FAILURE.

Compilation


FLAG64= -m64
gcc ${FLAG64} -o bamsorted -O3 -I ${SAMDIR} -L ${SAMDIR} bamsorted.c -lbam -lz

Test


Using samtools index

> for F in `find . -name "*.bam"`; do echo $F; samtools index ${F} ; done
GALAXY/galaxy-dist/test-data/sam_merge_out2.bam
GALAXY/galaxy-dist/test-data/sam_merge_in1.bam
GALAXY/galaxy-dist/test-data/srma_out1.bam
GALAXY/galaxy-dist/test-data/srma_out2.bam
GALAXY/galaxy-dist/test-data/sam_merge_in3.bam
GALAXY/galaxy-dist/test-data/sam_merge_in2.bam
GALAXY/galaxy-dist/test-data/1.bam
GALAXY/galaxy-dist/test-data/srma_in1.bam
GALAXY/galaxy-dist/test-data/sam_to_bam_out1.bam
GALAXY/galaxy-dist/test-data/3.bam
[bam_index_core] the alignment is not sorted (HWI-EAS91_1_30788AAXX:1:1:799:192): 14955 > 8420 in 46-th chr
GALAXY/galaxy-dist/test-data/srma_in3.bam
GALAXY/galaxy-dist/test-data/sam_pileup_in1.bam
GALAXY/galaxy-dist/test-data/sam_to_bam_out2.bam
GALAXY/galaxy-dist/lib/galaxy/datatypes/test/1.bam
GALAXY/galaxy-dist/lib/galaxy/datatypes/test/3.bam
[bam_index_core] the alignment is not sorted (HWI-EAS91_1_30788AAXX:1:1:799:192): 14955 > 8420 in 46-th chr

Using my C code

> for F in `find . -name "*.bam"`; do ./bamsorted ${F} ; done
GALAXY/galaxy-dist/test-data/sam_merge_out2.bam OK
GALAXY/galaxy-dist/test-data/sam_merge_in1.bam OK
GALAXY/galaxy-dist/test-data/srma_out1.bam OK
GALAXY/galaxy-dist/test-data/srma_out2.bam OK
GALAXY/galaxy-dist/test-data/sam_merge_in3.bam OK
GALAXY/galaxy-dist/test-data/sam_merge_in2.bam OK
GALAXY/galaxy-dist/test-data/1.bam OK
GALAXY/galaxy-dist/test-data/srma_in1.bam OK
GALAXY/galaxy-dist/test-data/sam_to_bam_out1.bam OK
GALAXY/galaxy-dist/test-data/3.bam Unsorted: On Reference[45]="chrM" position=8420 after 14955.
GALAXY/galaxy-dist/test-data/srma_in3.bam OK
GALAXY/galaxy-dist/test-data/sam_pileup_in1.bam OK
GALAXY/galaxy-dist/test-data/sam_to_bam_out2.bam OK
GALAXY/galaxy-dist/lib/galaxy/datatypes/test/1.bam OK
GALAXY/galaxy-dist/lib/galaxy/datatypes/test/3.bam Unsorted: On Reference[45]="chrM" position=8420 after 14955.

That's it,

Pierre

4 comments:

Anthony said...

Thank you Pierre. Very useful !

Jina said...

Hi Pierre,
I am confused. Does our script double check the bam files being sorted? Because I didn't get any error using "samtools index". Could it not be sorted?

Pierre Lindenbaum said...

It should only print a message if the reads aren't sorted.

Boxiang Liu said...

very useful script. Thank you!