Testing if a BAM file is sorted using the samtools C API.
"Is my BAM file sorted ? is there any tool (in samtools ?) to quickly know whether a BAM file has been sorted or not ? should I run `samtools sort` ?": this is the question I asked yesterday on BioStar.
Brad Chapman suggested to use 'samtools index'
as an unsorted BAM would return an error.
On my side I wrote the following 'C' program using the Samtools API:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
moved to: https://github.com/lindenb/samtools-utilities/blob/master/src/bamsorted.c |
Compilation
FLAG64= -m64
gcc ${FLAG64} -o bamsorted -O3 -I ${SAMDIR} -L ${SAMDIR} bamsorted.c -lbam -lz
gcc ${FLAG64} -o bamsorted -O3 -I ${SAMDIR} -L ${SAMDIR} bamsorted.c -lbam -lz
Test
Using samtools index
> for F in `find . -name "*.bam"`; do echo $F; samtools index ${F} ; done
GALAXY/galaxy-dist/test-data/sam_merge_out2.bam
GALAXY/galaxy-dist/test-data/sam_merge_in1.bam
GALAXY/galaxy-dist/test-data/srma_out1.bam
GALAXY/galaxy-dist/test-data/srma_out2.bam
GALAXY/galaxy-dist/test-data/sam_merge_in3.bam
GALAXY/galaxy-dist/test-data/sam_merge_in2.bam
GALAXY/galaxy-dist/test-data/1.bam
GALAXY/galaxy-dist/test-data/srma_in1.bam
GALAXY/galaxy-dist/test-data/sam_to_bam_out1.bam
GALAXY/galaxy-dist/test-data/3.bam
[bam_index_core] the alignment is not sorted (HWI-EAS91_1_30788AAXX:1:1:799:192): 14955 > 8420 in 46-th chr
GALAXY/galaxy-dist/test-data/srma_in3.bam
GALAXY/galaxy-dist/test-data/sam_pileup_in1.bam
GALAXY/galaxy-dist/test-data/sam_to_bam_out2.bam
GALAXY/galaxy-dist/lib/galaxy/datatypes/test/1.bam
GALAXY/galaxy-dist/lib/galaxy/datatypes/test/3.bam
[bam_index_core] the alignment is not sorted (HWI-EAS91_1_30788AAXX:1:1:799:192): 14955 > 8420 in 46-th chr
GALAXY/galaxy-dist/test-data/sam_merge_out2.bam
GALAXY/galaxy-dist/test-data/sam_merge_in1.bam
GALAXY/galaxy-dist/test-data/srma_out1.bam
GALAXY/galaxy-dist/test-data/srma_out2.bam
GALAXY/galaxy-dist/test-data/sam_merge_in3.bam
GALAXY/galaxy-dist/test-data/sam_merge_in2.bam
GALAXY/galaxy-dist/test-data/1.bam
GALAXY/galaxy-dist/test-data/srma_in1.bam
GALAXY/galaxy-dist/test-data/sam_to_bam_out1.bam
GALAXY/galaxy-dist/test-data/3.bam
[bam_index_core] the alignment is not sorted (HWI-EAS91_1_30788AAXX:1:1:799:192): 14955 > 8420 in 46-th chr
GALAXY/galaxy-dist/test-data/srma_in3.bam
GALAXY/galaxy-dist/test-data/sam_pileup_in1.bam
GALAXY/galaxy-dist/test-data/sam_to_bam_out2.bam
GALAXY/galaxy-dist/lib/galaxy/datatypes/test/1.bam
GALAXY/galaxy-dist/lib/galaxy/datatypes/test/3.bam
[bam_index_core] the alignment is not sorted (HWI-EAS91_1_30788AAXX:1:1:799:192): 14955 > 8420 in 46-th chr
Using my C code
> for F in `find . -name "*.bam"`; do ./bamsorted ${F} ; done
GALAXY/galaxy-dist/test-data/sam_merge_out2.bam OK
GALAXY/galaxy-dist/test-data/sam_merge_in1.bam OK
GALAXY/galaxy-dist/test-data/srma_out1.bam OK
GALAXY/galaxy-dist/test-data/srma_out2.bam OK
GALAXY/galaxy-dist/test-data/sam_merge_in3.bam OK
GALAXY/galaxy-dist/test-data/sam_merge_in2.bam OK
GALAXY/galaxy-dist/test-data/1.bam OK
GALAXY/galaxy-dist/test-data/srma_in1.bam OK
GALAXY/galaxy-dist/test-data/sam_to_bam_out1.bam OK
GALAXY/galaxy-dist/test-data/3.bam Unsorted: On Reference[45]="chrM" position=8420 after 14955.
GALAXY/galaxy-dist/test-data/srma_in3.bam OK
GALAXY/galaxy-dist/test-data/sam_pileup_in1.bam OK
GALAXY/galaxy-dist/test-data/sam_to_bam_out2.bam OK
GALAXY/galaxy-dist/lib/galaxy/datatypes/test/1.bam OK
GALAXY/galaxy-dist/lib/galaxy/datatypes/test/3.bam Unsorted: On Reference[45]="chrM" position=8420 after 14955.
GALAXY/galaxy-dist/test-data/sam_merge_out2.bam OK
GALAXY/galaxy-dist/test-data/sam_merge_in1.bam OK
GALAXY/galaxy-dist/test-data/srma_out1.bam OK
GALAXY/galaxy-dist/test-data/srma_out2.bam OK
GALAXY/galaxy-dist/test-data/sam_merge_in3.bam OK
GALAXY/galaxy-dist/test-data/sam_merge_in2.bam OK
GALAXY/galaxy-dist/test-data/1.bam OK
GALAXY/galaxy-dist/test-data/srma_in1.bam OK
GALAXY/galaxy-dist/test-data/sam_to_bam_out1.bam OK
GALAXY/galaxy-dist/test-data/3.bam Unsorted: On Reference[45]="chrM" position=8420 after 14955.
GALAXY/galaxy-dist/test-data/srma_in3.bam OK
GALAXY/galaxy-dist/test-data/sam_pileup_in1.bam OK
GALAXY/galaxy-dist/test-data/sam_to_bam_out2.bam OK
GALAXY/galaxy-dist/lib/galaxy/datatypes/test/1.bam OK
GALAXY/galaxy-dist/lib/galaxy/datatypes/test/3.bam Unsorted: On Reference[45]="chrM" position=8420 after 14955.
That's it,
Pierre
4 comments:
Thank you Pierre. Very useful !
Hi Pierre,
I am confused. Does our script double check the bam files being sorted? Because I didn't get any error using "samtools index". Could it not be sorted?
It should only print a message if the reads aren't sorted.
very useful script. Thank you!
Post a Comment