Brad Chapman suggested to use
'samtools index'
as an unsorted BAM would return an error.On my side I wrote the following 'C' program using the Samtools API:This program opens one or more BAM file , loop over all the reads and test they are sorted on chromosome/position. The program returns 'EXIT_SUCCESS (0)' if all the files were sorted, or EXIT_FAILURE.
Compilation
FLAG64= -m64
gcc ${FLAG64} -o bamsorted -O3 -I ${SAMDIR} -L ${SAMDIR} bamsorted.c -lbam -lz
gcc ${FLAG64} -o bamsorted -O3 -I ${SAMDIR} -L ${SAMDIR} bamsorted.c -lbam -lz
Test
Using samtools index
> for F in `find . -name "*.bam"`; do echo $F; samtools index ${F} ; done
GALAXY/galaxy-dist/test-data/sam_merge_out2.bam
GALAXY/galaxy-dist/test-data/sam_merge_in1.bam
GALAXY/galaxy-dist/test-data/srma_out1.bam
GALAXY/galaxy-dist/test-data/srma_out2.bam
GALAXY/galaxy-dist/test-data/sam_merge_in3.bam
GALAXY/galaxy-dist/test-data/sam_merge_in2.bam
GALAXY/galaxy-dist/test-data/1.bam
GALAXY/galaxy-dist/test-data/srma_in1.bam
GALAXY/galaxy-dist/test-data/sam_to_bam_out1.bam
GALAXY/galaxy-dist/test-data/3.bam
[bam_index_core] the alignment is not sorted (HWI-EAS91_1_30788AAXX:1:1:799:192): 14955 > 8420 in 46-th chr
GALAXY/galaxy-dist/test-data/srma_in3.bam
GALAXY/galaxy-dist/test-data/sam_pileup_in1.bam
GALAXY/galaxy-dist/test-data/sam_to_bam_out2.bam
GALAXY/galaxy-dist/lib/galaxy/datatypes/test/1.bam
GALAXY/galaxy-dist/lib/galaxy/datatypes/test/3.bam
[bam_index_core] the alignment is not sorted (HWI-EAS91_1_30788AAXX:1:1:799:192): 14955 > 8420 in 46-th chr
GALAXY/galaxy-dist/test-data/sam_merge_out2.bam
GALAXY/galaxy-dist/test-data/sam_merge_in1.bam
GALAXY/galaxy-dist/test-data/srma_out1.bam
GALAXY/galaxy-dist/test-data/srma_out2.bam
GALAXY/galaxy-dist/test-data/sam_merge_in3.bam
GALAXY/galaxy-dist/test-data/sam_merge_in2.bam
GALAXY/galaxy-dist/test-data/1.bam
GALAXY/galaxy-dist/test-data/srma_in1.bam
GALAXY/galaxy-dist/test-data/sam_to_bam_out1.bam
GALAXY/galaxy-dist/test-data/3.bam
[bam_index_core] the alignment is not sorted (HWI-EAS91_1_30788AAXX:1:1:799:192): 14955 > 8420 in 46-th chr
GALAXY/galaxy-dist/test-data/srma_in3.bam
GALAXY/galaxy-dist/test-data/sam_pileup_in1.bam
GALAXY/galaxy-dist/test-data/sam_to_bam_out2.bam
GALAXY/galaxy-dist/lib/galaxy/datatypes/test/1.bam
GALAXY/galaxy-dist/lib/galaxy/datatypes/test/3.bam
[bam_index_core] the alignment is not sorted (HWI-EAS91_1_30788AAXX:1:1:799:192): 14955 > 8420 in 46-th chr
Using my C code
> for F in `find . -name "*.bam"`; do ./bamsorted ${F} ; done
GALAXY/galaxy-dist/test-data/sam_merge_out2.bam OK
GALAXY/galaxy-dist/test-data/sam_merge_in1.bam OK
GALAXY/galaxy-dist/test-data/srma_out1.bam OK
GALAXY/galaxy-dist/test-data/srma_out2.bam OK
GALAXY/galaxy-dist/test-data/sam_merge_in3.bam OK
GALAXY/galaxy-dist/test-data/sam_merge_in2.bam OK
GALAXY/galaxy-dist/test-data/1.bam OK
GALAXY/galaxy-dist/test-data/srma_in1.bam OK
GALAXY/galaxy-dist/test-data/sam_to_bam_out1.bam OK
GALAXY/galaxy-dist/test-data/3.bam Unsorted: On Reference[45]="chrM" position=8420 after 14955.
GALAXY/galaxy-dist/test-data/srma_in3.bam OK
GALAXY/galaxy-dist/test-data/sam_pileup_in1.bam OK
GALAXY/galaxy-dist/test-data/sam_to_bam_out2.bam OK
GALAXY/galaxy-dist/lib/galaxy/datatypes/test/1.bam OK
GALAXY/galaxy-dist/lib/galaxy/datatypes/test/3.bam Unsorted: On Reference[45]="chrM" position=8420 after 14955.
GALAXY/galaxy-dist/test-data/sam_merge_out2.bam OK
GALAXY/galaxy-dist/test-data/sam_merge_in1.bam OK
GALAXY/galaxy-dist/test-data/srma_out1.bam OK
GALAXY/galaxy-dist/test-data/srma_out2.bam OK
GALAXY/galaxy-dist/test-data/sam_merge_in3.bam OK
GALAXY/galaxy-dist/test-data/sam_merge_in2.bam OK
GALAXY/galaxy-dist/test-data/1.bam OK
GALAXY/galaxy-dist/test-data/srma_in1.bam OK
GALAXY/galaxy-dist/test-data/sam_to_bam_out1.bam OK
GALAXY/galaxy-dist/test-data/3.bam Unsorted: On Reference[45]="chrM" position=8420 after 14955.
GALAXY/galaxy-dist/test-data/srma_in3.bam OK
GALAXY/galaxy-dist/test-data/sam_pileup_in1.bam OK
GALAXY/galaxy-dist/test-data/sam_to_bam_out2.bam OK
GALAXY/galaxy-dist/lib/galaxy/datatypes/test/1.bam OK
GALAXY/galaxy-dist/lib/galaxy/datatypes/test/3.bam Unsorted: On Reference[45]="chrM" position=8420 after 14955.
That's it,
Pierre
Thank you Pierre. Very useful !
ReplyDeleteHi Pierre,
ReplyDeleteI am confused. Does our script double check the bam files being sorted? Because I didn't get any error using "samtools index". Could it not be sorted?
It should only print a message if the reads aren't sorted.
ReplyDeletevery useful script. Thank you!
ReplyDelete