23 April 2013

playing with BWA-MEM : my notebook

BWA-MEM, is a new tool that is part of the latest version of BWA. As said Heng Li on Biostar bwa-mem can be used to identify the viral integration sites in the human genome. Here I've used various short reads to explore how bwa maps the pairs. The sequences I used below are:

  • NOTCH2a and NOTCH2b two sequences on the chromosomes 1, on the same strand
  • NOTCH2del : same as NOTCH2a but with a deletion
  • ROXAN1a and ROXAN1b: two sequences on the chromosome 22, on the same strand
  • NOTCH2ins is NOTCH2a with and insertion of ROXAN1a
The following results were generated with the makefile below. Each pair of fastq was submitted twice (1_* and 2_*). I also added some pairs of reads from another source of FASTQs to let bwa infer the size of the fragments (see biostar: http://www.biostars.org/p/69694).

NOTCH2a vs NOTCH2b BWA-MEM options: none

NOTCH2a vs NOTCH2b
BWA-MEM options: none
NOT Properly mapped reads because they're on the same strand.
ReadFlagChromPosqualCigarExtra
1_Notch2p1=651120572000950MNM:i:0 AS:i:50 XS:i:45
1_Notch2p2=1291120572200950MNM:i:0 AS:i:50 XS:i:45
2_Notch2p11120572000950MNM:i:0 AS:i:50 XS:i:45
2_Notch2p21120572200950MNM:i:0 AS:i:50 XS:i:45

NOTCH2a vs revcomp(NOTCH2b) BWA-MEM options: none

NOTCH2a vs revcomp(NOTCH2b)
BWA-MEM options: none
Properly mapped reads.
ReadFlagChromPosqualCigarExtra
1_Notch2pPR1=9911205720003950MNM:i:0 AS:i:50 XS:i:45
1_Notch2pPr2=14711205722003950MNM:i:0 AS:i:50 XS:i:45
2_Notch2pPR111205720003950MNM:i:0 AS:i:50 XS:i:45
2_Notch2pPr211205722003950MNM:i:0 AS:i:50 XS:i:45

NOTCH2del vs revcomp(NOTCH2b) BWA-MEM options: none

NOTCH2del vs revcomp(NOTCH2b)
BWA-MEM options: none
Deletion in Read1
ReadFlagChromPosqualCigarExtra
1_Notch2delpPR1=9911205720003937M20D31MNM:i:20 AS:i:42 XS:i:37
1_Notch2delpPr2=14711205722003950MNM:i:0 AS:i:50 XS:i:45
2_Notch2delpPR111205720003937M20D31MNM:i:20 AS:i:42 XS:i:37
2_Notch2delpPr211205722003950MNM:i:0 AS:i:50 XS:i:45

NOTCH2ins vs revcomp(NOTCH2b) BWA-MEM options:

NOTCH2ins vs revcomp(NOTCH2b)
BWA-MEM options: non
Soft clipping of chr1: two hits on different chromosomes.
ReadFlagChromPosqualCigarExtra
1_NOTCH2insRoxanpR1=9722417215666023S50M27SNM:i:0 AS:i:50 XS:i:0
1_NOTCH2insRoxanpr2=1451120572200950MNM:i:0 AS:i:50 XS:i:45
2_NOTCH2insRoxanpR122417215666023S50M27SNM:i:0 AS:i:50 XS:i:0
2_NOTCH2insRoxanpr21120572200950MNM:i:0 AS:i:50 XS:i:45


NOTCH2a+ROXAN1a vs revcomp(NOTCH2b) BWA-MEM options: none

NOTCH2a+ROXAN1a vs revcomp(NOTCH2b)
BWA-MEM options: none
two hits on chromosome chr1 and chr22 for the 5' seq
nevertheless, properly paired was set.
ReadFlagChromPosqualCigarExtra
1_Notch2RoxanpPR1=991120572000950M50SNM:i:0 AS:i:50 XS:i:45 XP:Z:22,+41721566,50S50M,9,0;
1_Notch2RoxanpPR1=992241721566950S50MNM:i:0 AS:i:50 XS:i:0 XP:Z:1,+120572000,50M50S,9,0;
1_Notch2RoxanpPr2=1471120572200950MNM:i:0 AS:i:50 XS:i:45
2_Notch2RoxanpPR11120572000950M50SNM:i:0 AS:i:50 XS:i:45 XP:Z:22,+41721566,50S50M,9,0;
2_Notch2RoxanpPR12241721566950S50MNM:i:0 AS:i:50 XS:i:0 XP:Z:1,+120572000,50M50S,9,0;
2_Notch2RoxanpPr21120572200950MNM:i:0 AS:i:50 XS:i:45

NOTCH2a+ROXAN1a vs revcomp(NOTCH2b+ROXAN1b) BWA-MEM options: none

NOTCH2a+ROXAN1a vs revcomp(NOTCH2b+ROXAN1b)
BWA-MEM options: none
All fragments on chr1/chr22.
Reads are NOT properly paired.
ReadFlagChromPosqualCigarExtra
1_Notch2RoxanpR1=971120572000950M50SNM:i:0 AS:i:50 XS:i:45 XP:Z:22,+41721566,50S50M,9,0;
1_Notch2RoxanpR1=972241721566950S50MNM:i:0 AS:i:50 XS:i:0 XP:Z:1,+120572000,50M50S,9,0;
1_Notch2Roxanpr2=14522417217666050S50MNM:i:0 AS:i:50 XS:i:0 XP:Z:1,-120572200,50M50S,9,0;
1_Notch2Roxanpr2=1451120572200950M50SNM:i:0 AS:i:50 XS:i:45 XP:Z:22,-41721766,50S50M,60,0;
2_Notch2RoxanpR11120572000950M50SNM:i:0 AS:i:50 XS:i:45 XP:Z:22,+41721566,50S50M,9,0;
2_Notch2RoxanpR12241721566950S50MNM:i:0 AS:i:50 XS:i:0 XP:Z:1,+120572000,50M50S,9,0;
2_Notch2Roxanpr222417217666050S50MNM:i:0 AS:i:50 XS:i:0 XP:Z:1,-120572200,50M50S,9,0;
2_Notch2Roxanpr21120572200950M50SNM:i:0 AS:i:50 XS:i:45 XP:Z:22,-41721766,50S50M,60,0;

NOTCH2a+ROXAN1a vs revcomp(NOTCH2b) BWA-MEM options: -C

NOTCH2a+ROXAN1a vs revcomp(NOTCH2b)
BWA-MEM options: -C
I cannot see the effect of the option '-C':
"append FASTA/FASTQ comment to SAM output"
ReadFlagChromPosqualCigarExtra
1_Notch2RoxanpPR1=991120572000950M50SNM:i:0 AS:i:50 XS:i:45 XP:Z:22,+41721566,50S50M,9,0;
1_Notch2RoxanpPR1=992241721566950S50MNM:i:0 AS:i:50 XS:i:0 XP:Z:1,+120572000,50M50S,9,0;
1_Notch2RoxanpPr2=1471120572200950MNM:i:0 AS:i:50 XS:i:45
2_Notch2RoxanpPR11120572000950M50SNM:i:0 AS:i:50 XS:i:45 XP:Z:22,+41721566,50S50M,9,0;
2_Notch2RoxanpPR12241721566950S50MNM:i:0 AS:i:50 XS:i:0 XP:Z:1,+120572000,50M50S,9,0;
2_Notch2RoxanpPr21120572200950MNM:i:0 AS:i:50 XS:i:45

NOTCH2a+ROXAN1a vs revcomp(NOTCH2b) BWA-MEM options: -M

NOTCH2a+ROXAN1a vs revcomp(NOTCH2b)
BWA-MEM options: -M
Set the duplicate flags for the multiple hits.
ReadFlagChromPosqualCigarExtra
1_Notch2RoxanpPR1=991120572000950M50SNM:i:0 AS:i:50 XS:i:45 XP:Z:22,+41721566,50S50M,9,0;
1_Notch2RoxanpPR1s=3552241721566950S50MNM:i:0 AS:i:50 XS:i:0 XP:Z:1,+120572000,50M50S,9,0;
1_Notch2RoxanpPr21120572200950MNM:i:0 AS:i:50 XS:i:45
2_Notch2RoxanpPR11120572000950M50SNM:i:0 AS:i:50 XS:i:45 XP:Z:22,+41721566,50S50M,9,0;
2_Notch2RoxanpPR1s2241721566950S50MNM:i:0 AS:i:50 XS:i:0 XP:Z:1,+120572000,50M50S,9,0;
2_Notch2RoxanpPr21120572200950MNM:i:0 AS:i:50 XS:i:45

NOTCH2ins vs revcomp(ROXAN1b) BWA-MEM options: none

NOTCH2ins vs revcomp(ROXAN1b)
BWA-MEM options: none
Reads are mapped on chr22.
ReadFlagChromPosqualCigarExtra
1_NOTCH2insRoxanpPR122417215666023S50M27SNM:i:0 AS:i:50 XS:i:0
1_NOTCH2insRoxanpPr222417217666050MNM:i:0 AS:i:50 XS:i:0
2_NOTCH2insRoxanpPR122417215666023S50M27SNM:i:0 AS:i:50 XS:i:0
2_NOTCH2insRoxanpPr222417217666050MNM:i:0 AS:i:50 XS:i:0

That's it,

Pierre

The makefile

.PHONY=all fastqs
REF=/path/to/human_g1k_v37.fasta
SAMTOOLS=/path/to/samtools-0.1.19
BWA=/path/to/bwa-0.7.3a/bwa
ANOTHERFASTQ=/path/to/NA12878
HTML=result
define complement
tr "atgcATGC" "TACGTACG"
endef
define revcomp
$(call complement) | rev
endef
define makefastq
echo -e "@$(1)\n$(2)\n+"; echo $(2)| tr "ATGCN" "A"
endef
define fastqfile
($(foreach N,1 2,$(call makefastq,$(N)_$(1),$(2));)) > $(3)
endef
define fastqfiles
$(call fastqfile,$(1)/1,$(2),$(4)_1.fq)
$(call fastqfile,$(1)/2,$(3),$(4)_2.fq)
endef
define callbwa
$(call fastqfiles,$(1),$(2),$(3),$(4))
gunzip -c ${ANOTHERFASTQ}_1.fastq.gz | head -n 400 >> $(4)_1.fq
gunzip -c ${ANOTHERFASTQ}_2.fastq.gz | head -n 400 >> $(4)_2.fq
$(BWA) mem $(5) $(REF) $(4)_1.fq $(4)_2.fq 2> /dev/null | $(SAMTOOLS)/samtools view -SX - | grep -v -E "^@SQ" > $(4).sam
echo "<div><h3>${6} BWA-MEM options: ${5}</h3><table border='1' style='border:1px solid #000000;'>" >> $(HTML).html
echo "<caption>${6}<br/>BWA-MEM options: ${5}</caption>" >> $(HTML).html
echo "<tr><th>Read</th><th>Flag</th><th>Chrom</th><th>Pos</th><th>qual</th><th>Cigar</th><th>Extra</th></tr>\n" >> $(HTML).html
@cat $(4).sam | grep -E '^[12]_' | awk -F ' ' '{printf("<tr><td>%s</td><td>%s</td><td>%s</td><td>%s</td><td>%s</td><td>%s</td><td>",$$1,$$2,$$3,$$4,$$5,$$6);for(i=12;i<=NF;i++) printf("%s ",$$i); printf("</td></tr>\n");}' >> $(HTML).html
echo "</table></div>" >> $(HTML).html
endef
define begintable
endef
define endtable
endef
NOTCH2a=GTCTTGCTCTGTCACCCAGGCTAGAGTGCAATGATGTGGTGTCAGCTCAC
NOTCH2b=CTGACCTCGTGATTCGCCCAACACAGCCTCCTGAAGTGCTGGGATTACCA
NOTCH2del=GTCTTGCTCTGTCACCCAGGCTAGAGTGCAATGATGTTCCGCCTCCTGGGTTCAAGTCACTCTCCTGC
ROXAN1a=AGGTCGACACTACCCCTAAAGCAAGAAGAATATGAGGTGAGTGTCAGCTG
ROXAN1b=CAATGATCTGTTCCGGGAGAAGGACTATAAGCAGGCTCTGGTGCAGTACA
NOTCH2ins=GTCTTGCTCTGTCACCCAGGCTA${ROXAN1a}GAGTGCAATGATGTGGTGTCAGCTCAC
all:
rm -f $(HTML).html
$(call callbwa,Notch2,$(NOTCH2a),$(NOTCH2b),tmp,,NOTCH2a vs NOTCH2b)
$(call callbwa,Notch2,$(NOTCH2a),`echo $(NOTCH2b)| $(call revcomp)`,tmp,,NOTCH2a vs revcomp(NOTCH2b))
$(call callbwa,Notch2del,$(NOTCH2del),`echo $(NOTCH2b)| $(call revcomp)`,tmp,,NOTCH2del vs revcomp(NOTCH2b))
$(call callbwa,Notch2del,$(NOTCH2del),`echo $(NOTCH2b)| $(call revcomp)`,tmp,-M,NOTCH2del vs revcomp(NOTCH2b))
$(call callbwa,NOTCH2insRoxan,$(NOTCH2ins),`echo $(NOTCH2b)| $(call revcomp)`,tmp,,NOTCH2ins vs revcomp(NOTCH2b))
$(call callbwa,Notch2Roxan,$(NOTCH2a)$(ROXAN1a),`echo $(NOTCH2b)| $(call revcomp)`,tmp,,NOTCH2a+ROXAN1a vs revcomp(NOTCH2b))
$(call callbwa,Notch2Roxan,$(NOTCH2a)$(ROXAN1a),`echo $(NOTCH2b)$(ROXAN1b)| $(call revcomp)`,tmp,,NOTCH2a+ROXAN1a vs revcomp(NOTCH2b+ROXAN1b))
$(call callbwa,Notch2Roxan,$(NOTCH2a)$(ROXAN1a),`echo $(NOTCH2b)| $(call revcomp)`,tmp,-C,NOTCH2a+ROXAN1a vs revcomp(NOTCH2b))
$(call callbwa,Notch2Roxan,$(NOTCH2a)$(ROXAN1a),`echo $(NOTCH2b)| $(call revcomp)`,tmp,-M,NOTCH2a+ROXAN1a vs revcomp(NOTCH2b))
$(call callbwa,NOTCH2insRoxan,$(NOTCH2ins),`echo $(ROXAN1b)| $(call revcomp)`,tmp,,NOTCH2ins vs revcomp(ROXAN1b))
cat $(HTML).html
view raw Makefile hosted with ❤ by GitHub

No comments: