How I start a bioinformatics project
Phil Ashton tweeted a link to a paper about how to set up a bioinformatics project file hierarchy: "A Quick Guide to Organizing Computational Biology Projects".
Nick Loman posted his version yesterday: "How I start a bioinformatics project" (http://nickloman.github.io/2014/05/14/how-i-start-a-bioinformatics-project/).
Here is mine (simplified):
- I start by creating a directory managed by git
- I create a JSON-based description of my data (config.json), including the paths to the tools and to the references:
    {
      "reference": {
        "name": "ref",
        "fasta": "/path/to/ref.fasta"
      },
      "samples": [
        {
          "fastq": [
            "path/to/Sample1/Sample1_1.fq.gz",
            "path/to/Sample1/Sample1_2.fq.gz"
          ],
          "name": "Sample1"
        },
        {
          "fastq": [
            "path/to/Sample2/Sample2_1.fq.gz",
            "path/to/Sample2/Sample2_2.fq.gz"
          ],
          "name": "Sample2"
        },
        {
          "fastq": [
            "path/to/Sample3/Sample3_1.fq.gz",
            "path/to/Sample3/Sample3_2.fq.gz"
          ],
          "name": "Sample3"
        }
      ],
      "tools": {
        "bcftools": "/path/to/bcftools",
        "bwa": "/path/to/bwa",
        "samtools": "/path/to/samtools"
      }
    }

- I create a git submodule for a project hosting an Apache Velocity template (make.vm) that generates a Makefile from config.json:
    REF=${config.reference.fasta}
    .PHONY:all
    all: align/variants.vcf

    align/variants.vcf: #foreach($sample in ${config.samples}) align/${sample.name}_sorted.bam #end
    	${config.tools.samtools} mpileup -uf ${REF} $^ |\
    	${config.tools.bcftools} view -vcg - >$@

    #foreach($sample in ${config.samples})
    align/${sample.name}_sorted.bam : ${sample.fastq[0]} ${sample.fastq[1]}
    	mkdir -p $(dir $@) && \
    	${config.tools.bwa} mem -R '@RG\tID:${sample.getId()}\tSM:${sample.name}' ${REF} $^ |\
    	${config.tools.samtools} view -b -S - |\
    	${config.tools.samtools} sort - $(basename $@) && \
    	${config.tools.samtools} index $@
    #end

- The Makefile is generated using jsvelocity:
    java -jar jsvelocity.jar -f config config.json make.vm > Makefile

  The resulting Makefile:

    REF=/path/to/ref.fasta
    .PHONY:all
    all: align/variants.vcf

    align/variants.vcf: align/Sample1_sorted.bam align/Sample2_sorted.bam align/Sample3_sorted.bam
    	/path/to/samtools mpileup -uf ${REF} $^ |\
    	/path/to/bcftools view -vcg - >$@

    align/Sample1_sorted.bam : path/to/Sample1/Sample1_1.fq.gz path/to/Sample1/Sample1_2.fq.gz
    	mkdir -p $(dir $@) && \
    	/path/to/bwa mem -R '@RG\tID:id10\tSM:Sample1' ${REF} $^ |\
    	/path/to/samtools view -b -S - |\
    	/path/to/samtools sort - $(basename $@) && \
    	/path/to/samtools index $@

    align/Sample2_sorted.bam : path/to/Sample2/Sample2_1.fq.gz path/to/Sample2/Sample2_2.fq.gz
    	mkdir -p $(dir $@) && \
    	/path/to/bwa mem -R '@RG\tID:id15\tSM:Sample2' ${REF} $^ |\
    	/path/to/samtools view -b -S - |\
    	/path/to/samtools sort - $(basename $@) && \
    	/path/to/samtools index $@

    align/Sample3_sorted.bam : path/to/Sample3/Sample3_1.fq.gz path/to/Sample3/Sample3_2.fq.gz
    	mkdir -p $(dir $@) && \
    	/path/to/bwa mem -R '@RG\tID:id20\tSM:Sample3' ${REF} $^ |\
    	/path/to/samtools view -b -S - |\
    	/path/to/samtools sort - $(basename $@) && \
    	/path/to/samtools index $@

- The Makefile is invoked with the option -j N (allow N jobs at once) using GNU Make or QMake (distributed parallel make, scheduled by Sun Grid Engine).
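For readers without jsvelocity at hand, the expansion performed by the template can be sketched in plain Python. This is not the tool used above, only a rough stand-in built on the same config.json layout. The function names `load_config` and `render_makefile` are hypothetical, and the read-group IDs come from a simple counter rather than from jsvelocity's `getId()`, so they will not match the `id10`, `id15`, `id20` values shown in the generated Makefile.

```python
import json


def load_config(path: str) -> dict:
    """Read a config.json with 'reference', 'samples' and 'tools' keys."""
    with open(path) as handle:
        return json.load(handle)


def render_makefile(config: dict) -> str:
    """Return Makefile text with one BAM rule per sample and one VCF rule."""
    tools = config["tools"]
    samples = config["samples"]
    bams = [f"align/{s['name']}_sorted.bam" for s in samples]

    lines = [
        f"REF={config['reference']['fasta']}",
        ".PHONY:all",
        "all: align/variants.vcf",
        "",
        # The VCF depends on every sorted BAM, like the first #foreach.
        "align/variants.vcf: " + " ".join(bams),
        f"\t{tools['samtools']} mpileup -uf ${{REF}} $^ |\\",
        f"\t{tools['bcftools']} view -vcg - >$@",
    ]
    for index, sample in enumerate(samples, start=1):
        # Assumption: a counter-based ID stands in for sample.getId().
        read_group = f"@RG\\tID:id{index}\\tSM:{sample['name']}"
        lines += [
            "",
            f"align/{sample['name']}_sorted.bam : " + " ".join(sample["fastq"]),
            "\tmkdir -p $(dir $@) && \\",
            f"\t{tools['bwa']} mem -R '{read_group}' ${{REF}} $^ |\\",
            f"\t{tools['samtools']} view -b -S - |\\",
            f"\t{tools['samtools']} sort - $(basename $@) && \\",
            f"\t{tools['samtools']} index $@",
        ]
    return "\n".join(lines) + "\n"
```

Writing `render_makefile(load_config("config.json"))` to a file named Makefile should give something close to the generated Makefile above, apart from the read-group IDs.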
That's it,
Pierre