04 March 2016

Now in picard: two javascript-based tools filtering BAM and VCF files.


SamJS and VCFFilterJS are two tools I wrote for jvarkit. Both tools use the embedded java javascript engine to filter BAM or VCF file.
To get a broader audience, I've copied those functionalities to Picard in 'FilterSamReads' and 'FilterVcf'.

FilterSamReads

FilterSamReads filters a SAM or BAM file with a javascript expression using the java javascript-engine.
The script puts the following variables in the script context: 'record' a SamRecord and 'header' a SAMFileHeader. Last value of the script should be a boolean to tell wether we should accept or reject the record.

The script samFilter.js

/** accept record if second base of DNA is a A */
function accept(r)
 {
 return r.getReadString().length()>2 &&
  r.getReadString().substring(1,2)=="A";
 }

accept(record);

Invoke and output

$ java -jar picard-tools-2.1.1/picard.jar \
 FilterSamReads I=in.sam  O=out.sam \
 JS=samFilter.js FILTER=includeJavascript
 
$ cat out.sam | cut -f 10 | cut -c 2 | sort | uniq

A

FilterVcf

FilterVcf one or more hard filters to a VCF file to filter out genotypes and variants.
Filters a VCF file with a javascript expression interpreted by the java javascript engine. The script puts the following variables in the script context: 'variant' a VariantContext and 'header' a VCFHeader. Last value of the script should be a boolean to tell wether we should accept or reject the record.

The script variantFilter.js

/** prints a VARIATION if two samples at least have a DP>100 */ 
function myfilterFunction(thevariant)
    {
    var samples=header.genotypeSamples;
    var countOkDp=0;

    for(var i=0; i< samples.size();++i)
        {
        var sampleName=samples.get(i);
        if(! variant.hasGenotype(sampleName)) continue;
        var genotype = thevariant.genotypes.get(sampleName);
        if( ! genotype.hasDP()) continue;
        var dp= genotype.getDP();
        if(dp > 100 ) countOkDp++;
        }
    return (countOkDp>2)
    }
myfilterFunction(variant)

Invoke and output

java -jar picard-tools-2.1.1/picard.jar FilterVcf \
 I=in.vcf O=out.vcf \
 JS=variantFilter.js
 
$ grep -v '#' jeter.vcf | cut -f 7 | grep variantFilter | wc -l
23


That's it,
Pierre

1 comment:

  1. What am I missing

    variant.getLog10PError() > 100.0

    or

    variant.getLog10PError() < 100.0

    These don't seem to filter as I would expect.

    ReplyDelete