Making use of Picard Metrics files using XML and XSLT. #ngs
Many tools in the Picard package produce a "metrics file" (described at http://picard.sourceforge.net/picard-metric-definitions.shtml). The Picard API contains a Java parser, "MetricsFile", for reading those files:

MetricsFile<MetricBase, Comparable<?>> metricsFile = new MetricsFile<MetricBase, Comparable<?>>();
metricsFile.read(new FileReader("metrics.txt"));

In order to produce some custom reports from those files, I've created a tool that dumps the content of the MetricsFile as an XML file. The source code is available at: http://code.google.com/p/jvarkit/source/browse/trunk/src/main/java/fr/inserm/umr1087/jvarkit/tools/picard/metrics2xml/PicardMetricsToXML.java.
Compilation
$ mkdir tmp
$ javac -d tmp -cp /path/to/picard.jar:/path/to/sam.jar \
    -sourcepath src/main/java \
    src/main/java/fr/inserm/umr1087/jvarkit/tools/picard/metrics2xml/PicardMetricsToXML.java
$ jar vcf picardmetrics2xml.jar -C tmp .
Usage
Say you have used the tool 'CollectInsertSizeMetrics.jar' from Picard:

$ java -jar /path/to/CollectInsertSizeMetrics.jar \
    O=out.metrics \
    I=/path/to/samtools/examples/sorted.bam \
    AS=true \
    R=/path/to/samtools/ex1.fa \
    H=chart.pdf

The file out.metrics looks like this:

## net.sf.picard.metrics.StringHeader
# net.sf.picard.analysis.CollectInsertSizeMetrics HISTOGRAM_FILE=(...)
## net.sf.picard.metrics.StringHeader
# Started on: Tue Feb 05 12:51:30 CET 2013

## METRICS CLASS net.sf.picard.analysis.InsertSizeMetrics
MEDIAN_INSERT_SIZE MEDIAN_ABSOLUTE_DEVIATION MIN_INSERT_SIZE MAX_INSERT_SIZE MEAN_INSERT_SIZE STANDARD_DEVIATION READ_PAIRS
209 10 54 243 208.857506 13.614603 4716 FR 5 9 13 17 21 25 29 35 43

## HISTOGRAM java.lang.Integer
insert_size All_Reads.fr_count
54 3
170 3
173 9
174 3
175 3
177 6
(...)

This file can be converted to XML using the following command:

$ java -cp /path/to/picard.jar:/path/to/sam.jar:picardmetrics2xml.jar \
    fr.inserm.umr1087.jvarkit.tools.picard.metrics2xml.PicardMetricsToXML \
    file.metrics

<?xml version="1.0" encoding="UTF-8"?>
<picard-metrics xmlns="http://picard.sourceforge.net/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <metrics-file file="file.metrics">
  <headers>
   <header class="net.sf.picard.metrics.StringHeader">net.sf.picard.analysis.CollectInsertSizeMetrics HISTOGRAM_FILE=jeter2 INPUT=/home/lindenb/package/samtools-0.1.18/examples/sorted.bam OUTPUT=jeter REFERENCE_SEQUENCE=/home/lindenb/package/samtools-0.1.18/examples/ex1.fa ASSUME_SORTED=true DEVIATIONS=10.0 MINIMUM_PCT=0.05 METRIC_ACCUMULATION_LEVEL=[ALL_READS] STOP_AFTER=0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false</header>
   <header class="net.sf.picard.metrics.StringHeader">Started on: Tue Feb 05 12:51:30 CET 2013</header>
  </headers>
  <metrics>
   <thead class="net.sf.picard.analysis.InsertSizeMetrics">
    <th class="double">MEDIAN_INSERT_SIZE</th>
    <th class="double">MEDIAN_ABSOLUTE_DEVIATION</th>
    <th class="int">MIN_INSERT_SIZE</th>
    <th class="int">MAX_INSERT_SIZE</th>
    <th class="double">MEAN_INSERT_SIZE</th>
    <th class="double">STANDARD_DEVIATION</th>
    <th class="long">READ_PAIRS</th>
    <th class="net.sf.picard.sam.SamPairUtil$PairOrientation">PAIR_ORIENTATION</th>
    <th class="int">WIDTH_OF_10_PERCENT</th>
    <th class="int">WIDTH_OF_20_PERCENT</th>
    <th class="int">WIDTH_OF_30_PERCENT</th>
    <th class="int">WIDTH_OF_40_PERCENT</th>
    <th class="int">WIDTH_OF_50_PERCENT</th>
    <th class="int">WIDTH_OF_60_PERCENT</th>
    <th class="int">WIDTH_OF_70_PERCENT</th>
    <th class="int">WIDTH_OF_80_PERCENT</th>
    <th class="int">WIDTH_OF_90_PERCENT</th>
    <th class="int">WIDTH_OF_99_PERCENT</th>
    <th class="java.lang.String">SAMPLE</th>
    <th class="java.lang.String">LIBRARY</th>
    <th class="java.lang.String">READ_GROUP</th>
   </thead>
   <tbody>
    <tr><td>209.0</td><td>10.0</td><td>54</td><td>243</td><td>208.857506</td><td>13.614603</td><td>4716</td><td>FR</td><td>5</td><td>9</td><td>13</td><td>17</td><td>21</td><td>25</td><td>29</td><td>35</td><td>43</td><td>65</td><td xsi:nil="true"/><td xsi:nil="true"/><td xsi:nil="true"/></tr>
   </tbody>
  </metrics>
  <histogram class="java.lang.Integer">
   <thead><th>insert_size</th><th>All_Reads.fr_count</th></thead>
   <tbody>
    <tr><td>54</td><td>3.0</td></tr>
    <tr><td>170</td><td>3.0</td></tr>
    <tr><td>173</td><td>9.0</td></tr>
    <tr><td>174</td><td>3.0</td></tr>
    <tr><td>175</td><td>3.0</td></tr>
    <tr><td>177</td><td>6.0</td></tr>
    <tr><td>178</td><td>6.0</td></tr>
    <tr><td>179</td><td>9.0</td></tr>
    <tr><td>180</td><td>6.0</td></tr>
    <tr><td>181</td><td>6.0</td></tr>
    <tr><td>182</td><td>21.0</td></tr>
    <tr><td>183</td><td>9.0</td></tr>
    <tr><td>184</td><td>15.0</td></tr>
    <tr><td>185</td><td>33.0</td></tr>
    <tr><td>186</td><td>15.0</td></tr>
    <tr><td>187</td><td>3 (...)
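
Once the metrics are in XML, any XML-aware code can pick values out of them. As a rough sketch using only the JDK (javax.xml), the class below looks up one metric by its column name, relying on each th and td sharing the same position as in the output above. The class name MetricsXmlSketch and its metric() method are made up for this illustration and are not part of jvarkit or Picard; the XPath expressions use local-name() to sidestep the default namespace.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: reads one value from the first metrics row of the XML. */
final class MetricsXmlSketch {

    /** Returns the text of the first-row cell under the given column, or "" if absent. */
    static String metric(String xml, String column) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document dom = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Column names: the <th> children of <metrics>/<thead> (not the histogram's).
            NodeList names = (NodeList) xpath.evaluate(
                    "//*[local-name()='metrics']/*[local-name()='thead']/*[local-name()='th']",
                    dom, XPathConstants.NODESET);
            // Values: the <td> children of the first <tr> in <metrics>/<tbody>.
            NodeList values = (NodeList) xpath.evaluate(
                    "//*[local-name()='metrics']/*[local-name()='tbody']"
                    + "/*[local-name()='tr'][1]/*[local-name()='td']",
                    dom, XPathConstants.NODESET);
            for (int i = 0; i < names.getLength() && i < values.getLength(); i++) {
                if (names.item(i).getTextContent().equals(column)) {
                    // Cells marked xsi:nil="true" are empty, so this yields "".
                    return values.item(i).getTextContent();
                }
            }
            return "";
        } catch (Exception e) {
            throw new IllegalStateException("cannot read metrics XML", e);
        }
    }

    /** Usage: java MetricsXmlSketch metrics.xml MEAN_INSERT_SIZE */
    public static void main(String[] args) throws Exception {
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), "UTF-8");
        System.out.println(metric(xml, args[1]));
    }
}
```

Only the first row is read here; a file produced with several accumulation levels would have more tr elements and would need a loop over them.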
Converting to JSON
Now, we can convert the XML to whatever we want using XSLT. I wrote a stylesheet picardmetrics2json.xsl converting the XML to JSON (though I should escape the quotes in the strings).

$ xsltproc picardmetrics2json.xsl metrics.xml

{
  "metrics.xml": {
    "headers": [
      {
        "class": "net.sf.picard.metrics.StringHeader",
        "value": "net.sf.picard.analysis.CollectInsertSizeMetrics HISTOGRAM_FILE=metrics.pdf INPUT=samtools-0.1.18/examples/sorted.bam OUTPUT=metrics.txt REFERENCE_SEQUENCE=/home/lindenb/package/samtools-0.1.18/examples/ex1.fa ASSUME_SORTED=true DEVIATIONS=10.0 MINIMUM_PCT=0.05 METRIC_ACCUMULATION_LEVEL=[ALL_READS] STOP_AFTER=0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false"
      },
      {
        "class": "net.sf.picard.metrics.StringHeader",
        "value": "Started on: Tue Feb 05 12:51:30 CET 2013"
      }
    ],
    "metrics": [
      {
        "MEDIAN_INSERT_SIZE": 209,
        "MEDIAN_ABSOLUTE_DEVIATION": 10,
        "MIN_INSERT_SIZE": 54,
        "MAX_INSERT_SIZE": 243,
        "MEAN_INSERT_SIZE": 208.857506,
        "STANDARD_DEVIATION": 13.614603,
        "READ_PAIRS": 4716,
        "PAIR_ORIENTATION": "FR",
        "WIDTH_OF_10_PERCENT": 5,
        "WIDTH_OF_20_PERCENT": 9,
        "WIDTH_OF_30_PERCENT": 13,
        "WIDTH_OF_40_PERCENT": 17,
        "WIDTH_OF_50_PERCENT": 21,
        "WIDTH_OF_60_PERCENT": 25,
        "WIDTH_OF_70_PERCENT": 29,
        "WIDTH_OF_80_PERCENT": 35,
        "WIDTH_OF_90_PERCENT": 43,
        "WIDTH_OF_99_PERCENT": 65,
        "SAMPLE": null,
        "LIBRARY": null,
        "READ_GROUP": null
      }
    ],
    "histogram": [
      { "insert_size": 54, "All_Reads.fr_count": 3 },
      { "insert_size": 170, "All_Reads.fr_count": 3 },
      (...)
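
xsltproc is not the only way to run such a stylesheet: the JDK ships an XSLT 1.0 processor behind javax.xml.transform. The following is only a sketch under a few assumptions. The embedded stylesheet is a made-up, much smaller one (it is not picardmetrics2json.xsl) that prints each histogram row as a tab-separated line, and the names XsltSketch, HISTOGRAM_TO_TSV and transform() are hypothetical. Note the p: prefix bound to the http://picard.sourceforge.net/ namespace; without it the XPath expressions would match nothing.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: applies an XSLT 1.0 stylesheet to the metrics XML. */
final class XsltSketch {

    /** Illustrative stylesheet: one "key<TAB>count" line per histogram row. */
    static final String HISTOGRAM_TO_TSV =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:p='http://picard.sourceforge.net/'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='//p:histogram/p:tbody/p:tr'>"
        + "<xsl:value-of select='p:td[1]'/><xsl:text>&#9;</xsl:text>"
        + "<xsl:value-of select='p:td[2]'/><xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the stylesheet over the XML and returns the result as a string. */
    static String transform(String xslt, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

To run a stylesheet stored on disk instead, StreamSource also accepts a java.io.File, so the same transform step works with the .xsl and .xml files passed to xsltproc above.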
Converting to HTML
Another stylesheet converts the XML to HTML. It also produces the JavaScript code to display the histograms using Google Charts:

$ xsltproc picardmetrics2html.xsl metrics.xml > output.html
That's it,
Pierre