Grouping mutations/Gene=f(sample)
GroupByGene is a small C++ tool that groups mutation data by gene. It reads the following columns:
- CHROM
- POS
- REF
- ALT
- GENE
- SAMPLE
Example:
$ cat input.tsv
#CHROM	POS	REF	ALT	GENE	SAMPLE
chr1	10	A	T	gene1	indi1
chr1	10	A	T	gene1	indi2
chr1	11	C	G	gene1	indi2
chr2	110	C	G	gene2	indi3
chr3	210	A	T	gene3	indi1
chr3	211	C	T	gene3	indi2
chr3	211	C	T	gene3	indi3
chr3	215	C	G	gene3	indi3
chr3	216	C	T	gene3	indi3
chr4	390	C	T	gene4	indi1
chr4	390	C	A	gene4	indi3
Calling groupbygene:
$ groupbygene --chrom 1 --pos 2 --ref 3 --alt 4 --sample 6 --gene 5 < input.tsv
GENE | CHROM | START | END | count SAMPLES | distinct MUTATIONS | count(indi1) | count(indi2) | count(indi3) |
gene1 | chr1 | 10 | 11 | 2 | 2 | 1 | 2 | 0 |
gene2 | chr2 | 110 | 110 | 1 | 1 | 0 | 0 | 1 |
gene3 | chr3 | 210 | 216 | 3 | 4 | 1 | 1 | 3 |
gene4 | chr4 | 390 | 390 | 2 | 2 | 1 | 0 | 1 |
With --norefalt, REF and ALT are ignored when counting distinct mutations, so the two gene4 variants at chr4:390 (C>T and C>A) collapse into one:
$ groupbygene --chrom 1 --pos 2 --ref 3 --alt 4 --sample 6 --gene 5 --norefalt < input.tsv
GENE | CHROM | START | END | count SAMPLES | distinct MUTATIONS | count(indi1) | count(indi2) | count(indi3) |
gene1 | chr1 | 10 | 11 | 2 | 2 | 1 | 2 | 0 |
gene2 | chr2 | 110 | 110 | 1 | 1 | 0 | 0 | 1 |
gene3 | chr3 | 210 | 216 | 3 | 4 | 1 | 1 | 3 |
gene4 | chr4 | 390 | 390 | 2 | 1 | 1 | 0 | 1 |
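The grouping shown in the two tables above can be sketched in a few lines of C++17. This is not the GroupByGene source, only a minimal illustration under stated assumptions: the names Mutation, GeneSummary, readMutations, groupByGene and printTable are hypothetical, the column order is fixed to that of input.tsv instead of being set with --chrom, --pos and so on, genes are keyed by name only, and the noRefAlt argument mimics what --norefalt appears to do in the output above.

```cpp
#include <algorithm>
#include <iostream>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <tuple>
#include <vector>

// One input row. Column order is assumed: CHROM POS REF ALT GENE SAMPLE.
struct Mutation {
    std::string chrom;
    long pos = 0;
    std::string ref, alt, gene, sample;
};

// Per-gene summary mirroring the columns of the output table (hypothetical type).
struct GeneSummary {
    std::string chrom;
    long start = 0, end = 0;
    std::set<std::tuple<long, std::string, std::string>> mutations; // distinct (POS, REF, ALT)
    std::map<std::string, int> perSample;                           // sample -> number of rows
};

// Read whitespace-separated rows, skipping empty lines and '#' header lines.
std::vector<Mutation> readMutations(std::istream& in) {
    std::vector<Mutation> rows;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        std::istringstream fields(line);
        Mutation m;
        if (fields >> m.chrom >> m.pos >> m.ref >> m.alt >> m.gene >> m.sample)
            rows.push_back(m);
    }
    return rows;
}

// Group rows by gene name. With noRefAlt, only POS identifies a mutation.
std::map<std::string, GeneSummary> groupByGene(const std::vector<Mutation>& rows, bool noRefAlt) {
    std::map<std::string, GeneSummary> genes;
    for (const Mutation& m : rows) {
        auto [it, inserted] = genes.try_emplace(m.gene);
        GeneSummary& g = it->second;
        if (inserted) {
            g.chrom = m.chrom;
            g.start = g.end = m.pos;
        }
        g.start = std::min(g.start, m.pos);
        g.end = std::max(g.end, m.pos);
        if (noRefAlt) g.mutations.emplace(m.pos, "", "");
        else g.mutations.emplace(m.pos, m.ref, m.alt);
        ++g.perSample[m.sample];
    }
    return genes;
}

// Print one tab-separated line per gene, with one count column per sample seen.
void printTable(const std::map<std::string, GeneSummary>& genes, std::ostream& out) {
    std::set<std::string> samples;
    for (const auto& entry : genes)
        for (const auto& count : entry.second.perSample) samples.insert(count.first);

    out << "GENE\tCHROM\tSTART\tEND\tcount SAMPLES\tdistinct MUTATIONS";
    for (const std::string& s : samples) out << "\tcount(" << s << ")";
    out << '\n';

    for (const auto& entry : genes) {
        const GeneSummary& g = entry.second;
        out << entry.first << '\t' << g.chrom << '\t' << g.start << '\t' << g.end
            << '\t' << g.perSample.size() << '\t' << g.mutations.size();
        for (const std::string& s : samples) {
            auto it = g.perSample.find(s);
            out << '\t' << (it == g.perSample.end() ? 0 : it->second);
        }
        out << '\n';
    }
}
```

A main function would then only need to call printTable(groupByGene(readMutations(std::cin), false), std::cout), passing true instead of false to ignore REF and ALT. The std::set of (POS, REF, ALT) tuples does the "distinct" counting, and the size of the per-sample map gives the number of samples carrying at least one mutation in the gene.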
That's it,
Pierre