05 October 2011

Grouping mutations/Gene=f(sample)

GroupByGene is a small C++ tool that groups tab-delimited data containing the columns:

  • CHROM
  • POS
  • REF
  • ALT
  • GENE
  • SAMPLE
by gene=f(sample): one row per gene, with one count column per sample. The tool is available on Google Code: http://code.google.com/p/variationtoolkit/source/browse/trunk/src/groupbygene.cpp
Example:
$ cat input.tsv

#CHROM	POS	REF	ALT	GENE	SAMPLE	
chr1	10	A	T	gene1	indi1
chr1	10	A	T	gene1	indi2
chr1	11	C	G	gene1	indi2
chr2	110	C	G	gene2	indi3
chr3	210	A	T	gene3	indi1
chr3	211	C	T	gene3	indi2
chr3	211	C	T	gene3	indi3
chr3	215	C	G	gene3	indi3
chr3	216	C	T	gene3	indi3
chr4	390	C	T	gene4	indi1
chr4	390	C	A	gene4	indi3

Calling groupbygene:


$ groupbygene  --chrom 1 --pos 2 --ref 3 --alt 4 --sample 6 --gene 5 < input.tsv

GENE	CHROM	START	END	count SAMPLES	distinct MUTATIONS	count(indi1)	count(indi2)	count(indi3)
gene1	chr1	10	11	2	2	1	2	0
gene2	chr2	110	110	1	1	0	0	1
gene3	chr3	210	216	3	4	1	1	3
gene4	chr4	390	390	2	2	1	0	1


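Calling groupbygene with --norefalt (gene4 is then reported with 1 distinct mutation instead of 2, which suggests that REF and ALT are ignored when comparing mutations):

$ groupbygene  --chrom 1 --pos 2 --ref 3 --alt 4 --sample 6 --gene 5 --norefalt < input.tsv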

GENE	CHROM	START	END	count SAMPLES	distinct MUTATIONS	count(indi1)	count(indi2)	count(indi3)
gene1	chr1	10	11	2	2	1	2	0
gene2	chr2	110	110	1	1	0	0	1
gene3	chr3	210	216	3	4	1	1	3
gene4	chr4	390	390	2	1	1	0	1
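
For readers who want the idea without opening the source, here is a minimal sketch of this kind of grouping in C++. It is not the code of groupbygene.cpp: the names Columns, GeneSummary, groupByGene and useRefAlt are hypothetical, and the assumption that --norefalt reduces the mutation key to CHROM and POS is only inferred from the two outputs above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical names throughout: this is NOT the code of groupbygene.cpp.
struct Columns {
    // 1-based column indexes, as given on the command line.
    std::size_t chrom = 1, pos = 2, ref = 3, alt = 4, gene = 5, sample = 6;
    // false mimics the ASSUMED effect of --norefalt (REF/ALT ignored).
    bool useRefAlt = true;
};

struct GeneSummary {
    std::string chrom;
    long start = 0, end = 0;                  // smallest and largest POS seen
    std::set<std::string> samples;            // distinct samples
    std::set<std::string> mutations;          // distinct mutation keys
    std::map<std::string, int> rowsPerSample; // number of input rows per sample
};

// Split one line on tabs (a trailing empty field is dropped).
static std::vector<std::string> splitTabs(const std::string& line) {
    std::vector<std::string> fields;
    std::string field;
    std::istringstream in(line);
    while (std::getline(in, field, '\t')) fields.push_back(field);
    return fields;
}

// Read tab-delimited rows and build one summary per gene.
// std::stol throws if the POS column is not a number.
std::map<std::string, GeneSummary> groupByGene(std::istream& in, const Columns& c) {
    assert(c.chrom > 0 && c.pos > 0 && c.ref > 0 && c.alt > 0 && c.gene > 0 && c.sample > 0);
    const std::size_t needed =
        std::max({c.chrom, c.pos, c.ref, c.alt, c.gene, c.sample});
    std::map<std::string, GeneSummary> genes;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;  // skip header and comments
        const std::vector<std::string> f = splitTabs(line);
        if (f.size() < needed) continue;               // skip rows that are too short
        const std::string& chrom = f[c.chrom - 1];
        const std::string& sample = f[c.sample - 1];
        const long pos = std::stol(f[c.pos - 1]);
        GeneSummary& g = genes[f[c.gene - 1]];
        if (g.samples.empty()) {                       // first row for this gene
            g.chrom = chrom;
            g.start = g.end = pos;
        }
        g.start = std::min(g.start, pos);
        g.end = std::max(g.end, pos);
        std::string key = chrom + ":" + f[c.pos - 1];
        if (c.useRefAlt) key += ":" + f[c.ref - 1] + ">" + f[c.alt - 1];
        g.mutations.insert(key);
        g.samples.insert(sample);
        ++g.rowsPerSample[sample];
    }
    return genes;
}
```

Calling groupByGene(std::cin, Columns()) on input.tsv and printing each GeneSummary would give the kind of rows shown above. Using sets for the samples and for the mutation keys is what makes the sample count and the distinct-mutation count independent of how many times a row is repeated.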


That's it,

Pierre
