Consequences: SNP, cDNA, proteins, etc.
This post is about Consequences, a tool that finds the consequences of a set of mutations mapped on the human genome. It was motivated by a recent post on FriendFeed, where Daniel MacArthur asked: “Given a list of human b36 coordinates for a list of genic SNPs (most not in dbSNP), what would be the quickest way to get a list of the genes they're found in and, if possible, the amino acid position they would affect?”
About one year ago, I wrote a tool named "Consequences" that answered this question, but the sources are somewhere in a tar.gz, burned on an old CD, in a cardboard box, in my cellar... so it was faster to rewrite this simple code from scratch. The result should be fine, but please tell me if you find a bug.
This tool takes as input a tab-delimited file containing the following fields:
- a name for your SNP
- the chromosome, e.g. 'chr2' (at this time only one chromosome per input file is supported)
- the position on the chromosome (0-based: the first base has index 0)
- the base observed ON THE PLUS STRAND OF THE GENOME
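To make the input format concrete, here is a minimal sketch of how such a file could be read. This is not the tool's actual code: the class `SnpRecord` and its methods are hypothetical names, the 'chr1' value in the sample rows is only illustrative, and the checks (four fields, a single chromosome, a base in ACGT) are my reading of the list above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical holder for one input row (names are illustrative, not from the tool's sources). */
final class SnpRecord {
    final String name;
    final String chromosome;
    final int position; // 0-based position on the chromosome
    final char base;    // base observed on the plus strand of the genome

    SnpRecord(String name, String chromosome, int position, char base) {
        this.name = name;
        this.chromosome = chromosome;
        this.position = position;
        this.base = base;
    }

    /** Parses one tab-delimited line: name, chromosome, position, base. */
    static SnpRecord parseLine(String line) {
        String[] tokens = line.split("\t");
        if (tokens.length < 4) {
            throw new IllegalArgumentException("Expected 4 tab-delimited fields in: " + line);
        }
        int position = Integer.parseInt(tokens[2].trim());
        if (position < 0) {
            throw new IllegalArgumentException("Negative position in: " + line);
        }
        String base = tokens[3].trim().toUpperCase();
        if (base.length() != 1 || "ACGT".indexOf(base.charAt(0)) == -1) {
            throw new IllegalArgumentException("Base must be one of A, C, G, T in: " + line);
        }
        return new SnpRecord(tokens[0].trim(), tokens[1].trim(), position, base.charAt(0));
    }

    /** Parses all non-blank lines and rejects inputs that mix several chromosomes. */
    static List<SnpRecord> parseAll(List<String> lines) {
        List<SnpRecord> records = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            SnpRecord rec = parseLine(line);
            if (!records.isEmpty() && !records.get(0).chromosome.equals(rec.chromosome)) {
                throw new IllegalArgumentException("Only one chromosome per input is supported");
            }
            records.add(rec);
        }
        return records;
    }

    public static void main(String[] args) {
        // Sample rows; the chromosome name is an assumption made for this example.
        List<SnpRecord> records = parseAll(Arrays.asList(
                "snp282\tchr1\t1149167\tA",
                "snp195\tchr1\t1205906\tA"));
        for (SnpRecord r : records) {
            System.out.println(r.name + " " + r.chromosome + ":" + r.position + " " + r.base);
        }
    }
}
```

Rejecting a second chromosome early mirrors the "one chromosome per input" restriction, so a mixed file fails loudly instead of being silently mis-annotated.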
The output is an XML document. Here is an excerpt:
<consequences>
<observed-mutation position="1116" name="snp1" base="A">
<gene name="uc001aaa.2" exon-count="3" strand="+" txStart="1115" txEnd="4121" cdsStart="1115" cdsEnd="1115">
<in-utr-3/>
</gene>
<gene name="uc009vip.1" exon-count="2" strand="+" txStart="1115" txEnd="4272" cdsStart="1115" cdsEnd="1115">
<in-utr-3/>
</gene>
</observed-mutation>
(...)
</observed-mutation>
<observed-mutation position="1149167" name="snp282" base="A">
<gene name="uc009vjv.1" exon-count="6" strand="-" txStart="1142150" txEnd="1157310" cdsStart="1142754" cdsEnd="1149171">
<in-exon name="Exon 2" codon-wild="CAG" codon-mut="TAG" aa-wild="Q" aa-mut="*" base-wild="C" base-mut="T" index-cdna="3" index-protein="1">
<wild-cDNA>ATG C AGCGCTGGATCATGGAGAAGACGGCCGAGCACTTCCAGGAGGCCATGGAGGAGAGCAAGACACACTTCCGCGCCGTGGACCCTGACGGGGACGGTCACGTGTCTTGGGACGAGTATAAGGTGAAGTTTTTGGCGAGTAAAGGCCATAGCGAGAAGGAGGTTGCCGACGCCATCAGGCTCAACGAGGAACTCAAAGTGGATGAGGAAACACAGGAAGTCCTGGAGAACCTGAAGGACCGCTGGTACCAGGCGGACAGCCCCCCTGCAGACCTGCTGCTGACGGAGGAGGAGTTCCTGTCGTTCCTCCACCCCGAGCACAGCCGGGGAATGCTCAGGTTCATGGTGAAGGAGATCGTCCGGGACCTGGACCAGGACGGTGACAAGCAGCTCTCTGTGCCCGAGTTCATCTCCCTGCCCGTGGGCACCGTGGAGAACCAGCAGGGCCAGGACATTGACGACAACTGGGTGAAAGACAGAAAAAAGGAGTTTGAGGAGCTCATTGACTCCAACCACGACGGCATCGTGACCGCCGAGGAGCTGGAGAGCTACATGGACCCCATGAACGAGTACAACGCGCTGAACGAGGCCAAGCAGATGATCGCCGTCGCCGACGAGAACCAGAACCACCACCTGGAGCCCGAGGAGGTGCTCAAGTACAGCGAGTTCTTCACGGGCAGCAAGCTGGTGGACTACGCGCGCAGCGTGCACGAGGAGTTTTGA</wild-cDNA>
<mut-cDNA>ATG T AGCGCTGGATCATGGAGAAGACGGCCGAGCACTTCCAGGAGGCCATGGAGGAGAGCAAGACACACTTCCGCGCCGTGGACCCTGACGGGGACGGTCACGTGTCTTGGGACGAGTATAAGGTGAAGTTTTTGGCGAGTAAAGGCCATAGCGAGAAGGAGGTTGCCGACGCCATCAGGCTCAACGAGGAACTCAAAGTGGATGAGGAAACACAGGAAGTCCTGGAGAACCTGAAGGACCGCTGGTACCAGGCGGACAGCCCCCCTGCAGACCTGCTGCTGACGGAGGAGGAGTTCCTGTCGTTCCTCCACCCCGAGCACAGCCGGGGAATGCTCAGGTTCATGGTGAAGGAGATCGTCCGGGACCTGGACCAGGACGGTGACAAGCAGCTCTCTGTGCCCGAGTTCATCTCCCTGCCCGTGGGCACCGTGGAGAACCAGCAGGGCCAGGACATTGACGACAACTGGGTGAAAGACAGAAAAAAGGAGTTTGAGGAGCTCATTGACTCCAACCACGACGGCATCGTGACCGCCGAGGAGCTGGAGAGCTACATGGACCCCATGAACGAGTACAACGCGCTGAACGAGGCCAAGCAGATGATCGCCGTCGCCGACGAGAACCAGAACCACCACCTGGAGCCCGAGGAGGTGCTCAAGTACAGCGAGTTCTTCACGGGCAGCAAGCTGGTGGACTACGCGCGCAGCGTGCACGAGGAGTTTTGA</mut-cDNA>
<wild-protein>M Q RWIMEKTAEHFQEAMEESKTHFRAVDPDGDGHVSWDEYKVKFLASKGHSEKEVADAIRLNEELKVDEETQEVLENLKDRWYQADSPPADLLLTEEEFLSFLHPEHSRGMLRFMVKEIVRDLDQDGDKQLSVPEFISLPVGTVENQQGQDIDDNWVKDRKKEFEELIDSNHDGIVTAEELESYMDPMNEYNALNEAKQMIAVADENQNHHLEPEEVLKYSEFFTGSKLVDYARSVHEEF*</wild-protein>
<mut-protein>M * RWIMEKTAEHFQEAMEESKTHFRAVDPDGDGHVSWDEYKVKFLASKGHSEKEVADAIRLNEELKVDEETQEVLENLKDRWYQADSPPADLLLTEEEFLSFLHPEHSRGMLRFMVKEIVRDLDQDGDKQLSVPEFISLPVGTVENQQGQDIDDNWVKDRKKEFEELIDSNHDGIVTAEELESYMDPMNEYNALNEAKQMIAVADENQNHHLEPEEVLKYSEFFTGSKLVDYARSVHEEF*</mut-protein>
</in-exon>
</gene>
<gene name="uc009vjw.1" exon-count="7" strand="-" txStart="1142150" txEnd="1157310" cdsStart="1142150" cdsEnd="1142150">
<in-utr-5/>
</gene>
</observed-mutation>
(...)
<observed-mutation position="1205906" name="snp195" base="A">
<gene name="uc001adt.1" exon-count="18" strand="+" txStart="1205678" txEnd="1217272" cdsStart="1205904" cdsEnd="1216853">
<in-exon name="Exon 1" codon-wild="ATG" codon-mut="ATA" aa-wild="M" aa-mut="I" base-wild="G" base-mut="A" index-cdna="2" index-protein="0">
<wild-cDNA>AT G AGGGCAGTGCTGTCACAGAAGACAACACCGCTCCCTCGTTACCTGTGGCCCGGCCACCTCAGCGGCCCAAGGAGGCTCACCTGGTCATGGTGCAGTGACCACAGGACCCCCACATGCCGGGAGCTGGGTTCGCCCCACCCCACCCCCTGCACCGGGCCAGCGAGGGGATGGCCCAGAAGAGGGGGAGGACCATGTGGATTCACCAGTGCTGGACATGTGCTCTGTGGCTACCCCCTCTGCCTACTCTCTGGCCCGATACAGGGGTGTGGGACAGGCCTGGGTGACTCCAGCATGGCTTTCCTCTCCAGGACGTCACCGGTGGCAGCTGCTTCCTTCCAGAGCCGGCAGGAGGCCAGAGGCTCCATCCTGCTTCAGAGCTGCCAGCTGCCCCCGCAATGGCTGAGCACCGAAGCATGGACGGGAGAATGGAAGCAGCCACACGGGGGGGCTCTCACCTCCAGATCGCCTGGGCCTGTGGCTCCCCAGAGGCCCTGCCACCTGAAGGGATGGCAGCACAGACCCACTCAGCACAACGCTGCCTGCAAACAGGGCCAGGCTGCAGCCCAGACGCCCCCCAGGCCGGGGCCACCATCAGCACCACCACCACCACCCAAGGAGGGGCACCAGGAGGGGCTGGTGGAGCTGCCCGCCTCGTTCCGGGAGCTGCTCACCTTCTTCTGCACCAATGCCACCATCCACGGCGCCATCCGCCTGGTCTGCTCCCGCGGGAACCGCCTCAAGACGACGTCCTGGGGGCTGCTGTCCCTGGGAGCCCTGGTCGCGCTCTGCTGGCAGCTGGGGCTCCTCTTTGAGCGTCACTGGCACCGCCCGGTCCTCATGGCCGTCTCTGTGCACTCGGAGCGCAAGCTGCTCCCGCTGGTCACCCTGTGTGACGGGAACCCACGTCGGCCGAGTCCGGTCCTCCGCCATCTGGAGCTGCTGGACGAGTTTGCCAGGGAGAACATTGACTCCCTGTACAACGTCAACCTCAGCAAAGGCAGAGCCGCCCTCTCCGCCACTGTCCCCCGCCACGAGCCCCCCTTCCACCTGGACCGGGAGATCCGTCTGCAGAGGCTGAGCCACTCGGGCAGCCGGGTCAGAGTGGGGTTCAGACTGTGCAACAGCACGGGCGGCGACTGCTTTTACCGAGGCTACACGTCAGGCGTGGCGGCTGTCCAGGACTGGTACCACTTCCACTATGTGGATATCCTGGCCCTGCTGCCCGCGGCATGGGAGGACAGCCACGGGAGCCAGGACGGCCACTTCGTCCTCTCCTGCAGTTACGATGGCCTGGACTGCCAGGCCCGACAGTTCCGGACCTTCCACCACCCCACCTACGGCAGCTGCTACACGGTCGATGGCGTCTGGACAGCTCAGCGCCCCGGCATCACCCACGGAGTCGGCCTGGTCCTCAGGGTTGAGCAGCAGCCTCACCTCCCTCTGCTGTCCACGCTGGCCGGCATCAGGGTCATGGTTCACGGCCGTAACCACACGCCCTTCCTGGGGCACCACAGCTTCAGCGTCCGGCCAGGGACGGAGGCCACCATCAGCATCCGAGAGGACGAGGTGCACCGGCTCGGGAGCCCCTACGGCCACTGCACCGCCGGCGGGGAAGGCGTGGAGGTGGAGCTGCTACACAACACCTCCTACACCAGGCAGGCCTGCCTGGTGTCCTGCTTCCAGCAGCTGATGGTGGAGACCTGCTCCTGTGGCTACTACCTCCACCCTCTGCCGGCGGGGGCTGAGTACTGCAGCTCTGCCCGGCACCCTGCCTGGGGACACTGCTTCTACCGCCTCTACCAGGACCTGGAGACCCACCGGCTCCCCTGTACCTCCCGCTGCCCCAGGCCCTGCAGGGAGTCTGCATTCAAGCTCTCCACTGGGACCTCCAGGTGGCCTTCCGCCAAGTCAGCTGGATGGACTCTGGCCACGCTAGGTGAACAGGGGCTGCCGCATCAGAGCCACAGACAGAGGAGCAGCCTGG
CCAAAATCAACATCGTCTACCAGGAGCTCAACTACCGCTCAGTGGAGGAGGCGCCCGTGTACTCGGTGCCGCAGCTGCTCTCGGCCATGGGCAGCCTCTGCAGCCTGTGGTTTGGGGCCTCCGTCCTCTCCCTCCTGGAGCTCCTGGAGCTGCTGCTCGATGCTTCTGCCCTCACCCTGGTGCTAGGCGGCCGCCGGCTCCGCAGGGCGTGGTTCTCCTGGCCCAGAGCCAGCCCTGCCTCAGGGGCGTCCAGCATCAAGCCAGAGGCCAGTCAGATGCCCCCGCCTGCAGGCGGCACGTCAGATGACCCGGAGCCCAGCGGGCCTCATCTCCCACGGGTGATGCTTCCAGGGGTTCTGGCGGGAGTCTCAGCCGAAGAGAGCTGGGCTGGGCCCCAGCCCCTTGAGACTCTGGACACCTGA</wild-cDNA>
<mut-cDNA>AT A AGGGCAGTGCTGTCACAGAAGACAACACCGCTCCCTCGTTACCTGTGGCCCGGCCACCTCAGCGGCCCAAGGAGGCTCACCTGGTCATGGTGCAGTGACCACAGGACCCCCACATGCCGGGAGCTGGGTTCGCCCCACCCCACCCCCTGCACCGGGCCAGCGAGGGGATGGCCCAGAAGAGGGGGAGGACCATGTGGATTCACCAGTGCTGGACATGTGCTCTGTGGCTACCCCCTCTGCCTACTCTCTGGCCCGATACAGGGGTGTGGGACAGGCCTGGGTGACTCCAGCATGGCTTTCCTCTCCAGGACGTCACCGGTGGCAGCTGCTTCCTTCCAGAGCCGGCAGGAGGCCAGAGGCTCCATCCTGCTTCAGAGCTGCCAGCTGCCCCCGCAATGGCTGAGCACCGAAGCATGGACGGGAGAATGGAAGCAGCCACACGGGGGGGCTCTCACCTCCAGATCGCCTGGGCCTGTGGCTCCCCAGAGGCCCTGCCACCTGAAGGGATGGCAGCACAGACCCACTCAGCACAACGCTGCCTGCAAACAGGGCCAGGCTGCAGCCCAGACGCCCCCCAGGCCGGGGCCACCATCAGCACCACCACCACCACCCAAGGAGGGGCACCAGGAGGGGCTGGTGGAGCTGCCCGCCTCGTTCCGGGAGCTGCTCACCTTCTTCTGCACCAATGCCACCATCCACGGCGCCATCCGCCTGGTCTGCTCCCGCGGGAACCGCCTCAAGACGACGTCCTGGGGGCTGCTGTCCCTGGGAGCCCTGGTCGCGCTCTGCTGGCAGCTGGGGCTCCTCTTTGAGCGTCACTGGCACCGCCCGGTCCTCATGGCCGTCTCTGTGCACTCGGAGCGCAAGCTGCTCCCGCTGGTCACCCTGTGTGACGGGAACCCACGTCGGCCGAGTCCGGTCCTCCGCCATCTGGAGCTGCTGGACGAGTTTGCCAGGGAGAACATTGACTCCCTGTACAACGTCAACCTCAGCAAAGGCAGAGCCGCCCTCTCCGCCACTGTCCCCCGCCACGAGCCCCCCTTCCACCTGGACCGGGAGATCCGTCTGCAGAGGCTGAGCCACTCGGGCAGCCGGGTCAGAGTGGGGTTCAGACTGTGCAACAGCACGGGCGGCGACTGCTTTTACCGAGGCTACACGTCAGGCGTGGCGGCTGTCCAGGACTGGTACCACTTCCACTATGTGGATATCCTGGCCCTGCTGCCCGCGGCATGGGAGGACAGCCACGGGAGCCAGGACGGCCACTTCGTCCTCTCCTGCAGTTACGATGGCCTGGACTGCCAGGCCCGACAGTTCCGGACCTTCCACCACCCCACCTACGGCAGCTGCTACACGGTCGATGGCGTCTGGACAGCTCAGCGCCCCGGCATCACCCACGGAGTCGGCCTGGTCCTCAGGGTTGAGCAGCAGCCTCACCTCCCTCTGCTGTCCACGCTGGCCGGCATCAGGGTCATGGTTCACGGCCGTAACCACACGCCCTTCCTGGGGCACCACAGCTTCAGCGTCCGGCCAGGGACGGAGGCCACCATCAGCATCCGAGAGGACGAGGTGCACCGGCTCGGGAGCCCCTACGGCCACTGCACCGCCGGCGGGGAAGGCGTGGAGGTGGAGCTGCTACACAACACCTCCTACACCAGGCAGGCCTGCCTGGTGTCCTGCTTCCAGCAGCTGATGGTGGAGACCTGCTCCTGTGGCTACTACCTCCACCCTCTGCCGGCGGGGGCTGAGTACTGCAGCTCTGCCCGGCACCCTGCCTGGGGACACTGCTTCTACCGCCTCTACCAGGACCTGGAGACCCACCGGCTCCCCTGTACCTCCCGCTGCCCCAGGCCCTGCAGGGAGTCTGCATTCAAGCTCTCCACTGGGACCTCCAGGTGGCCTTCCGCCAAGTCAGCTGGATGGACTCTGGCCACGCTAGGTGAACAGGGGCTGCCGCATCAGAGCCACAGACAGAGGAGCAGCCTGGC
CAAAATCAACATCGTCTACCAGGAGCTCAACTACCGCTCAGTGGAGGAGGCGCCCGTGTACTCGGTGCCGCAGCTGCTCTCGGCCATGGGCAGCCTCTGCAGCCTGTGGTTTGGGGCCTCCGTCCTCTCCCTCCTGGAGCTCCTGGAGCTGCTGCTCGATGCTTCTGCCCTCACCCTGGTGCTAGGCGGCCGCCGGCTCCGCAGGGCGTGGTTCTCCTGGCCCAGAGCCAGCCCTGCCTCAGGGGCGTCCAGCATCAAGCCAGAGGCCAGTCAGATGCCCCCGCCTGCAGGCGGCACGTCAGATGACCCGGAGCCCAGCGGGCCTCATCTCCCACGGGTGATGCTTCCAGGGGTTCTGGCGGGAGTCTCAGCCGAAGAGAGCTGGGCTGGGCCCCAGCCCCTTGAGACTCTGGACACCTGA</mut-cDNA>
<wild-protein> M RAVLSQKTTPLPRYLWPGHLSGPRRLTWSWCSDHRTPTCRELGSPHPTPCTGPARGWPRRGGGPCGFTSAGHVLCGYPLCLLSGPIQGCGTGLGDSSMAFLSRTSPVAAASFQSRQEARGSILLQSCQLPPQWLSTEAWTGEWKQPHGGALTSRSPGPVAPQRPCHLKGWQHRPTQHNAACKQGQAAAQTPPRPGPPSAPPPPPKEGHQEGLVELPASFRELLTFFCTNATIHGAIRLVCSRGNRLKTTSWGLLSLGALVALCWQLGLLFERHWHRPVLMAVSVHSERKLLPLVTLCDGNPRRPSPVLRHLELLDEFARENIDSLYNVNLSKGRAALSATVPRHEPPFHLDREIRLQRLSHSGSRVRVGFRLCNSTGGDCFYRGYTSGVAAVQDWYHFHYVDILALLPAAWEDSHGSQDGHFVLSCSYDGLDCQARQFRTFHHPTYGSCYTVDGVWTAQRPGITHGVGLVLRVEQQPHLPLLSTLAGIRVMVHGRNHTPFLGHHSFSVRPGTEATISIREDEVHRLGSPYGHCTAGGEGVEVELLHNTSYTRQACLVSCFQQLMVETCSCGYYLHPLPAGAEYCSSARHPAWGHCFYRLYQDLETHRLPCTSRCPRPCRESAFKLSTGTSRWPSAKSAGWTLATLGEQGLPHQSHRQRSSLAKINIVYQELNYRSVEEAPVYSVPQLLSAMGSLCSLWFGASVLSLLELLELLLDASALTLVLGGRRLRRAWFSWPRASPASGASSIKPEASQMPPPAGGTSDDPEPSGPHLPRVMLPGVLAGVSAEESWAGPQPLETLDT*</wild-protein>
<mut-protein> I RAVLSQKTTPLPRYLWPGHLSGPRRLTWSWCSDHRTPTCRELGSPHPTPCTGPARGWPRRGGGPCGFTSAGHVLCGYPLCLLSGPIQGCGTGLGDSSMAFLSRTSPVAAASFQSRQEARGSILLQSCQLPPQWLSTEAWTGEWKQPHGGALTSRSPGPVAPQRPCHLKGWQHRPTQHNAACKQGQAAAQTPPRPGPPSAPPPPPKEGHQEGLVELPASFRELLTFFCTNATIHGAIRLVCSRGNRLKTTSWGLLSLGALVALCWQLGLLFERHWHRPVLMAVSVHSERKLLPLVTLCDGNPRRPSPVLRHLELLDEFARENIDSLYNVNLSKGRAALSATVPRHEPPFHLDREIRLQRLSHSGSRVRVGFRLCNSTGGDCFYRGYTSGVAAVQDWYHFHYVDILALLPAAWEDSHGSQDGHFVLSCSYDGLDCQARQFRTFHHPTYGSCYTVDGVWTAQRPGITHGVGLVLRVEQQPHLPLLSTLAGIRVMVHGRNHTPFLGHHSFSVRPGTEATISIREDEVHRLGSPYGHCTAGGEGVEVELLHNTSYTRQACLVSCFQQLMVETCSCGYYLHPLPAGAEYCSSARHPAWGHCFYRLYQDLETHRLPCTSRCPRPCRESAFKLSTGTSRWPSAKSAGWTLATLGEQGLPHQSHRQRSSLAKINIVYQELNYRSVEEAPVYSVPQLLSAMGSLCSLWFGASVLSLLELLELLLDASALTLVLGGRRLRRAWFSWPRASPASGASSIKPEASQMPPPAGGTSDDPEPSGPHLPRVMLPGVLAGVSAEESWAGPQPLETLDT*</mut-protein>
</in-exon>
</gene>
<gene name="uc001adu.1" exon-count="17" strand="+" txStart="1205678" txEnd="1217272" cdsStart="1209267" cdsEnd="1216853">
<in-utr-5/>
</gene>
</observed-mutation>
(...)
</consequences>
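The two `in-exon` records above can be checked by hand, and the following sketch shows one way to do it. It is not the tool's source: the class `CodonChange` and its method names are hypothetical, it takes `index-cdna` as already computed, and it assumes the standard genetic code. The only subtle point is the strand: the input base is given on the plus strand, so for a gene on the minus strand it is complemented before being placed in the cDNA.

```java
/** Hypothetical helper reproducing the codon and amino acid changes reported in the XML. */
final class CodonChange {
    // Standard genetic code, codons enumerated with bases in the order T, C, A, G.
    private static final String BASES = "TCAG";
    private static final String AMINO_ACIDS =
            "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    /** Returns the complementary base (A<->T, C<->G). */
    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default: throw new IllegalArgumentException("Unknown base: " + base);
        }
    }

    /** Translates one codon; returns '?' if it contains anything other than A, C, G, T. */
    static char translate(String codon) {
        int i0 = BASES.indexOf(codon.charAt(0));
        int i1 = BASES.indexOf(codon.charAt(1));
        int i2 = BASES.indexOf(codon.charAt(2));
        if (i0 < 0 || i1 < 0 || i2 < 0) {
            return '?';
        }
        return AMINO_ACIDS.charAt(16 * i0 + 4 * i1 + i2);
    }

    /** Codon of the wild-type cDNA containing the 0-based index indexCdna. */
    static String wildCodon(String cdna, int indexCdna) {
        int codonStart = (indexCdna / 3) * 3;
        return cdna.substring(codonStart, codonStart + 3);
    }

    /** Same codon after substituting the observed plus-strand base. */
    static String mutCodon(String cdna, int indexCdna, char plusStrandBase, char strand) {
        int codonStart = (indexCdna / 3) * 3;
        // On the minus strand the cDNA is the reverse complement of the genome.
        char cdnaBase = (strand == '-') ? complement(plusStrandBase) : plusStrandBase;
        StringBuilder codon = new StringBuilder(wildCodon(cdna, indexCdna));
        codon.setCharAt(indexCdna - codonStart, cdnaBase);
        return codon.toString();
    }

    static String describe(String cdna, int indexCdna, char plusStrandBase, char strand) {
        String wild = wildCodon(cdna, indexCdna);
        String mut = mutCodon(cdna, indexCdna, plusStrandBase, strand);
        return wild + "(" + translate(wild) + ") -> " + mut + "(" + translate(mut) + ")"
                + " index-protein=" + (indexCdna / 3);
    }

    public static void main(String[] args) {
        // First bases of the wild-cDNA of uc009vjv.1 (minus strand), snp282, index-cdna=3
        System.out.println(describe("ATGCAGCGC", 3, 'A', '-')); // CAG(Q) -> TAG(*) index-protein=1
        // First bases of the wild-cDNA of uc001adt.1 (plus strand), snp195, index-cdna=2
        System.out.println(describe("ATGAGGGCA", 2, 'A', '+')); // ATG(M) -> ATA(I) index-protein=0
    }
}
```

Both lines agree with the `codon-wild`, `codon-mut`, `aa-wild`, `aa-mut` and `index-protein` attributes shown in the output above.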
The source code is available here:
A 'jar' is available for download at http://lindenb.googlecode.com/files/consequences.jar.
Running the tool:
Well, that is not big science but it might be helpful.
That's it.
Pierre