Mapping the annotations of a query sequence on a BLAST hit, my notebook.
This post is the answer to my own question on biostar "BLASTN / TBLASTN : mapping the features of the query to the hit.". I wrote a java program to map the annotations of a sequence to the Hit of a Blast result. The tool is available on github at https://github.com/lindenb/jvarkit.
For example, say you want to map the features of the Uniprot record for Rotavirus NSP3 (http://www.uniprot.org/uniprot/P04514) on Genbank http://www.ncbi.nlm.nih.gov/nuccore/AY065842.1 (RNA).
Download P04514 as XML
TblastN P04514(protein) vs AY065842 (RNA) and save the BLAST result as XML:
<BlastOutput_program>tblastn</BlastOutput_program> <BlastOutput_version>TBLASTN 2.2.28+</BlastOutput_version> (...) <Hit> <Hit_num>1</Hit_num> <Hit_id>gi|18139606|gb|AY065842.1|</Hit_id> <Hit_def>AY065842</Hit_def> <Hit_accession>AY065842</Hit_accession> <Hit_len>1078</Hit_len> <Hit_hsps> <Hsp> <Hsp_num>1</Hsp_num> <Hsp_bit-score>546.584</Hsp_bit-score> <Hsp_score>1407</Hsp_score> <Hsp_evalue>0</Hsp_evalue> <Hsp_query-from>1</Hsp_query-from> <Hsp_query-to>313</Hsp_query-to> <Hsp_hit-from>26</Hsp_hit-from> <Hsp_hit-to>964</Hsp_hit-to> <Hsp_query-frame>0</Hsp_query-frame> <Hsp_hit-frame>2</Hsp_hit-frame> <Hsp_identity>260</Hsp_identity> <Hsp_positive>260</Hsp_positive> <Hsp_gaps>0</Hsp_gaps> <Hsp_align-len>313</Hsp_align-len> <Hsp_qseq>MLKMESTQQMASSIINTSFEAAVVAATSTLELMGIQYDYNEIYTRVKSKFDYVMDDSGVKNNLLGKAATIDQALNGKFGSVMRNKNWMTDSRTVAKLDEDVNKLRMMLSSKGIDQKMRVLNACFSVKRIPGKSSSVIKCTRLMKDKIERGAVEVDDSFVEEKMEVDTVDWKSRYDQLERRFESLKQRVNEKYTTWVQKAKKVNENMYSLQNVISQQQNQIADLQNYCSKLEADLQNKVGSLVSSVEWYLKSMELPDEVKTDIEQQLNSIDTISPINAIDDLEILIRNLIHDYDRTFLMFKGLLRQCNYEYAYE</Hsp_qseq> <Hsp_hseq>MLKMESTQQMASSIINSSFEAAVVAATSTLELMGIQYDYNEVYTRVKSKFDLVMDDSGVKNNLIGKAITIDQALNGKFSSAIRNRNWMTDSRTVAKLDEDVNKLRIMLSSKGIDQKMRVLNACFSVKRIPGKSSSIVKCTRLMKDKLERGEVEVDDSFVEEKMEVDTIDWKSRYEQLEKRFESLKHRVNEKYNHWVLKARKVNENMNSLQNVISQQQAHINELQMYNNKLERDLQSKIGSVVSSIEWYLRSMELSDDVKSDIEQQLNSIDQLNPVNAIDDFESILRNLISDYDRLFIMFKGLLQQCNYTYTYE</Hsp_hseq> <Hsp_midline>MLKMESTQQMASSIIN SFEAAVVAATSTLELMGIQYDYNE YTRVKSKFD VMDDSGVKNNL GKA TIDQALNGKF S RN NWMTDSRTVAKLDEDVNKLR MLSSKGIDQKMRVLNACFSVKRIPGKSSS KCTRLMKDK ERG VEVDDSFVEEKMEVDT DWKSRY QLE RFESLK RVNEKY WV KA KVNENM SLQNVISQQQ I LQ Y KLE DLQ K GS VSS EWYL SMEL D VK DIEQQLNSID P NAIDD E RNLI DYDR F MFKGLL QCNY Y YE</Hsp_midline> </Hsp> (..)Now generate a BED file to map the features of P04514 on AY065842:
$ java -jar dist/blastmapannots.jar I=P04514.xml B=blast.xml AY065842 25 961 Non-structural_protein_3 943 + 25961 255,255,255 1 936 25 AY065842 34 469 RNA-binding 970 + 34 469 255,255,255 1 435 34 AY065842 472 640 Dimerization 947 + 472 640 255,255,255 1 168 472 AY065842 532 724 Interaction_with_ZC3H7B 917 + 532 724 255,255,255 1 192 532 AY065842 646 961 Interaction_with_EIF4G1 905 + 646 961 255,255,255 1 315 646 AY065842 520 733 coiled-coil_region 916 + 520 733 255,255,255 1 213 520
That's it,
Pierre
No comments:
Post a Comment