Blast Stylesheet : XML to HTML
I wrote an XSLT stylesheet for the following question on Biostar: I'd like to create an HTML file (from the XML file and an XSL stylesheet) similar to what can be achieved when we perform a BLAST search on the NCBI server.
The stylesheet I wrote is available on GitHub at https://github.com/lindenb/xslt-sandbox/blob/master/stylesheets/bio/ncbi/blast2html.xsl (see also my previous post, blast2svg).
Usage:
xsltproc --novalid blast2html.xsl blast.xml > blast.html
Example:
Here is an XML output of BLAST:
<BlastOutput>
<BlastOutput_program>blastp</BlastOutput_program>
<BlastOutput_version>BLASTP 2.2.25+</BlastOutput_version>
<BlastOutput_reference>Alejandro A. Schäffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), "Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements", Nucleic Acids Res. 29:2994-3005.</BlastOutput_reference>
<BlastOutput_db>N/A</BlastOutput_db>
<BlastOutput_query-ID>gi|187956781|gb|AAI40897.1|</BlastOutput_query-ID>
<BlastOutput_query-def>EIF4G1 protein [Homo sapiens]</BlastOutput_query-def>
<BlastOutput_query-len>1606</BlastOutput_query-len>
<BlastOutput_param>
<Parameters>
<Parameters_matrix>BLOSUM62</Parameters_matrix>
<Parameters_expect>10</Parameters_expect>
<Parameters_gap-open>11</Parameters_gap-open>
<Parameters_gap-extend>1</Parameters_gap-extend>
<Parameters_filter>F</Parameters_filter>
</Parameters>
</BlastOutput_param>
<BlastOutput_iterations>
<Iteration>
<Iteration_iter-num>1</Iteration_iter-num>
<Iteration_query-ID>gi|187956781|gb|AAI40897.1|</Iteration_query-ID>
<Iteration_query-def>EIF4G1 protein [Homo sapiens]</Iteration_query-def>
<Iteration_query-len>1606</Iteration_query-len>
<Iteration_hits>
<Hit>
<Hit_num>1</Hit_num>
<Hit_id>gi|293340930|ref|XP_002724789.1|</Hit_id>
<Hit_def>PREDICTED: eukaryotic translation initiation factor 4 gamma, 1 isoform 2 [Rattus norvegicus] >gi|293352298|ref|XP_002727969.1| PREDICTED: eukaryotic translation initiation factor 4, gamma 1 isoform 1 [Rattus norvegicus]</Hit_def>
<Hit_accession>XP_002727969</Hit_accession>
<Hit_len>1584</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>2715.64</Hsp_bit-score>
<Hsp_score>7038</Hsp_score>
<Hsp_evalue>0</Hsp_evalue>
<Hsp_query-from>1</Hsp_query-from>
<Hsp_query-to>1606</Hsp_query-to>
<Hsp_hit-from>1</Hsp_hit-from>
<Hsp_hit-to>1584</Hsp_hit-to>
<Hsp_query-frame>0</Hsp_query-frame>
<Hsp_hit-frame>0</Hsp_hit-frame>
<Hsp_identity>1450</Hsp_identity>
<Hsp_positive>1450</Hsp_positive>
<Hsp_gaps>36</Hsp_gaps>
<Hsp_align-len>1613</Hsp_align-len>
<Hsp_qseq>MNKAPQSTGPPPAPSPGLPQPAFPPGQTAPVVFSTPQATQMNTPSQPRQGGFRSLQHFYPSRAQPPSSAASRVQSAAPARPGPAAHVYPAGSQVMMIPSQISYPASQGAYYIPGQGRSTYVVPTQQYPVQPGAPGFYPGASPTEFGTYAGAYYPAQGVQQFPTGVAPAPVLMNQPPQIAPKRERKTIRIRDPNQGGKDITEEIMSGARTASTPTPPQTGGGLEPQANGETPQVAVIVRPDDRSQGAIIADRPGLPGPEHSP-SESQPSSPSPTPSPSPVLEPGSEPNLAVLSIPGDTMTT--IQMSVEESTPISRETGEPYRLSPEPTPLAEPILEVEVTLSKPVPESEFSSSPLQAPTPLASHTVEIHEPNGMVPSEDLEPEVESSPELAPPP--ACPSESPVPIAPTAQPEELLNGAPSPPAVDLSPVSEPEEQAKEV-TASMAPPTIPSATPATAPSATSPAQEEEMEEEEEEEEGEAGEAGEAESEKGGEELLPPESTPIPANLSQNLEAAAATQVAVSVPKRRRKIKELNKKEAVGDLLDAFKEANPAVPEVENQPPAGSNPGPESEGSGVPPRPEEADETWDSKEDKIHNAENIQPGEQKYEYKSDQWKPLNLEEKKRYDREFLLGFQFIFASMQKPEGLPHISDVVLDKANKTPLRPLDPTRLQGINCGPDFTPSFANLGRTTLSTRGPPRGGPGGELPRGPAGLGPRRSQQGPRKEPRKIIATVLMTEDIKLNKAEKAWKPSSKRTAADKDRGEEDADGSKTQDLFRRVRSILNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCLMALKVPTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLKEELEEARDIARRRSLGNIKFIGELFKLKMLTEAIMHDCVVKLLKNHDEESLECLCRLLTTIGKDLDFEKAKPRMDQYFNQMEKIIKEKKTSSRIRFMLQDVLDLRGSNWVPRRGDQGPKTIDQIHKEAEMEEHREHIKVQQLMAKGSDKRRGGPPGPPISRGLPLVDDGGWNTVPISKGSRPIDTSRLTKITKPGSIDSNNQLFAPGGRLSWGKGSSGGSGAKPSDAASEAARPATSTLNRFSALQQAVPTESTDNRRVVQRSSLSRERGEKAGDRGDRLERSERGGDRGDRLDRARTPATKRSFSKEVEERSRERPSQPEGLRKAASLTEDRDRGRDAVKREAALPPVSPLKAALSEEELEKKSKAIIEEYLHLNDMKEAVQCVQELASPSLLFIFVRHGVESTLERSAIAREHMGQLLHQLLCAGHLSTAQYYQGLYEILELAEDMEIDIPHVWLYLAELVTPILQEGGVPMGELFREITKPLRPLGKAASLLLEILGLLCKSMGPKKVGTLWREAGLSWKEFLPEGQDIGAFVAEQKVEYTLGEESEAPGQRALPSEELNRQLEKLLKEGSSNQRVFDWIEANLSEQQIVSNTLVRALMTAVCYSAIIFETPLRVDVAVLKARAKLLQKYLCDEQKELQALYALQALVVTLEQPPNLLRMFFDALYDEDVVKEDAFYSWESSKDPAEQQGKGVALKSVTAFFKWLREAE-EESDHN</Hsp_qseq>
<Hsp_hseq>MNKAPQPTGPPPARSPGLPQPAFPPGQTAPVVFSTPQATQMNTPSQPRQ-------HFYPSRAQPPSSAASRVQSAAPARPGPAPHVYPAGSQVMMIPSQISYSASQGAYYIPGQGRSTYVVPTQQYPVQPGAPGFYPGASPTEFGTYAGAYYPAQSVQQFPASVAPAPVLMNQPPQIAPKRERKTIRIRDPNQGGKDITEEIMSGARTASTPTPPQTGGSLEPQPNGESPQVAVIIRPDDRSQGAAIGGRPGLPGPEHSPGTESQPSSPSPTPSPPPILEPGSESNLGVLSIPGDTMTTGMIPISVEESTPISCESGEPYCLSPEPT-LAEPILEVEVTLSKPIPESEFSSSPLQVSTSLVPHRAETHEPNGVIPSEDLEPEVESSTEPAPPPLSACASESLVPIAPTAQPEELLNGAPSPPAVDLSPVSEPEEQAKEVPSAALA--SIVSPTPPVAPSDTSAAQEEEIEED-------EDEDGEAESEKGGEDL-PLDSTPVPAQLSQNLEVAAAPQVAVSVPKRRRKIKELNKKEAVGDLLDAFKEVDPAVPEVENQPPTGSNPSPESEGSAALPQPEEAEETWDSKEDKIHNAENIQPGEQKYEYKSDQWKPLNLEEKKRYDREFLLGFQFIFASMQKPEGLPHITDVVLDKANKTPLRSLDPSRLPGINCGPDFTPSFANLGRPTLSSRGPPRGGPGGELPRGPAGLGPRRSQQGPRKETRKIISSVIMTEDIKLNKAEKAWKPSSKRTAADKDRGEEDADGSKTQDLFRRVRSILNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCLMALKVPTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLKEELEEARDIARRRSLGNIKFIGELFKLKMLTEAIMHDCVVKLLKNHDEESLECLCRLLTTIGKDLDFAKAKPRMDQYFNQMEKIIKEKKTSSRIRFMLQDVLDLRQSNWVPRRGDQGPKTIDQIHKEAEMEEHREHIKVQQLMAKGGDKRRGGPPGPP-------VNDGGWNTVPISKGSRPIDTSRLTKITKPGSIDSNNQLFAPGGRLSWGKGSSGGSGAKPSDTASEATRPA--TLNRFSALQQTLPVENTDNRRVVQRSSLSRERGEKAGDRGDRLERSERGGDRGDRLDRARTPATKRSFSKEVEERSRERPSQPEGLRKAASLTE--DRGRDPVKREATLPPVSPPKAALAVDEVERKSKAIIEEYLHLNDMKEAVQCVQELASPSLLFIFVRLGIESTLERSTIAREHMGRLLHQLLCAGHLSTAQYYQGLYETLELAEDMEIDIPHVWLYLAELITPILQEDGVPMGELFREITKPLRPMGKATSLLLEILGLLCKSMGPKKVGMLWREAGLSWREFLAEGQDVGSFVAEKKVEYTLGEESEAPGQRALAFEELRRQLEKLLKDGGSNQRVFDWIEANLNEQQIASNTLVRALMTTVCYSAIIFETPLRVDVQVLKVRARLLQKYLSDEQKELQALYALQALVVTLEQPANLLRMFFDALYDEDVVKEDAFYSWESSKDPAEQQGKGVALKSVTAFFNWLREAEDEESDHN</Hsp_hseq>
<Hsp_midline>MNKAPQ TGPPPA SPGLPQPAFPPGQTAPVVFSTPQATQMNTPSQPRQ HFYPSRAQPPSSAASRVQSAAPARPGPA HVYPAGSQVMMIPSQISY ASQGAYYIPGQGRSTYVVPTQQYPVQPGAPGFYPGASPTEFGTYAGAYYPAQ VQQFP VAPAPVLMNQPPQIAPKRERKTIRIRDPNQGGKDITEEIMSGARTASTPTPPQTGG LEPQ NGE PQVAVI RPDDRSQGA I RPGLPGPEHSP ESQPSSPSPTPSP P LEPGSE NL VLSIPGDTMTT I SVEESTPIS E GEPY LSPEPT LAEPILEVEVTLSKP PESEFSSSPLQ T L H E HEPNG PSEDLEPEVESS E APPP AC SES VPIAPTAQPEELLNGAPSPPAVDLSPVSEPEEQAKEV A A I S TP APS TS AQEEE EE E GEAESEKGGE L P STP PA LSQNLE AAA QVAVSVPKRRRKIKELNKKEAVGDLLDAFKE PAVPEVENQPP GSNP PESEGS P PEEA ETWDSKEDKIHNAENIQPGEQKYEYKSDQWKPLNLEEKKRYDREFLLGFQFIFASMQKPEGLPHI DVVLDKANKTPLR LDP RL GINCGPDFTPSFANLGR TLS RGPPRGGPGGELPRGPAGLGPRRSQQGPRKE RKII V MTEDIKLNKAEKAWKPSSKRTAADKDRGEEDADGSKTQDLFRRVRSILNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCLMALKVPTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLKEELEEARDIARRRSLGNIKFIGELFKLKMLTEAIMHDCVVKLLKNHDEESLECLCRLLTTIGKDLDF KAKPRMDQYFNQMEKIIKEKKTSSRIRFMLQDVLDLR SNWVPRRGDQGPKTIDQIHKEAEMEEHREHIKVQQLMAKG DKRRGGPPGPP V DGGWNTVPISKGSRPIDTSRLTKITKPGSIDSNNQLFAPGGRLSWGKGSSGGSGAKPSD ASEA RPA TLNRFSALQQ P E TDNRRVVQRSSLSRERGEKAGDRGDRLERSERGGDRGDRLDRARTPATKRSFSKEVEERSRERPSQPEGLRKAASLTE DRGRD VKREA LPPVSP KAAL E E KSKAIIEEYLHLNDMKEAVQCVQELASPSLLFIFVR G ESTLERS IAREHMG LLHQLLCAGHLSTAQYYQGLYE LELAEDMEIDIPHVWLYLAEL TPILQE GVPMGELFREITKPLRP GKA SLLLEILGLLCKSMGPKKVG LWREAGLSW EFL EGQD G FVAE KVEYTLGEESEAPGQRAL EEL RQLEKLLK G SNQRVFDWIEANL EQQI SNTLVRALMT VCYSAIIFETPLRVDV VLK RA LLQKYL DEQKELQALYALQALVVTLEQP NLLRMFFDALYDEDVVKEDAFYSWESSKDPAEQQGKGVALKSVTAFF WLREAE EESDHN</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
</Iteration_hits>
<Iteration_stat>
<Statistics>
<Statistics_db-num>0</Statistics_db-num>
<Statistics_db-len>0</Statistics_db-len>
<Statistics_hsp-len>0</Statistics_hsp-len>
<Statistics_eff-space>0</Statistics_eff-space>
<Statistics_kappa>-1</Statistics_kappa>
<Statistics_lambda>-1</Statistics_lambda>
<Statistics_entropy>-1</Statistics_entropy>
</Statistics>
</Iteration_stat>
</Iteration>
</BlastOutput_iterations>
</BlastOutput>
After processing:
(...)
Descriptions
Accession | Def | e-value |
---|---|---|
XP_002727969 | PREDICTED: eukaryotic translation initiation factor 4 gamma, 1 isoform 2 [Rattus norvegicus] >gi|293352298|ref|XP_002727969.1| PREDICTED: eukaryotic translation initiation factor 4, gamma 1 isoform 1 [Rattus norvegicus] | 0 |
(...)
Alignments
>gi|293340930|ref|XP_002724789.1||XP_002727969|PREDICTED: eukaryotic translation initiation factor 4 gamma, 1 isoform 2 [Rattus norvegicus] >gi|293352298|ref|XP_002727969.1| PREDICTED: eukaryotic translation initiation factor 4, gamma 1 isoform 1 [Rattus norvegicus]
Length=1584
Score = 2715.64 bits (7038), Expect = 0
Identities = 1450/1613 (89.8946063236206%), Gaps = 36/1613 (2.231866088034718%)
Strand = Plus/Plus
Query 1 MNKAPQSTGPPPAPSPGLPQPAFPPGQTAPVVFSTPQATQMNTPSQPRQGGFRSLQHFYP 60
MNKAPQ TGPPPA SPGLPQPAFPPGQTAPVVFSTPQATQMNTPSQPRQ HFYP
Sbjct 1 MNKAPQPTGPPPARSPGLPQPAFPPGQTAPVVFSTPQATQMNTPSQPRQ-------HFYP 53
Query 61 SRAQPPSSAASRVQSAAPARPGPAAHVYPAGSQVMMIPSQISYPASQGAYYIPGQGRSTY 120
SRAQPPSSAASRVQSAAPARPGPA HVYPAGSQVMMIPSQISY ASQGAYYIPGQGRSTY
Sbjct 54 SRAQPPSSAASRVQSAAPARPGPAPHVYPAGSQVMMIPSQISYSASQGAYYIPGQGRSTY 113
Query 121 VVPTQQYPVQPGAPGFYPGASPTEFGTYAGAYYPAQGVQQFPTGVAPAPVLMNQPPQIAP 180
VVPTQQYPVQPGAPGFYPGASPTEFGTYAGAYYPAQ VQQFP VAPAPVLMNQPPQIAP
Sbjct 114 VVPTQQYPVQPGAPGFYPGASPTEFGTYAGAYYPAQSVQQFPASVAPAPVLMNQPPQIAP 173
Query 181 KRERKTIRIRDPNQGGKDITEEIMSGARTASTPTPPQTGGGLEPQANGETPQVAVIVRPD 240
KRERKTIRIRDPNQGGKDITEEIMSGARTASTPTPPQTGG LEPQ NGE PQVAVI RPD
Sbjct 174 KRERKTIRIRDPNQGGKDITEEIMSGARTASTPTPPQTGGSLEPQPNGESPQVAVIIRPD 233
Query 241 DRSQGAIIADRPGLPGPEHSP-SESQPSSPSPTPSPSPVLEPGSEPNLAVLSIPGDTMTT 299
DRSQGA I RPGLPGPEHSP ESQPSSPSPTPSP P LEPGSE NL VLSIPGDTMTT
Sbjct 234 DRSQGAAIGGRPGLPGPEHSPGTESQPSSPSPTPSPPPILEPGSESNLGVLSIPGDTMTT 293
Query 300 --IQMSVEESTPISRETGEPYRLSPEPTPLAEPILEVEVTLSKPVPESEFSSSPLQAPTP 357
I SVEESTPIS E GEPY LSPEPT LAEPILEVEVTLSKP PESEFSSSPLQ T
Sbjct 294 GMIPISVEESTPISCESGEPYCLSPEPT-LAEPILEVEVTLSKPIPESEFSSSPLQVSTS 352
Query 358 LASHTVEIHEPNGMVPSEDLEPEVESSPELAPPP--ACPSESPVPIAPTAQPEELLNGAP 415
L H E HEPNG PSEDLEPEVESS E APPP AC SES VPIAPTAQPEELLNGAP
Sbjct 353 LVPHRAETHEPNGVIPSEDLEPEVESSTEPAPPPLSACASESLVPIAPTAQPEELLNGAP 412
Query 416 SPPAVDLSPVSEPEEQAKEV-TASMAPPTIPSATPATAPSATSPAQEEEMEEEEEEEEGE 474
SPPAVDLSPVSEPEEQAKEV A A I S TP APS TS AQEEE EE
Sbjct 413 SPPAVDLSPVSEPEEQAKEVPSAALA--SIVSPTPPVAPSDTSAAQEEEIEED------- 463
Query 475 AGEAGEAESEKGGEELLPPESTPIPANLSQNLEAAAATQVAVSVPKRRRKIKELNKKEAV 534
E GEAESEKGGE L P STP PA LSQNLE AAA QVAVSVPKRRRKIKELNKKEAV
Sbjct 464 EDEDGEAESEKGGEDL-PLDSTPVPAQLSQNLEVAAAPQVAVSVPKRRRKIKELNKKEAV 522
Query 535 GDLLDAFKEANPAVPEVENQPPAGSNPGPESEGSGVPPRPEEADETWDSKEDKIHNAENI 594
GDLLDAFKE PAVPEVENQPP GSNP PESEGS P PEEA ETWDSKEDKIHNAENI
Sbjct 523 GDLLDAFKEVDPAVPEVENQPPTGSNPSPESEGSAALPQPEEAEETWDSKEDKIHNAENI 582
Query 595 QPGEQKYEYKSDQWKPLNLEEKKRYDREFLLGFQFIFASMQKPEGLPHISDVVLDKANKT 654
QPGEQKYEYKSDQWKPLNLEEKKRYDREFLLGFQFIFASMQKPEGLPHI DVVLDKANKT
Sbjct 583 QPGEQKYEYKSDQWKPLNLEEKKRYDREFLLGFQFIFASMQKPEGLPHITDVVLDKANKT 642
Query 655 PLRPLDPTRLQGINCGPDFTPSFANLGRTTLSTRGPPRGGPGGELPRGPAGLGPRRSQQG 714
PLR LDP RL GINCGPDFTPSFANLGR TLS RGPPRGGPGGELPRGPAGLGPRRSQQG
Sbjct 643 PLRSLDPSRLPGINCGPDFTPSFANLGRPTLSSRGPPRGGPGGELPRGPAGLGPRRSQQG 702
Query 715 PRKEPRKIIATVLMTEDIKLNKAEKAWKPSSKRTAADKDRGEEDADGSKTQDLFRRVRSI 774
PRKE RKII V MTEDIKLNKAEKAWKPSSKRTAADKDRGEEDADGSKTQDLFRRVRSI
Sbjct 703 PRKETRKIISSVIMTEDIKLNKAEKAWKPSSKRTAADKDRGEEDADGSKTQDLFRRVRSI 762
Query 775 LNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCLMALKV 834
LNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCLMALKV
Sbjct 763 LNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCLMALKV 822
Query 835 PTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLKEELEEA 894
PTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLKEELEEA
Sbjct 823 PTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLKEELEEA 882
Query 895 RDIARRRSLGNIKFIGELFKLKMLTEAIMHDCVVKLLKNHDEESLECLCRLLTTIGKDLD 954
RDIARRRSLGNIKFIGELFKLKMLTEAIMHDCVVKLLKNHDEESLECLCRLLTTIGKDLD
Sbjct 883 RDIARRRSLGNIKFIGELFKLKMLTEAIMHDCVVKLLKNHDEESLECLCRLLTTIGKDLD 942
Query 955 FEKAKPRMDQYFNQMEKIIKEKKTSSRIRFMLQDVLDLRGSNWVPRRGDQGPKTIDQIHK 1014
F KAKPRMDQYFNQMEKIIKEKKTSSRIRFMLQDVLDLR SNWVPRRGDQGPKTIDQIHK
Sbjct 943 FAKAKPRMDQYFNQMEKIIKEKKTSSRIRFMLQDVLDLRQSNWVPRRGDQGPKTIDQIHK 1002
Query 1015 EAEMEEHREHIKVQQLMAKGSDKRRGGPPGPPISRGLPLVDDGGWNTVPISKGSRPIDTS 1074
EAEMEEHREHIKVQQLMAKG DKRRGGPPGPP V DGGWNTVPISKGSRPIDTS
Sbjct 1003 EAEMEEHREHIKVQQLMAKGGDKRRGGPPGPP-------VNDGGWNTVPISKGSRPIDTS 1055
Query 1075 RLTKITKPGSIDSNNQLFAPGGRLSWGKGSSGGSGAKPSDAASEAARPATSTLNRFSALQ 1134
RLTKITKPGSIDSNNQLFAPGGRLSWGKGSSGGSGAKPSD ASEA RPA TLNRFSALQ
Sbjct 1056 RLTKITKPGSIDSNNQLFAPGGRLSWGKGSSGGSGAKPSDTASEATRPA--TLNRFSALQ 1113
Query 1135 QAVPTESTDNRRVVQRSSLSRERGEKAGDRGDRLERSERGGDRGDRLDRARTPATKRSFS 1194
Q P E TDNRRVVQRSSLSRERGEKAGDRGDRLERSERGGDRGDRLDRARTPATKRSFS
Sbjct 1114 QTLPVENTDNRRVVQRSSLSRERGEKAGDRGDRLERSERGGDRGDRLDRARTPATKRSFS 1173
Query 1195 KEVEERSRERPSQPEGLRKAASLTEDRDRGRDAVKREAALPPVSPLKAALSEEELEKKSK 1254
KEVEERSRERPSQPEGLRKAASLTE DRGRD VKREA LPPVSP KAAL E E KSK
Sbjct 1174 KEVEERSRERPSQPEGLRKAASLTE--DRGRDPVKREATLPPVSPPKAALAVDEVERKSK 1231
Query 1255 AIIEEYLHLNDMKEAVQCVQELASPSLLFIFVRHGVESTLERSAIAREHMGQLLHQLLCA 1314
AIIEEYLHLNDMKEAVQCVQELASPSLLFIFVR G ESTLERS IAREHMG LLHQLLCA
Sbjct 1232 AIIEEYLHLNDMKEAVQCVQELASPSLLFIFVRLGIESTLERSTIAREHMGRLLHQLLCA 1291
Query 1315 GHLSTAQYYQGLYEILELAEDMEIDIPHVWLYLAELVTPILQEGGVPMGELFREITKPLR 1374
GHLSTAQYYQGLYE LELAEDMEIDIPHVWLYLAEL TPILQE GVPMGELFREITKPLR
Sbjct 1292 GHLSTAQYYQGLYETLELAEDMEIDIPHVWLYLAELITPILQEDGVPMGELFREITKPLR 1351
Query 1375 PLGKAASLLLEILGLLCKSMGPKKVGTLWREAGLSWKEFLPEGQDIGAFVAEQKVEYTLG 1434
P GKA SLLLEILGLLCKSMGPKKVG LWREAGLSW EFL EGQD G FVAE KVEYTLG
Sbjct 1352 PMGKATSLLLEILGLLCKSMGPKKVGMLWREAGLSWREFLAEGQDVGSFVAEKKVEYTLG 1411
Query 1435 EESEAPGQRALPSEELNRQLEKLLKEGSSNQRVFDWIEANLSEQQIVSNTLVRALMTAVC 1494
EESEAPGQRAL EEL RQLEKLLK G SNQRVFDWIEANL EQQI SNTLVRALMT VC
Sbjct 1412 EESEAPGQRALAFEELRRQLEKLLKDGGSNQRVFDWIEANLNEQQIASNTLVRALMTTVC 1471
Query 1495 YSAIIFETPLRVDVAVLKARAKLLQKYLCDEQKELQALYALQALVVTLEQPPNLLRMFFD 1554
YSAIIFETPLRVDV VLK RA LLQKYL DEQKELQALYALQALVVTLEQP NLLRMFFD
Sbjct 1472 YSAIIFETPLRVDVQVLKVRARLLQKYLSDEQKELQALYALQALVVTLEQPANLLRMFFD 1531
Query 1555 ALYDEDVVKEDAFYSWESSKDPAEQQGKGVALKSVTAFFKWLREAE-EESDHN 1606
ALYDEDVVKEDAFYSWESSKDPAEQQGKGVALKSVTAFF WLREAE EESDHN
Sbjct 1532 ALYDEDVVKEDAFYSWESSKDPAEQQGKGVALKSVTAFFNWLREAEDEESDHN 1584
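For readers new to XSLT, the mapping from the BLAST XML to the "Descriptions" table can be sketched as follows. This is a minimal illustration only, not the actual blast2html.xsl from the repository (which also renders the alignments); the file name sketch.xsl used below is hypothetical, and taking the e-value from the first HSP of each hit is an assumption made for this example. The element names come from the XML shown above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: NOT the real blast2html.xsl. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <!-- Document element: page skeleton, then one block per iteration. -->
  <xsl:template match="/BlastOutput">
    <html>
      <head>
        <title><xsl:value-of select="BlastOutput_query-def"/></title>
      </head>
      <body>
        <xsl:apply-templates select="BlastOutput_iterations/Iteration"/>
      </body>
    </html>
  </xsl:template>

  <!-- One "Descriptions" table per Iteration (i.e. per query sequence). -->
  <xsl:template match="Iteration">
    <h2>Descriptions</h2>
    <table>
      <tr><th>Accession</th><th>Def</th><th>e-value</th></tr>
      <xsl:apply-templates select="Iteration_hits/Hit"/>
    </table>
  </xsl:template>

  <!-- One table row per Hit. -->
  <xsl:template match="Hit">
    <tr>
      <td><xsl:value-of select="Hit_accession"/></td>
      <td><xsl:value-of select="Hit_def"/></td>
      <!-- Assumption: report the e-value of the first HSP of the hit. -->
      <td><xsl:value-of select="Hit_hsps/Hsp[1]/Hsp_evalue"/></td>
    </tr>
  </xsl:template>
</xsl:stylesheet>
```

Saved as sketch.xsl, it would be run the same way as the real stylesheet: xsltproc --novalid sketch.xsl blast.xml > descriptions.html. Because the Iteration template is applied to every Iteration element, an XML file holding several queries should yield one table per query.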
That's it,
Pierre
2 comments:
Thanks for the post!!!
Thanks! This (or more specifically, the Blast2tsv version) is exactly what I need... almost...
It's designed for a single query sequence, but I'd like to modify it to work with multiple iterations (result of running BLAST with fasta file containing multiple sequences), then annotate the iteration definitions and lengths.
I understand I'll need to add another for-each loop, modify the XPath pointers (...XPaths...?), and set/call variables for the two new fields.
But, I am new to XSLT and am screwing up somewhere; most likely in the match/select statements.
Any suggestions?
[Also noticed that "Hit_def" and "Hit_len" are "Hit-def" and "Hit-len" in the TSV version, causing a few minor holes in the output.]
Thanks again for the great code/help