22 March 2011

BLAST Stylesheet: XML to HTML

I wrote an XSLT stylesheet to answer the following question on Biostar: "I'd like to create an HTML file (from the XML file and XSL stylesheet) similar to what can be achieved when we perform a BLAST search on the NCBI server."

The stylesheet is available on GitHub at https://github.com/lindenb/xslt-sandbox/blob/master/stylesheets/bio/ncbi/blast2html.xsl (see also my previous post, blast2svg).

Usage:

xsltproc --novalid blast2html.xsl blast.xml > blast.html

Example:

Here is an XML output from BLAST:
<BlastOutput>
<BlastOutput_program>blastp</BlastOutput_program>
<BlastOutput_version>BLASTP 2.2.25+</BlastOutput_version>
<BlastOutput_reference>Alejandro A. Sch&auml;ffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), "Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements", Nucleic Acids Res. 29:2994-3005.</BlastOutput_reference>
<BlastOutput_db>N/A</BlastOutput_db>
<BlastOutput_query-ID>gi|187956781|gb|AAI40897.1|</BlastOutput_query-ID>
<BlastOutput_query-def>EIF4G1 protein [Homo sapiens]</BlastOutput_query-def>
<BlastOutput_query-len>1606</BlastOutput_query-len>
<BlastOutput_param>
<Parameters>
<Parameters_matrix>BLOSUM62</Parameters_matrix>
<Parameters_expect>10</Parameters_expect>
<Parameters_gap-open>11</Parameters_gap-open>
<Parameters_gap-extend>1</Parameters_gap-extend>
<Parameters_filter>F</Parameters_filter>
</Parameters>
</BlastOutput_param>
<BlastOutput_iterations>
<Iteration>
<Iteration_iter-num>1</Iteration_iter-num>
<Iteration_query-ID>gi|187956781|gb|AAI40897.1|</Iteration_query-ID>
<Iteration_query-def>EIF4G1 protein [Homo sapiens]</Iteration_query-def>
<Iteration_query-len>1606</Iteration_query-len>
<Iteration_hits>
<Hit>
<Hit_num>1</Hit_num>
<Hit_id>gi|293340930|ref|XP_002724789.1|</Hit_id>
<Hit_def>PREDICTED: eukaryotic translation initiation factor 4 gamma, 1 isoform 2 [Rattus norvegicus] >gi|293352298|ref|XP_002727969.1| PREDICTED: eukaryotic translation initiation factor 4, gamma 1 isoform 1 [Rattus norvegicus]</Hit_def>
<Hit_accession>XP_002727969</Hit_accession>
<Hit_len>1584</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>2715.64</Hsp_bit-score>
<Hsp_score>7038</Hsp_score>
<Hsp_evalue>0</Hsp_evalue>
<Hsp_query-from>1</Hsp_query-from>
<Hsp_query-to>1606</Hsp_query-to>
<Hsp_hit-from>1</Hsp_hit-from>
<Hsp_hit-to>1584</Hsp_hit-to>
<Hsp_query-frame>0</Hsp_query-frame>
<Hsp_hit-frame>0</Hsp_hit-frame>
<Hsp_identity>1450</Hsp_identity>
<Hsp_positive>1450</Hsp_positive>
<Hsp_gaps>36</Hsp_gaps>
<Hsp_align-len>1613</Hsp_align-len>
<Hsp_qseq>MNKAPQSTGPPPAPSPGLPQPAFPPGQTAPVVFSTPQATQMNTPSQPRQGGFRSLQHFYPSRAQPPSSAASRVQSAAPARPGPAAHVYPAGSQVMMIPSQISYPASQGAYYIPGQGRSTYVVPTQQYPVQPGAPGFYPGASPTEFGTYAGAYYPAQGVQQFPTGVAPAPVLMNQPPQIAPKRERKTIRIRDPNQGGKDITEEIMSGARTASTPTPPQTGGGLEPQANGETPQVAVIVRPDDRSQGAIIADRPGLPGPEHSP-SESQPSSPSPTPSPSPVLEPGSEPNLAVLSIPGDTMTT--IQMSVEESTPISRETGEPYRLSPEPTPLAEPILEVEVTLSKPVPESEFSSSPLQAPTPLASHTVEIHEPNGMVPSEDLEPEVESSPELAPPP--ACPSESPVPIAPTAQPEELLNGAPSPPAVDLSPVSEPEEQAKEV-TASMAPPTIPSATPATAPSATSPAQEEEMEEEEEEEEGEAGEAGEAESEKGGEELLPPESTPIPANLSQNLEAAAATQVAVSVPKRRRKIKELNKKEAVGDLLDAFKEANPAVPEVENQPPAGSNPGPESEGSGVPPRPEEADETWDSKEDKIHNAENIQPGEQKYEYKSDQWKPLNLEEKKRYDREFLLGFQFIFASMQKPEGLPHISDVVLDKANKTPLRPLDPTRLQGINCGPDFTPSFANLGRTTLSTRGPPRGGPGGELPRGPAGLGPRRSQQGPRKEPRKIIATVLMTEDIKLNKAEKAWKPSSKRTAADKDRGEEDADGSKTQDLFRRVRSILNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCLMALKVPTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLKEELEEARDIARRRSLGNIKFIGELFKLKMLTEAIMHDCVVKLLKNHDEESLECLCRLLTTIGKDLDFEKAKPRMDQYFNQMEKIIKEKKTSSRIRFMLQDVLDLRGSNWVPRRGDQGPKTIDQIHKEAEMEEHREHIKVQQLMAKGSDKRRGGPPGPPISRGLPLVDDGGWNTVPISKGSRPIDTSRLTKITKPGSIDSNNQLFAPGGRLSWGKGSSGGSGAKPSDAASEAARPATSTLNRFSALQQAVPTESTDNRRVVQRSSLSRERGEKAGDRGDRLERSERGGDRGDRLDRARTPATKRSFSKEVEERSRERPSQPEGLRKAASLTEDRDRGRDAVKREAALPPVSPLKAALSEEELEKKSKAIIEEYLHLNDMKEAVQCVQELASPSLLFIFVRHGVESTLERSAIAREHMGQLLHQLLCAGHLSTAQYYQGLYEILELAEDMEIDIPHVWLYLAELVTPILQEGGVPMGELFREITKPLRPLGKAASLLLEILGLLCKSMGPKKVGTLWREAGLSWKEFLPEGQDIGAFVAEQKVEYTLGEESEAPGQRALPSEELNRQLEKLLKEGSSNQRVFDWIEANLSEQQIVSNTLVRALMTAVCYSAIIFETPLRVDVAVLKARAKLLQKYLCDEQKELQALYALQALVVTLEQPPNLLRMFFDALYDEDVVKEDAFYSWESSKDPAEQQGKGVALKSVTAFFKWLREAE-EESDHN</Hsp_qseq>
<Hsp_hseq>MNKAPQPTGPPPARSPGLPQPAFPPGQTAPVVFSTPQATQMNTPSQPRQ-------HFYPSRAQPPSSAASRVQSAAPARPGPAPHVYPAGSQVMMIPSQISYSASQGAYYIPGQGRSTYVVPTQQYPVQPGAPGFYPGASPTEFGTYAGAYYPAQSVQQFPASVAPAPVLMNQPPQIAPKRERKTIRIRDPNQGGKDITEEIMSGARTASTPTPPQTGGSLEPQPNGESPQVAVIIRPDDRSQGAAIGGRPGLPGPEHSPGTESQPSSPSPTPSPPPILEPGSESNLGVLSIPGDTMTTGMIPISVEESTPISCESGEPYCLSPEPT-LAEPILEVEVTLSKPIPESEFSSSPLQVSTSLVPHRAETHEPNGVIPSEDLEPEVESSTEPAPPPLSACASESLVPIAPTAQPEELLNGAPSPPAVDLSPVSEPEEQAKEVPSAALA--SIVSPTPPVAPSDTSAAQEEEIEED-------EDEDGEAESEKGGEDL-PLDSTPVPAQLSQNLEVAAAPQVAVSVPKRRRKIKELNKKEAVGDLLDAFKEVDPAVPEVENQPPTGSNPSPESEGSAALPQPEEAEETWDSKEDKIHNAENIQPGEQKYEYKSDQWKPLNLEEKKRYDREFLLGFQFIFASMQKPEGLPHITDVVLDKANKTPLRSLDPSRLPGINCGPDFTPSFANLGRPTLSSRGPPRGGPGGELPRGPAGLGPRRSQQGPRKETRKIISSVIMTEDIKLNKAEKAWKPSSKRTAADKDRGEEDADGSKTQDLFRRVRSILNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCLMALKVPTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLKEELEEARDIARRRSLGNIKFIGELFKLKMLTEAIMHDCVVKLLKNHDEESLECLCRLLTTIGKDLDFAKAKPRMDQYFNQMEKIIKEKKTSSRIRFMLQDVLDLRQSNWVPRRGDQGPKTIDQIHKEAEMEEHREHIKVQQLMAKGGDKRRGGPPGPP-------VNDGGWNTVPISKGSRPIDTSRLTKITKPGSIDSNNQLFAPGGRLSWGKGSSGGSGAKPSDTASEATRPA--TLNRFSALQQTLPVENTDNRRVVQRSSLSRERGEKAGDRGDRLERSERGGDRGDRLDRARTPATKRSFSKEVEERSRERPSQPEGLRKAASLTE--DRGRDPVKREATLPPVSPPKAALAVDEVERKSKAIIEEYLHLNDMKEAVQCVQELASPSLLFIFVRLGIESTLERSTIAREHMGRLLHQLLCAGHLSTAQYYQGLYETLELAEDMEIDIPHVWLYLAELITPILQEDGVPMGELFREITKPLRPMGKATSLLLEILGLLCKSMGPKKVGMLWREAGLSWREFLAEGQDVGSFVAEKKVEYTLGEESEAPGQRALAFEELRRQLEKLLKDGGSNQRVFDWIEANLNEQQIASNTLVRALMTTVCYSAIIFETPLRVDVQVLKVRARLLQKYLSDEQKELQALYALQALVVTLEQPANLLRMFFDALYDEDVVKEDAFYSWESSKDPAEQQGKGVALKSVTAFFNWLREAEDEESDHN</Hsp_hseq>
<Hsp_midline>MNKAPQ TGPPPA SPGLPQPAFPPGQTAPVVFSTPQATQMNTPSQPRQ HFYPSRAQPPSSAASRVQSAAPARPGPA HVYPAGSQVMMIPSQISY ASQGAYYIPGQGRSTYVVPTQQYPVQPGAPGFYPGASPTEFGTYAGAYYPAQ VQQFP VAPAPVLMNQPPQIAPKRERKTIRIRDPNQGGKDITEEIMSGARTASTPTPPQTGG LEPQ NGE PQVAVI RPDDRSQGA I RPGLPGPEHSP ESQPSSPSPTPSP P LEPGSE NL VLSIPGDTMTT I SVEESTPIS E GEPY LSPEPT LAEPILEVEVTLSKP PESEFSSSPLQ T L H E HEPNG PSEDLEPEVESS E APPP AC SES VPIAPTAQPEELLNGAPSPPAVDLSPVSEPEEQAKEV A A I S TP APS TS AQEEE EE E GEAESEKGGE L P STP PA LSQNLE AAA QVAVSVPKRRRKIKELNKKEAVGDLLDAFKE PAVPEVENQPP GSNP PESEGS P PEEA ETWDSKEDKIHNAENIQPGEQKYEYKSDQWKPLNLEEKKRYDREFLLGFQFIFASMQKPEGLPHI DVVLDKANKTPLR LDP RL GINCGPDFTPSFANLGR TLS RGPPRGGPGGELPRGPAGLGPRRSQQGPRKE RKII V MTEDIKLNKAEKAWKPSSKRTAADKDRGEEDADGSKTQDLFRRVRSILNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCLMALKVPTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLKEELEEARDIARRRSLGNIKFIGELFKLKMLTEAIMHDCVVKLLKNHDEESLECLCRLLTTIGKDLDF KAKPRMDQYFNQMEKIIKEKKTSSRIRFMLQDVLDLR SNWVPRRGDQGPKTIDQIHKEAEMEEHREHIKVQQLMAKG DKRRGGPPGPP V DGGWNTVPISKGSRPIDTSRLTKITKPGSIDSNNQLFAPGGRLSWGKGSSGGSGAKPSD ASEA RPA TLNRFSALQQ P E TDNRRVVQRSSLSRERGEKAGDRGDRLERSERGGDRGDRLDRARTPATKRSFSKEVEERSRERPSQPEGLRKAASLTE DRGRD VKREA LPPVSP KAAL E E KSKAIIEEYLHLNDMKEAVQCVQELASPSLLFIFVR G ESTLERS IAREHMG LLHQLLCAGHLSTAQYYQGLYE LELAEDMEIDIPHVWLYLAEL TPILQE GVPMGELFREITKPLRP GKA SLLLEILGLLCKSMGPKKVG LWREAGLSW EFL EGQD G FVAE KVEYTLGEESEAPGQRAL EEL RQLEKLLK G SNQRVFDWIEANL EQQI SNTLVRALMT VCYSAIIFETPLRVDV VLK RA LLQKYL DEQKELQALYALQALVVTLEQP NLLRMFFDALYDEDVVKEDAFYSWESSKDPAEQQGKGVALKSVTAFF WLREAE EESDHN</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
</Iteration_hits>
<Iteration_stat>
<Statistics>
<Statistics_db-num>0</Statistics_db-num>
<Statistics_db-len>0</Statistics_db-len>
<Statistics_hsp-len>0</Statistics_hsp-len>
<Statistics_eff-space>0</Statistics_eff-space>
<Statistics_kappa>-1</Statistics_kappa>
<Statistics_lambda>-1</Statistics_lambda>
<Statistics_entropy>-1</Statistics_entropy>
</Statistics>
</Iteration_stat>
</Iteration>
</BlastOutput_iterations>
</BlastOutput>

After processing:

(...)

Descriptions

Accession | Def | e-value
XP_002727969 | PREDICTED: eukaryotic translation initiation factor 4 gamma, 1 isoform 2 [Rattus norvegicus] >gi|293352298|ref|XP_002727969.1| PREDICTED: eukaryotic translation initiation factor 4, gamma 1 isoform 1 [Rattus norvegicus] | 0
(...)
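The Descriptions table is essentially one row per <Hit>. As a cross-check of what the stylesheet extracts, here is a minimal Python sketch (not the XSLT itself) using only the standard library's ElementTree. It assumes the e-value shown for a hit is the one from its first <Hsp>; the function name description_rows is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the BLAST XML above: only the elements read below are kept,
# and Hit_def is shortened to its first definition.
BLAST_XML = """<BlastOutput>
<BlastOutput_iterations>
<Iteration>
<Iteration_hits>
<Hit>
<Hit_num>1</Hit_num>
<Hit_id>gi|293340930|ref|XP_002724789.1|</Hit_id>
<Hit_def>PREDICTED: eukaryotic translation initiation factor 4 gamma, 1 isoform 2 [Rattus norvegicus]</Hit_def>
<Hit_accession>XP_002727969</Hit_accession>
<Hit_hsps>
<Hsp>
<Hsp_evalue>0</Hsp_evalue>
</Hsp>
</Hit_hsps>
</Hit>
</Iteration_hits>
</Iteration>
</BlastOutput_iterations>
</BlastOutput>"""


def description_rows(root):
    """Yield one (accession, definition, e-value) tuple per <Hit>.

    Assumption: the e-value is taken from the first <Hsp> of each hit.
    """
    for hit in root.iter("Hit"):
        first_hsp = hit.find("Hit_hsps/Hsp")
        evalue = first_hsp.findtext("Hsp_evalue") if first_hsp is not None else ""
        yield (hit.findtext("Hit_accession"), hit.findtext("Hit_def"), evalue)


root = ET.fromstring(BLAST_XML)
print("Accession | Def | e-value")
for row in description_rows(root):
    print(" | ".join(row))
```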

Alignments

>gi|293340930|ref|XP_002724789.1||XP_002727969|PREDICTED: eukaryotic translation initiation factor 4 gamma, 1 isoform 2 [Rattus norvegicus] >gi|293352298|ref|XP_002727969.1| PREDICTED: eukaryotic translation initiation factor 4, gamma 1 isoform 1 [Rattus norvegicus]
Length=1584
Score = 2715.64 bits (7038), Expect = 0
Identities = 1450/1613 (89.8946063236206%), Gaps = 36/1613 (2.231866088034718%)
Strand = Plus/Plus
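The summary above each HSP is built from the Hsp_* fields of the XML, and the percentages on the "Identities" line are printed as the raw quotient (1450/1613 gives 89.8946063236206%). A rounded value is easier to read; in XSLT 1.0, format-number() is the natural tool for that. The Python sketch below only illustrates the arithmetic, rounding to one decimal place as an arbitrary choice; hsp_header and percent are hypothetical helper names.

```python
def percent(part, whole):
    """Percentage rounded to one decimal place, e.g. '89.9%'."""
    return f"{100.0 * part / whole:.1f}%"


def hsp_header(hit_len, hsp):
    """Summary lines printed above an HSP.

    `hsp` maps BLAST XML element names (Hsp_bit-score, ...) to their
    text values, as they appear in the XML.
    """
    identity = int(hsp["Hsp_identity"])
    gaps = int(hsp["Hsp_gaps"])
    align_len = int(hsp["Hsp_align-len"])
    return [
        f"Length={hit_len}",
        f"Score = {hsp['Hsp_bit-score']} bits ({hsp['Hsp_score']}), "
        f"Expect = {hsp['Hsp_evalue']}",
        f"Identities = {identity}/{align_len} ({percent(identity, align_len)}), "
        f"Gaps = {gaps}/{align_len} ({percent(gaps, align_len)})",
    ]


# Values copied from the <Hsp> element of the example above.
example_hsp = {
    "Hsp_bit-score": "2715.64",
    "Hsp_score": "7038",
    "Hsp_evalue": "0",
    "Hsp_identity": "1450",
    "Hsp_gaps": "36",
    "Hsp_align-len": "1613",
}
for line in hsp_header("1584", example_hsp):
    print(line)
```

With the example values, the third line becomes "Identities = 1450/1613 (89.9%), Gaps = 36/1613 (2.2%)".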

Query 1 MNKAPQSTGPPPAPSPGLPQPAFPPGQTAPVVFSTPQATQMNTPSQPRQGGFRSLQHFYP 60
MNKAPQ TGPPPA SPGLPQPAFPPGQTAPVVFSTPQATQMNTPSQPRQ HFYP
Sbjct 1 MNKAPQPTGPPPARSPGLPQPAFPPGQTAPVVFSTPQATQMNTPSQPRQ-------HFYP 53

Query 61 SRAQPPSSAASRVQSAAPARPGPAAHVYPAGSQVMMIPSQISYPASQGAYYIPGQGRSTY 120
SRAQPPSSAASRVQSAAPARPGPA HVYPAGSQVMMIPSQISY ASQGAYYIPGQGRSTY
Sbjct 54 SRAQPPSSAASRVQSAAPARPGPAPHVYPAGSQVMMIPSQISYSASQGAYYIPGQGRSTY 113

Query 121 VVPTQQYPVQPGAPGFYPGASPTEFGTYAGAYYPAQGVQQFPTGVAPAPVLMNQPPQIAP 180
VVPTQQYPVQPGAPGFYPGASPTEFGTYAGAYYPAQ VQQFP VAPAPVLMNQPPQIAP
Sbjct 114 VVPTQQYPVQPGAPGFYPGASPTEFGTYAGAYYPAQSVQQFPASVAPAPVLMNQPPQIAP 173

Query 181 KRERKTIRIRDPNQGGKDITEEIMSGARTASTPTPPQTGGGLEPQANGETPQVAVIVRPD 240
KRERKTIRIRDPNQGGKDITEEIMSGARTASTPTPPQTGG LEPQ NGE PQVAVI RPD
Sbjct 174 KRERKTIRIRDPNQGGKDITEEIMSGARTASTPTPPQTGGSLEPQPNGESPQVAVIIRPD 233

Query 241 DRSQGAIIADRPGLPGPEHSP-SESQPSSPSPTPSPSPVLEPGSEPNLAVLSIPGDTMTT 299
DRSQGA I RPGLPGPEHSP ESQPSSPSPTPSP P LEPGSE NL VLSIPGDTMTT
Sbjct 234 DRSQGAAIGGRPGLPGPEHSPGTESQPSSPSPTPSPPPILEPGSESNLGVLSIPGDTMTT 293

Query 300 --IQMSVEESTPISRETGEPYRLSPEPTPLAEPILEVEVTLSKPVPESEFSSSPLQAPTP 357
I SVEESTPIS E GEPY LSPEPT LAEPILEVEVTLSKP PESEFSSSPLQ T
Sbjct 294 GMIPISVEESTPISCESGEPYCLSPEPT-LAEPILEVEVTLSKPIPESEFSSSPLQVSTS 352

Query 358 LASHTVEIHEPNGMVPSEDLEPEVESSPELAPPP--ACPSESPVPIAPTAQPEELLNGAP 415
L H E HEPNG PSEDLEPEVESS E APPP AC SES VPIAPTAQPEELLNGAP
Sbjct 353 LVPHRAETHEPNGVIPSEDLEPEVESSTEPAPPPLSACASESLVPIAPTAQPEELLNGAP 412

Query 416 SPPAVDLSPVSEPEEQAKEV-TASMAPPTIPSATPATAPSATSPAQEEEMEEEEEEEEGE 474
SPPAVDLSPVSEPEEQAKEV A A I S TP APS TS AQEEE EE
Sbjct 413 SPPAVDLSPVSEPEEQAKEVPSAALA--SIVSPTPPVAPSDTSAAQEEEIEED------- 463

Query 475 AGEAGEAESEKGGEELLPPESTPIPANLSQNLEAAAATQVAVSVPKRRRKIKELNKKEAV 534
E GEAESEKGGE L P STP PA LSQNLE AAA QVAVSVPKRRRKIKELNKKEAV
Sbjct 464 EDEDGEAESEKGGEDL-PLDSTPVPAQLSQNLEVAAAPQVAVSVPKRRRKIKELNKKEAV 522

Query 535 GDLLDAFKEANPAVPEVENQPPAGSNPGPESEGSGVPPRPEEADETWDSKEDKIHNAENI 594
GDLLDAFKE PAVPEVENQPP GSNP PESEGS P PEEA ETWDSKEDKIHNAENI
Sbjct 523 GDLLDAFKEVDPAVPEVENQPPTGSNPSPESEGSAALPQPEEAEETWDSKEDKIHNAENI 582

Query 595 QPGEQKYEYKSDQWKPLNLEEKKRYDREFLLGFQFIFASMQKPEGLPHISDVVLDKANKT 654
QPGEQKYEYKSDQWKPLNLEEKKRYDREFLLGFQFIFASMQKPEGLPHI DVVLDKANKT
Sbjct 583 QPGEQKYEYKSDQWKPLNLEEKKRYDREFLLGFQFIFASMQKPEGLPHITDVVLDKANKT 642

Query 655 PLRPLDPTRLQGINCGPDFTPSFANLGRTTLSTRGPPRGGPGGELPRGPAGLGPRRSQQG 714
PLR LDP RL GINCGPDFTPSFANLGR TLS RGPPRGGPGGELPRGPAGLGPRRSQQG
Sbjct 643 PLRSLDPSRLPGINCGPDFTPSFANLGRPTLSSRGPPRGGPGGELPRGPAGLGPRRSQQG 702

Query 715 PRKEPRKIIATVLMTEDIKLNKAEKAWKPSSKRTAADKDRGEEDADGSKTQDLFRRVRSI 774
PRKE RKII V MTEDIKLNKAEKAWKPSSKRTAADKDRGEEDADGSKTQDLFRRVRSI
Sbjct 703 PRKETRKIISSVIMTEDIKLNKAEKAWKPSSKRTAADKDRGEEDADGSKTQDLFRRVRSI 762

Query 775 LNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCLMALKV 834
LNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCLMALKV
Sbjct 763 LNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCLMALKV 822

Query 835 PTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLKEELEEA 894
PTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLKEELEEA
Sbjct 823 PTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLKEELEEA 882

Query 895 RDIARRRSLGNIKFIGELFKLKMLTEAIMHDCVVKLLKNHDEESLECLCRLLTTIGKDLD 954
RDIARRRSLGNIKFIGELFKLKMLTEAIMHDCVVKLLKNHDEESLECLCRLLTTIGKDLD
Sbjct 883 RDIARRRSLGNIKFIGELFKLKMLTEAIMHDCVVKLLKNHDEESLECLCRLLTTIGKDLD 942

Query 955 FEKAKPRMDQYFNQMEKIIKEKKTSSRIRFMLQDVLDLRGSNWVPRRGDQGPKTIDQIHK 1014
F KAKPRMDQYFNQMEKIIKEKKTSSRIRFMLQDVLDLR SNWVPRRGDQGPKTIDQIHK
Sbjct 943 FAKAKPRMDQYFNQMEKIIKEKKTSSRIRFMLQDVLDLRQSNWVPRRGDQGPKTIDQIHK 1002

Query 1015 EAEMEEHREHIKVQQLMAKGSDKRRGGPPGPPISRGLPLVDDGGWNTVPISKGSRPIDTS 1074
EAEMEEHREHIKVQQLMAKG DKRRGGPPGPP V DGGWNTVPISKGSRPIDTS
Sbjct 1003 EAEMEEHREHIKVQQLMAKGGDKRRGGPPGPP-------VNDGGWNTVPISKGSRPIDTS 1055

Query 1075 RLTKITKPGSIDSNNQLFAPGGRLSWGKGSSGGSGAKPSDAASEAARPATSTLNRFSALQ 1134
RLTKITKPGSIDSNNQLFAPGGRLSWGKGSSGGSGAKPSD ASEA RPA TLNRFSALQ
Sbjct 1056 RLTKITKPGSIDSNNQLFAPGGRLSWGKGSSGGSGAKPSDTASEATRPA--TLNRFSALQ 1113

Query 1135 QAVPTESTDNRRVVQRSSLSRERGEKAGDRGDRLERSERGGDRGDRLDRARTPATKRSFS 1194
Q P E TDNRRVVQRSSLSRERGEKAGDRGDRLERSERGGDRGDRLDRARTPATKRSFS
Sbjct 1114 QTLPVENTDNRRVVQRSSLSRERGEKAGDRGDRLERSERGGDRGDRLDRARTPATKRSFS 1173

Query 1195 KEVEERSRERPSQPEGLRKAASLTEDRDRGRDAVKREAALPPVSPLKAALSEEELEKKSK 1254
KEVEERSRERPSQPEGLRKAASLTE DRGRD VKREA LPPVSP KAAL E E KSK
Sbjct 1174 KEVEERSRERPSQPEGLRKAASLTE--DRGRDPVKREATLPPVSPPKAALAVDEVERKSK 1231

Query 1255 AIIEEYLHLNDMKEAVQCVQELASPSLLFIFVRHGVESTLERSAIAREHMGQLLHQLLCA 1314
AIIEEYLHLNDMKEAVQCVQELASPSLLFIFVR G ESTLERS IAREHMG LLHQLLCA
Sbjct 1232 AIIEEYLHLNDMKEAVQCVQELASPSLLFIFVRLGIESTLERSTIAREHMGRLLHQLLCA 1291

Query 1315 GHLSTAQYYQGLYEILELAEDMEIDIPHVWLYLAELVTPILQEGGVPMGELFREITKPLR 1374
GHLSTAQYYQGLYE LELAEDMEIDIPHVWLYLAEL TPILQE GVPMGELFREITKPLR
Sbjct 1292 GHLSTAQYYQGLYETLELAEDMEIDIPHVWLYLAELITPILQEDGVPMGELFREITKPLR 1351

Query 1375 PLGKAASLLLEILGLLCKSMGPKKVGTLWREAGLSWKEFLPEGQDIGAFVAEQKVEYTLG 1434
P GKA SLLLEILGLLCKSMGPKKVG LWREAGLSW EFL EGQD G FVAE KVEYTLG
Sbjct 1352 PMGKATSLLLEILGLLCKSMGPKKVGMLWREAGLSWREFLAEGQDVGSFVAEKKVEYTLG 1411

Query 1435 EESEAPGQRALPSEELNRQLEKLLKEGSSNQRVFDWIEANLSEQQIVSNTLVRALMTAVC 1494
EESEAPGQRAL EEL RQLEKLLK G SNQRVFDWIEANL EQQI SNTLVRALMT VC
Sbjct 1412 EESEAPGQRALAFEELRRQLEKLLKDGGSNQRVFDWIEANLNEQQIASNTLVRALMTTVC 1471

Query 1495 YSAIIFETPLRVDVAVLKARAKLLQKYLCDEQKELQALYALQALVVTLEQPPNLLRMFFD 1554
YSAIIFETPLRVDV VLK RA LLQKYL DEQKELQALYALQALVVTLEQP NLLRMFFD
Sbjct 1472 YSAIIFETPLRVDVQVLKVRARLLQKYLSDEQKELQALYALQALVVTLEQPANLLRMFFD 1531

Query 1555 ALYDEDVVKEDAFYSWESSKDPAEQQGKGVALKSVTAFFKWLREAE-EESDHN 1606
ALYDEDVVKEDAFYSWESSKDPAEQQGKGVALKSVTAFF WLREAE EESDHN
Sbjct 1532 ALYDEDVVKEDAFYSWESSKDPAEQQGKGVALKSVTAFFNWLREAEDEESDHN 1584
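The alignment is cut into blocks of 60 columns, and the start/end coordinates only advance over residues: a '-' occupies a column but is not a sequence position. That is why the first block reads Query 1..60 but Sbjct 1..53 (seven gap columns). The Python sketch below reproduces that bookkeeping under the assumption of plus-strand coordinates; the function name, label widths, and padding are illustrative choices, not taken from the stylesheet. The real <Hsp_midline> should be used when available; the demo rebuilds it from identities only, which matches this HSP because Hsp_positive equals Hsp_identity (both 1450).

```python
PAD = " " * 13  # width of "Query  " + a 5-character start column + 1 space


def alignment_blocks(qseq, midline, hseq, q_from, h_from, width=60):
    """Split an HSP into (query, midline, subject) line triples.

    Assumes plus-strand coordinates: positions only increase.
    """
    blocks = []
    q_pos, h_pos = q_from, h_from
    for i in range(0, len(qseq), width):
        q_chunk = qseq[i:i + width]
        m_chunk = midline[i:i + width]
        h_chunk = hseq[i:i + width]
        # Residues in this chunk; '-' is a gap column, not a position.
        q_res = len(q_chunk) - q_chunk.count("-")
        h_res = len(h_chunk) - h_chunk.count("-")
        blocks.append((
            f"Query  {q_pos:<5} {q_chunk}  {q_pos + q_res - 1}",
            f"{PAD}{m_chunk}",
            f"Sbjct  {h_pos:<5} {h_chunk}  {h_pos + h_res - 1}",
        ))
        q_pos += q_res
        h_pos += h_res
    return blocks


# First 120 aligned columns of the example HSP.
q = ("MNKAPQSTGPPPAPSPGLPQPAFPPGQTAPVVFSTPQATQMNTPSQPRQGGFRSLQHFYP"
     "SRAQPPSSAASRVQSAAPARPGPAAHVYPAGSQVMMIPSQISYPASQGAYYIPGQGRSTY")
h = ("MNKAPQPTGPPPARSPGLPQPAFPPGQTAPVVFSTPQATQMNTPSQPRQ-------HFYP"
     "SRAQPPSSAASRVQSAAPARPGPAPHVYPAGSQVMMIPSQISYSASQGAYYIPGQGRSTY")
# Identity-only midline (real BLAST midlines also use '+' for positives).
m = "".join(a if a == b else " " for a, b in zip(q, h))

for block in alignment_blocks(q, m, h, 1, 1):
    print("\n".join(block))
    print()
```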



That's it,

Pierre

2 comments:

Ashish Kumar said...

Thanks for the post!!!

KHoegenauer said...

Thanks! This (or more specifically, the Blast2tsv version) is exactly what I need... almost...
It's designed for a single query sequence, but I'd like to modify it to work with multiple iterations (the result of running BLAST with a FASTA file containing multiple sequences), then annotate the iteration definitions and lengths.
I understand I'll need to add another for-each loop, modify the XPath pointers (...XPaths...?), and set/call variables for the two new fields.
But, I am new to XSLT and am screwing up somewhere; most likely in the match/select statements.
Any suggestions?

[Also noticed that "Hit_def" and "Hit_len" are "Hit-def" and "Hit-len" in the TSV version, causing a few minor holes in the output.]

Thanks again for the great code/help
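For the multi-query case raised in this comment, BLAST XML holds one <Iteration> per query, each with its own Iteration_query-def and Iteration_query-len, so the extra loop goes around the hits. In XSLT that shape would be an outer xsl:for-each over the Iteration elements with the existing hit loop nested inside it. The Python sketch below (not the blast2tsv stylesheet) shows the same nesting with ElementTree; the column set, the function name blast_to_tsv, and the two-query SAMPLE document are hypothetical. Note the underscores in Hit_def and Hit_len, as the comment points out.

```python
import csv
import io
import xml.etree.ElementTree as ET


def blast_to_tsv(xml_text):
    """Return TSV text with one row per HSP, prefixed by its query."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["query_def", "query_len", "hit_def", "hit_len", "evalue"])
    # Outer loop: one <Iteration> per query sequence.
    for iteration in root.iter("Iteration"):
        query_def = iteration.findtext("Iteration_query-def")
        query_len = iteration.findtext("Iteration_query-len")
        # Inner loops: the hits of this query, then the HSPs of each hit.
        for hit in iteration.iterfind("Iteration_hits/Hit"):
            for hsp in hit.iterfind("Hit_hsps/Hsp"):
                writer.writerow([query_def, query_len,
                                 hit.findtext("Hit_def"), hit.findtext("Hit_len"),
                                 hsp.findtext("Hsp_evalue")])
    return out.getvalue()


# Hypothetical two-query document, trimmed to the elements read above.
SAMPLE = """<BlastOutput><BlastOutput_iterations>
<Iteration>
<Iteration_query-def>query_1</Iteration_query-def>
<Iteration_query-len>1606</Iteration_query-len>
<Iteration_hits><Hit>
<Hit_def>hit_A</Hit_def><Hit_len>1584</Hit_len>
<Hit_hsps><Hsp><Hsp_evalue>0</Hsp_evalue></Hsp></Hit_hsps>
</Hit></Iteration_hits>
</Iteration>
<Iteration>
<Iteration_query-def>query_2</Iteration_query-def>
<Iteration_query-len>250</Iteration_query-len>
<Iteration_hits><Hit>
<Hit_def>hit_B</Hit_def><Hit_len>240</Hit_len>
<Hit_hsps><Hsp><Hsp_evalue>1e-50</Hsp_evalue></Hsp></Hit_hsps>
</Hit></Iteration_hits>
</Iteration>
</BlastOutput_iterations></BlastOutput>"""

print(blast_to_tsv(SAMPLE), end="")
```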