13 July 2012

Parsing the Newick format in C using flex and bison.

The following post is my answer to this question on Biostar: "Newick 2 Json converter".
The Newick tree format is a simple format used to write out trees (using parentheses and commas) in a text file.
The original question asked for a Perl-based parser, but here I've implemented a C parser using flex/bison.


Example:

((Human:0.3, Chimpanzee:0.2):0.1, Gorilla:0.3, (Mouse:0.6, Rat:0.5):0.2);

A formal grammar for the Newick format is available here:

   Items in { } may appear zero or more times.
   Items in [ ] are optional, they may appear once or not at all.
   All other punctuation marks (colon, semicolon, parentheses, comma and
   single quote) are required parts of the format.


              tree ==> descendant_list [ root_label ] [ : branch_length ] ;

   descendant_list ==> ( subtree { , subtree } )

           subtree ==> descendant_list [internal_node_label] [: branch_length]
                   ==> leaf_label [: branch_length]

            root_label ==> label
   internal_node_label ==> label
            leaf_label ==> label

                 label ==> unquoted_label
                       ==> quoted_label

        unquoted_label ==> string_of_printing_characters
          quoted_label ==> ' string_of_printing_characters '

         branch_length ==> signed_number
                       ==> unsigned_number

The Flex Lexer

The Flex Lexer is used to extract the terminal tokens of the grammar from the input stream.
Those terminals are '(' ')' ',' ';' ':', strings and numbers. For single- and double-quoted strings, we tell the lexer to enter a specific state ('apos' and 'quot').
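
As a rough illustration of what such a lexer does (this is a hand-written C++ sketch, not the flex source), the function below splits a Newick string into the terminals listed above. The names `Tok`, `Token` and `tokenize` are hypothetical, the "quoted state" is reduced to a search for the closing quote, and the doubled-quote escape allowed inside quoted Newick labels is deliberately not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// The first five kinds are declared in the same order as the characters "(),;:".
enum class Tok { LParen, RParen, Comma, Semicolon, Colon, String, Number };

struct Token {
    Tok kind;
    std::string text;  // the token as spelled in the input (quotes removed)
};

// True when strtod consumes the whole word, i.e. the word reads as a number.
static bool looks_like_number(const std::string& s) {
    char* end = nullptr;
    std::strtod(s.c_str(), &end);
    return !s.empty() && end == s.c_str() + s.size();
}

std::vector<Token> tokenize(const std::string& in) {
    const std::string punctuation = "(),;:";
    const std::string delimiters = "(),;: \t\r\n'\"";
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const char c = in[i];
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
            ++i;  // blanks only separate tokens
            continue;
        }
        const std::size_t p = punctuation.find(c);
        if (p != std::string::npos) {
            out.push_back({static_cast<Tok>(p), std::string(1, c)});
            ++i;
            continue;
        }
        if (c == '\'' || c == '"') {
            // "apos"/"quot" state: everything up to the matching quote is the label.
            const std::size_t close = in.find(c, i + 1);
            if (close == std::string::npos) {
                throw std::runtime_error("unterminated quoted label");
            }
            out.push_back({Tok::String, in.substr(i + 1, close - i - 1)});
            i = close + 1;
            continue;
        }
        // Bare word: runs until the next delimiter or the end of the input.
        const std::size_t end = std::min(in.find_first_of(delimiters, i), in.size());
        const std::string word = in.substr(i, end - i);
        assert(!word.empty());
        out.push_back({looks_like_number(word) ? Tok::Number : Tok::String, word});
        i = end;
    }
    return out;
}
```

Calling `tokenize("(A:0.1,'B c':2);")` yields ten tokens; the quoted label comes back as the single string `B c`.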

The Bison Parser

The Bison parser reads the tokens returned by Flex and implements the grammar.
The simple structure holding the tree is defined in 'struct tree_t'. The code also contains some functions to dump the tree as JSON.
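
To make the grammar-to-JSON step concrete, here is a minimal sketch that swaps the bison-generated parser for a hand-written recursive-descent parser in C++. It is not the post's code: `Node`, `NewickParser`, `to_json` and `newick_to_json` are hypothetical names, quoted labels are left out, a lone leaf is accepted as a tree, and the JSON is written on a single line rather than pretty-printed.

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the post's 'struct tree_t'.
struct Node {
    std::string label;       // empty when the node has no label
    bool has_length = false;
    double length = 0.0;     // meaningful only when has_length is true
    std::vector<std::unique_ptr<Node>> children;
};

class NewickParser {
public:
    explicit NewickParser(const std::string& text) : in_(text), pos_(0) {}
    // tree ==> subtree ';'
    std::unique_ptr<Node> parse() {
        std::unique_ptr<Node> root = subtree();
        expect(';');
        assert(root != nullptr);
        return root;
    }

private:
    std::string in_;
    std::size_t pos_;
    static bool blank(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
    // Skips blanks and returns the next character, or '\0' at end of input.
    char peek() {
        while (pos_ < in_.size() && blank(in_[pos_])) ++pos_;
        return pos_ < in_.size() ? in_[pos_] : '\0';
    }
    void expect(char c) {
        if (peek() != c) throw std::runtime_error(std::string("expected '") + c + "'");
        ++pos_;
    }
    // Reads an unquoted run of characters up to the next delimiter or blank.
    std::string word() {
        std::string w;
        peek();
        while (pos_ < in_.size() && !blank(in_[pos_]) &&
               std::string("(),:;").find(in_[pos_]) == std::string::npos) {
            w += in_[pos_++];
        }
        return w;
    }
    // subtree ==> [ '(' subtree { ',' subtree } ')' ] [ label ] [ ':' length ]
    std::unique_ptr<Node> subtree() {
        std::unique_ptr<Node> node(new Node());
        if (peek() == '(') {
            do {
                ++pos_;  // consume '(' or ','
                node->children.push_back(subtree());
            } while (peek() == ',');
            expect(')');
        }
        node->label = word();
        if (peek() == ':') {
            ++pos_;
            node->length = std::stod(word());  // throws when no number follows ':'
            node->has_length = true;
        }
        return node;
    }
};

// Compact JSON; only '"' and '\\' are escaped inside labels.
void to_json(const Node& n, std::ostream& out) {
    const char* sep = "";
    out << '{';
    if (!n.label.empty()) {
        out << "\"label\":\"";
        for (char c : n.label) { if (c == '"' || c == '\\') out << '\\'; out << c; }
        out << '"';
        sep = ",";
    }
    if (n.has_length) { out << sep << "\"length\":" << n.length; sep = ","; }
    if (!n.children.empty()) {
        out << sep << "\"children\":[";
        for (std::size_t i = 0; i < n.children.size(); ++i) {
            if (i != 0) out << ',';
            to_json(*n.children[i], out);
        }
        out << ']';
    }
    out << '}';
}

std::string newick_to_json(const std::string& text) {
    std::ostringstream out;
    to_json(*NewickParser(text).parse(), out);
    return out.str();
}
```

For instance, `newick_to_json("(A:0.1,B:0.2)C:0.3;")` returns `{"label":"C","length":0.3,"children":[{"label":"A","length":0.1},{"label":"B","length":0.2}]}`.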

Makefile


Testing

compile:
$ make
bison -d newick.y
flex newick.l
gcc -Wall -O3 newick.tab.c lex.yy.c
lex.yy.c:1265:17: warning: ‘yyunput’ defined but not used [-Wunused-function]
lex.yy.c:1306:16: warning: ‘input’ defined but not used [-Wunused-function]
test:
$ echo "((Human:0.3, Chimpanzee:0.2):0.1, Gorilla:0.3, (Mouse:0.6, Rat:0.5):0.2);" | ./a.out

{
    "children": [
        {
            "length": 0.1,
            "children": [
                {
                    "label": "Human",
                    "length": 0.3
                },
                {
                    "label": "Chimpanzee",
                    "length": 0.2
                }
            ]
        },
        {
            "label": "Gorilla",
            "length": 0.3
        },
        {
            "length": 0.2,
            "children": [
                {
                    "label": "Mouse",
                    "length": 0.6
                },
                {
                    "label": "Rat",
                    "length": 0.5
                }
            ]
        }
    ]
}


That's it,

Pierre

10 July 2012

GNU C++ hash_set vs STL std::set: my notebook

A set is a C++ container that stores unique elements. The C++ Standard Template Library (STL) defines a class template set<T> that is typically implemented as a balanced binary search tree (a red-black tree in GNU libstdc++).

#include<set>

But the GNU C++ library also provides a (non-standard) hash-based set:

#include<ext/hash_set>

In the following code I've created some random #rs numbers and I print the time needed to insert/remove them from a set (GNU hash_set or STL set):
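
A minimal sketch of such a benchmark, not the original testset.cpp, could look like the following. It assumes a C++11 compiler and uses std::unordered_set, the standardised hash-based set, in place of the deprecated GNU hash_set; `random_ids`, `insert_then_erase` and `time_seconds` are hypothetical helper names, and plain random integers stand in for the rs numbers.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <set>
#include <unordered_set>
#include <vector>

// n pseudo-random positive integers; the fixed seed makes runs repeatable.
std::vector<int> random_ids(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(1, 100000000);
    std::vector<int> ids(n);
    for (int& id : ids) id = dist(gen);
    return ids;
}

// Inserts every id into an initially empty set, then erases them all.
// Returns the number of distinct ids that were stored.
template <typename Set>
std::size_t insert_then_erase(Set& s, const std::vector<int>& ids) {
    for (int id : ids) s.insert(id);
    const std::size_t distinct = s.size();
    for (int id : ids) s.erase(id);
    assert(s.empty());  // holds because the set started empty
    return distinct;
}

// Wall-clock seconds spent inserting and erasing all ids in a fresh Set.
template <typename Set>
double time_seconds(const std::vector<int>& ids) {
    Set s;
    const auto start = std::chrono::steady_clock::now();
    insert_then_erase(s, ids);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Comparing `time_seconds<std::set<int>>(ids)` with `time_seconds<std::unordered_set<int>>(ids)` on the same vector gives the two figures to print; the absolute values depend on the machine, the compiler flags and the number of ids.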

STL version

$ g++  -Wall -O3 testset.cpp
$ ./a.out 
Time: 109.27seconds.

GNU hash_set version

$  g++ -DWITH_HASHSET=1 -Wall -O3 testset.cpp
In file included from /usr/include/c++/4.6/ext/hash_set:61:0,
                 from jeter.cpp:7:
/usr/include/c++/4.6/backward/backward_warning.h:33:2: warning: #warning This file includes at least one deprecated or antiquated header which may be removed (...)
$ ./a.out 
Time: 49.69seconds.

That's it,


Pierre




09 July 2012

Using the flickr XML/API as a source of RSS feeds.

You may know that from time to time I look for royalty-free pictures on flickr.com for my other personal blog. The flickr API can be used to search for these images, but it is currently not possible to generate an RSS feed to be alerted when a new image is posted on flickr.

The following XSLT stylesheet transforms the XML returned by www.flickr.com into an RSS feed (latest source: https://github.com/lindenb/xslt-sandbox/blob/master/stylesheets/flickr/flickr2rss.xsl ):

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:date="http://exslt.org/dates-and-times"
xmlns:dc="http://purl.org/dc/elements/1.1/" version="1.0">

<xsl:output method="xml" indent="yes" encoding="UTF-8"/>

<xsl:param name="title">No Title</xsl:param>


<xsl:template match="/">
<rss version="2.0">
<channel>

<title><xsl:value-of select="$title"/></title>
<link>http://www.flickr.com</link>
<description><xsl:value-of select="$title"/></description>
<pubDate><xsl:value-of select="date:date-time()"/></pubDate>
<lastBuildDate><xsl:value-of select="date:date-time()"/></lastBuildDate>
<generator>http://www.flickr.com/</generator>
<xsl:apply-templates select="rsp/photos/photo"/>
</channel>
</rss>
</xsl:template>

<xsl:template match="photo">
<item>
<title><xsl:value-of select="@title"/> : <xsl:value-of select="$title"/></title>
<link>http://www.flickr.com/photos/<xsl:value-of select="@owner"/>/<xsl:value-of select="@id"/>/</link>
<pubDate><xsl:value-of select="@datetaken"/></pubDate>

<author>
<xsl:choose>
<xsl:when test="@ownername"><xsl:value-of select="@ownername"/></xsl:when>
<xsl:otherwise><xsl:value-of select="@owner"/></xsl:otherwise>
</xsl:choose>
</author>
<guid isPermaLink="false">http://www.flickr.com/photos/<xsl:value-of select="@owner"/>/<xsl:value-of select="@id"/>/</guid>
<description>
<xsl:text>&lt;p&gt;&lt;img </xsl:text>
<xsl:choose>
<xsl:when test="@height_s and @width_s">
<xsl:text> width='</xsl:text>
<xsl:value-of select="@width_s"/>
<xsl:text>' height='</xsl:text>
<xsl:value-of select="@height_s"/>
<xsl:text>' </xsl:text>
</xsl:when>
<xsl:when test="@height_m and @width_m">
<xsl:text> width='</xsl:text>
<xsl:value-of select="@width_m"/>
<xsl:text>' height='</xsl:text>
<xsl:value-of select="@height_m"/>
<xsl:text>' </xsl:text>
</xsl:when>
</xsl:choose>
<xsl:text> src='</xsl:text>
<xsl:choose>
<xsl:when test="@url_s"><xsl:value-of select="@url_s"/></xsl:when>
<xsl:otherwise>http://farm<xsl:value-of select="@farm"/>.staticflickr.com/<xsl:value-of select="@server"/>/<xsl:value-of select="@id"/>_<xsl:value-of select="@secret"/>_s.jpg</xsl:otherwise>
</xsl:choose>
<xsl:text>' /&gt;&lt;/p&gt;</xsl:text>
</description>
</item>
</xsl:template>

</xsl:stylesheet>

To invoke this stylesheet and generate the RSS feed, I wrote the following small quick'n dirty CGI script "flickr.cgi". The script was made executable and installed into my public_html/cgi-bin directory. It also requires an API key from flickr.

#!/bin/sh
echo "Content-Type: application/rss+xml"
echo 

APIKEY=12345678910
TAGS=`echo ${QUERY_STRING}|tr "?&" "\n" | egrep '^tags=' | cut -d '=' -f 2 | sed 's/,/%2C/g'`
TEXT=`echo ${QUERY_STRING}|tr "?&" "\n" | egrep '^text=' | cut -d '=' -f 2 | tr " " "+"`

curl -s "http://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=${APIKEY}&tags=${TAGS}&format=rest&extras=url_s,date_upload,date_taken,icon_server,owner_name&tag_mode=all&per_page=20&license=2,4,1,5,7&text=${TEXT}" |
xsltproc --novalid --stringparam title "${TAGS} ${TEXT}" flickr2rss.xsl -
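
The two pipelines that fill TAGS and TEXT each pick one key out of QUERY_STRING. The same extraction can be sketched in C++ as follows; `query_param` and `encode_commas` are hypothetical helpers written for illustration, not part of the CGI script. Unlike `cut -d '=' -f 2`, this version keeps everything after the first '=' of a pair.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the value stored under `name` in a query string such as
// "tags=science&text=dna", or an empty string when the key is absent.
std::string query_param(const std::string& query, const std::string& name) {
    assert(!name.empty());
    std::size_t start = 0;
    while (start <= query.size()) {
        std::size_t end = query.find('&', start);
        if (end == std::string::npos) end = query.size();
        const std::string pair = query.substr(start, end - start);
        const std::size_t eq = pair.find('=');
        // The key must match `name` exactly, not merely start with it.
        if (eq != std::string::npos && pair.compare(0, eq, name) == 0) {
            return pair.substr(eq + 1);
        }
        start = end + 1;
    }
    return "";
}

// Mirrors the script's sed 's/,/%2C/g': percent-encodes the commas of a tag list.
std::string encode_commas(const std::string& value) {
    std::string out;
    for (char c : value) {
        if (c == ',') out += "%2C";
        else out += c;
    }
    return out;
}
```

With the query string `tags=science,biology&text=dna`, `encode_commas(query_param(query, "tags"))` gives `science%2Cbiology`, which is the form substituted into the `tags=` parameter of the search URL.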


I can now add some new RSS feeds into Thunderbird and receive the new items, for example: http://localhost/~me/cgi-bin/flickr.cgi?tags=science

<rss xmlns:date="http://exslt.org/dates-and-times" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
<channel>
<title>science </title>
<link>http://www.flickr.com</link>
<description>science </description>
<pubDate>2012-07-09T23:09:58+02:00</pubDate>
<lastBuildDate>2012-07-09T23:09:58+02:00</lastBuildDate>
<generator>http://www.flickr.com/</generator>
<item>
<title>2012 NOAA HABs Forecast : science </title>
<link>http://www.flickr.com/photos/41398337@N07/7537830696/</link>
<pubDate>2012-07-05 10:16:56</pubDate>
<author>Ohio Sea Grant and Stone Laboratory</author>
<guid isPermaLink="false">http://www.flickr.com/photos/41398337@N07/7537830696/</guid>
<description>&lt;p&gt;&lt;img width='240' height='160' src='http://farm8.staticflickr.com/7252/7537830696_2a66783e70_m.jpg' /&gt;&lt;/p&gt;</description>
</item>
<item>
<title>2012 NOAA HABs Forecast : science </title>
<link>http://www.flickr.com/photos/41398337@N07/7537829638/</link>
<pubDate>2012-07-05 10:18:37</pubDate>
<author>Ohio Sea Grant and Stone Laboratory</author>
<guid isPermaLink="false">http://www.flickr.com/photos/41398337@N07/7537829638/</guid>
<description>&lt;p&gt;&lt;img width='240' height='160' src='http://farm8.staticflickr.com/7252/7537829638_4c6dd535b4_m.jpg' /&gt;&lt;/p&gt;</description>
</item>
<item>
(...)
</item>
</channel>
</rss>


That's it,

Pierre





06 July 2012

The LZW compression algorithm as a measure of the short-reads complexity

The LZW algorithm is a dictionary-based, universal, lossless data compression algorithm. The algorithm is easy to implement; here is some pseudocode (copied from there):

string s;
char ch;
...

s = empty string;
while (there is still data to be read)
{
    ch = read a character;
    if (dictionary contains s+ch)
    {
        s = s+ch;
    }
    else
    {
        encode s to output file;
        add s+ch to dictionary;
        s = ch;
    }
}
encode s to output file;

And here is my C++ implementation, where the size of the dictionary reflects the complexity of the sequence: http://code.google.com/p/variationtoolkit/source/browse/trunk/src/lzw.h.
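
As a self-contained illustration of the idea (a sketch, not the code in lzw.h), the function below runs the loop from the pseudocode and returns the number of phrases added to the dictionary. Two assumptions are made here: the dictionary is pre-seeded with the single characters of the read, and those seed entries are not counted. The name `lzw_complexity` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Number of multi-character phrases an LZW pass adds to its dictionary.
// Assumption: single characters are seeded up front and not counted.
std::size_t lzw_complexity(const std::string& read) {
    std::unordered_set<std::string> dict;
    for (char c : read) dict.insert(std::string(1, c));
    const std::size_t seeded = dict.size();

    std::string s;  // longest phrase matched so far
    for (char ch : read) {
        const std::string candidate = s + ch;
        if (dict.count(candidate) != 0) {
            s = candidate;            // keep extending the current phrase
        } else {
            dict.insert(candidate);   // new phrase: this is what gets counted
            s = std::string(1, ch);   // restart from the current character
        }
    }
    assert(dict.size() >= seeded);    // guards the subtraction below
    return dict.size() - seeded;
}
```

Under this counting convention a read made of 54 identical bases scores 9 and one made of 100 identical bases scores 13, which agrees with the homopolymer rows of the two tables; whether lzw.h counts in exactly this way remains an assumption.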

I've used this complexity to plot the number of reads as a function of the LZW dictionary size:

Exome data

#complexity mapped unmapped sample
9 23274 21 TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
10 1676 31379 CTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
11 2365 455 CCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
12 1523 5118 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACGGAAAAAA
13 1941 2827 GTTCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
14 2253 2495 CCCCCCCCCCCCCCCCCCCCCCCCCCACCCCCCCCCCCACCCCCCACCCACACC
15 2774 2908 ATAAAATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGAG
16 3965 3149 AAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCTCCCCCCCCCTCCT
17 6944 4020 CTTTTTTTTTTTCTTTTCTTTTTTTTTTCCCTCTTTTTTTTTTTTTTTTTTTTC
18 11607 5143 TTGGTTTTTTTTTTTTTTTTTTTTTTTGGTTTGTTTTTTTTTTTTTTTTACCCT
19 19659 6724 GGGGGGGGGGGGGGGGGGGGGAGGAGGAAGGGGAGGAAGGGAGGAGGAAAGAGA
20 32504 9412 ACCCTAACCCTACCCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAA
21 50824 13984 TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC
22 77399 19651 CTAACCCTAACCCTAACCCTAACCCTAACCCTAACTCTAACCCTAACCCTAACC
23 114774 28966 GATCTCCCTAACCCTAACCCTACCCTAACCCTAACCCTAACCCTAACCCTAACC
24 176229 43729 TCCGATCTACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC
25 316878 67402 TTCCGATCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC
26 721852 104378 TTCCGATCTGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTT
27 2028152 164968 CCGATCTAGGGTTAGGGTTAGGGTTGGGGTTAGGGTTAGGGTTAGGGTTAGGGT
28 6108817 284769 GCTGTGGTCTTCATCTGCAGGTGTCTGACTTCCAGCAACTGCTGGCCTGTGCCA
29 16907095 553236 GGGCACTGCAGGGCCCTCTTGCTTACTGTATAGTGGTGGCACGCCGCCCGCTGG
30 37103260 1130111 GTGATTTGGGCTGGGGCCTGGCCATGTGTATTTTTTTAAATTTCCACTGATGAT
31 55419720 1911772 GACCTGAGGAGAACTGTGCTCCGCCTTCAGAGTACCACCGAAATCTGTGCAGAG
32 47163550 2041580 GCCATGTGTATTTTTTTAAATTTCCACTGATGATTTTGCTGCATGGCCGGTGTT
33 17867328 1073476 CTGTATCCCACCAGCAATGTCTAGGAATGCCTGTTTCTCCACAAAGTGTTTACT
34 2014482 212089 TTTGCTGTCTCTTAGCCCAGACTTCCCGTGTCCTTTNNACCNGGCCTTTGAGAG
35 72637 30914 ACATCAANCTCAGGCACNTGGCCCAGGTCTGGCACTTAGAAGTAGTTCTCTGGG
36 8496 8247 AGGATATCTGGGNTGCNNCCGGAGTCGCAGTGTCTTGGGCCGCCTGAAGGTGAG
37 905 1506 AAGCATTACTGGAAACATCCTCATTGTGTTNTCTGNGACCANTNACCCTCACTN
38 58 102 TCGAGCNNCGTTGACTTCAGGNGGTCTNCTACCAGCAGCTCGNAATAGTTGCAC
39 1 0 AANTTCNAACGACTGTANNTCATNNGGCNNTGCNGGNCCNANAAACTGGCTGAG

Whole Genome data

#complexity mapped unmapped sample
13 1 2728 GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
14 0 608 GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGTGGGGGGGGGGG
15 0 2181 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAANAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
16 7 2095 GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGTGGGGGGG
17 27 1924 AAAAAAAAAAAAAAAACTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAT
18 41 2558 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGAAAAAATAAAAAAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
19 66 2961 GGGGGGGGGGTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGTGTGTGGGCG
20 127 3391 AGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGGGCGGGGGGGGGAGGAGGGGGGGG
21 181 4244 NNNNNNNNNNNNNNNNNNNNNNNATTANNNNNNNNNNNNNNNNNNNTAANNNNNNNNNNNNNNNNNNNANNNNNNNNNNNANNNNNNNNNNNNNNNNNNN
22 371 5242 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGGGGGGAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAGAAGAAAAAAAAAAA
23 627 6308 GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGTGGGGGCGCGCGGCCGGGGGCGCGGGT
24 1398 8204 GGTATAATGCTAGGTATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
25 3990 10923 GAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGA
26 8469 15085 CATCAGAATACAGCTAACAAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAAA
27 14494 20162 GCGGTGGCGGGGGCCCGCGGGCCCCCCGCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGTGGGGGGGGGGGCGG
28 24273 26976 TTCTTTCTTTCTTTCTTTCTTTCTTTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTCCTCCTTTTCTTTCCTTTTCTTTCT
29 37918 35975 ACCAACACCACACCCCCACCCCCCCCCCCCCCCCACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCACCCCCAACCCCTAACCCTAACCCTAACC
30 58261 48523 CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
31 81164 62111 CTAACCCTAACCCTAACCCTAACCCTAACCCTCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
32 112886 79802 TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCAACCCCAACCCTAACCCCAAC
33 154666 101551 CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCGACCCCTAACCCGA
34 204241 130861 CCCTAGCCCCTCCCTATCCCTAACCCTAACCCTAACCCTAAAACCCTAACCCTAAAACCCTAACCCTAAAACCCTAACCCTAACCCTAACCCAACCCTAA
35 267951 166738 TCACCCTCACCCTCACCCTCACCCTAACCCTCACCCTCACCCTCACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACC
36 347051 210144 TAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCCAACCCTAACCCTAACCCTAACCCTAACCCCATCACTAACCTGTAACCCTCACCCTAACCCTA
37 447808 259191 CCCCACCCCCATCCCTAACCCGACCCTCAACCCAACCCCGAACCCAAACCCCAACCCCAACCCAAACCCAAACCCTAACCCTAACCCAAACCCTAACCCA
38 581941 320139 CTAACCCTAACCCTAACCCTAACCCTAACCCTTACCCTTACCCTTAACCCTCAACCCAACCCTAACACTAACCCTAACCCTAACCCCAAACCCAAGCCCA
39 770244 392189 CCCTACCCCTANCCCTACCCCTACCCCTAACCCTAACCCTAACCCTAACNCTAACCCAACCCCTCACACTACCCATCACCCCCACACCCTACCCCTACCC
40 1040306 485156 CCTTCACTCTACGCTTATCTCCCTACCTACCCCTAACCCTATCCCTAACCCTAACCCTATCCCTAACCCTAACCCTACCCCTAACCCTTACCCTAACCCA
41 1472994 592142 CAACCCGAGTACAATGGAAACGAATGGAATGGAATGAAATGGAATGGAATGGAATGGAATAGAATGGAATGGAATGGAATGGAATCAACCCGAGTGCAAT
42 2167116 711100 CCATCACCCCACCCCTACCCCTAACGCCACCCCTACCCCCAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCAGATCGGA
43 3307945 822818 CTTCCCCTACCCCCAACCCCGATCCCGAACCCAACCCCTAGCCCTACCCTTAACCCATCCCCATCCCTACCCCTAACCCTAACCCTAACCCTAAGCCAAC
44 5436093 924140 AGGGTTAGGGTAAGGGTTAGGGTTAGGGTTAGGGTTAGGGGTAGGGTTCGGGATTGGAAAGAGCGGCGGGTTGGGGGAGGGTTATGGGATCTGATGAAAT
45 10232300 1030747 CTAACCCTAACCCTAACCCTAACCTAACCCATCCCCCAGCCAACCTTTACCCTCACCCCTCCTCTGACCCTAACCCTCAACCTTCCCCTGCCTCGGAATC
46 21775389 1204827 TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTCGCGGTACCCTCAGCCGGCCCGCCCGCCCGGGTCTGACCTGAGGAGAACTGTGCTCC
47 47888153 1605181 CTAACCCTAACCCTAACCCTAACCCTAACCTAAACCTTATCGCTATGCTTACCAGTAGCCTGAACCTGACCAATACACTAACCCTCACCCGGAAAATAAA
48 98581306 2536484 TAACCCTAACCCTAACCCTAACCCTAACCCTCGCGGTACCCTCAGCCGGCCCGCCCGCCCGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAGAGTA
49 175000036 4206340 GTGGTTTTTGTCTGCCAGTTCATGGTAATCACAGTGATTTCAAGGGGGGGTAAAAAAAGGAGGTGTGAGAGGGGCCCCCGGTTTCCACACAGACACCACA
50 246244603 6207853 CCTAACCCTAACCCTACCCATATACCTAACCCTAAAATTAAAAGTAATCATAACCCTAACCTTAGTTCTGCAACTACGGCTACACACACGTGCAGACCTA
51 247156482 7048827 CCTCTGGTGGCCCTGTCCGGGCATGACAGAAGGCGCGCACCCTTGACTTCTGTTCACTTCTCACTATGTCCCCTCAGCCCCTATCTCTGAATGGCCTGGC
52 155610209 5305604 GCGGTACCCTCAGCCGGCCCGCCCGCCCGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAGAGTACCAACGAAATCTGTGCAGAGGACAACGCAGCT
53 53933711 2288949 AGCGTCGCAACTCAAATGCAGCATTCCTAATGCACACATGACACCCAAAATATAACAGACATATTACTCATGGAGGGGGAGGGTGAGTGTGAGGGTGAGG
54 9490136 496289 TTTCACCAGAAGTAGGCCTCTTCCTGACAGGCAGCTGCACCACTGCCGGGCGCTGTGCCCTACCTTTGCTCTGCCCGCTGGAGACGGGGTTTGTCATGGG
55 991089 67105 AATTTCTGGAATGGATTATTAAACAGAGAGTCTGTAAGCACTTAGAAAAGGCCGCGGTGAGCCCCAGGGGCCAGCACTGCTCGAAATGTACAGCATTTCT
56 105830 11519 GGAAAATTTCTGGAATGGATTATTACAGAGTCTGTAAGCACTTAGAAAAGGCCGCGGTGAGTCCCAGGGGCCAGCACTGCTCGAAATGTACAGCATTTCT
57 21081 3128 TNACNGANGNNTNNNGTNTATTGNTCCAANAATCGNAGANNGAGAGGTTAAANTNNNNNNCNNNGATTNTGGGTTGTCTATTGATGTTTTTGGTCTATTC
58 6263 964 ATCNAGAGGCCAAGCCCAGCCTGTCNGCTTTNGTGTATAAAGNTCTCATGGAACAGAGCTGTGAGCCTGCCGNNTGTNGTCNNNNNNTNCGCCTGGNNAN
59 1360 246 ATTNGCCGGATGTGGTGGTGGGCGCCTGTAGTCCCAACTACTCAGGAGGCTGAAGCAGGAGAATGGCNAGAACNCGNNAGATGGNNGNTGNNGTNAGCCG
60 198 26 TCGGTCAACAAAATGGGTGACAGAGACCTACGCACGGATTATAATNNANCNGGCNCCANCCCGAGTGNTNNNCGGGGATTGGATGGNNCANNNTCCATAG
61 8 3 TGGAAAATNACTAGCNNGGAAGCAGACTNCGGGCCANANANATANNCAGTCACTTTANGCCCNGNANGGTGGNTCACANCTGTNATCCTANGNCNTTGGN
62 1 0 AGTTACGTGCTTACAGAATACTTTNTTTTGAGGTCAATANNANNANTAAGTNANGNATNCNNGATATCCTAGNGGGAATTCTCCGNCCTTCTGGAAGCTG


I'm sure there must be something to say about this, but I just don't have time :-)

That's it,

Pierre