g1kv37 vs hg19
In order to create a class that translates chromosome names from one naming convention to another, I compared the MD5 sums of the human genome versions g1k/v37 and ucsc/hg19. Here is the Java program to create the MD5s:
The MD5 sums were extracted as follows:
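As a rough idea of what such a program can look like, here is a minimal sketch that reads a FASTA file and computes one MD5 per sequence. The class name `FastaMd5`, the choice to upper-case the bases before hashing, and the use of the first word of the header as the sequence name are all assumptions made for this illustration; it is not necessarily how the original program worked.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: one MD5 per FASTA sequence. */
public class FastaMd5 {

    /** Returns sequence name -> MD5 hex digest of its bases, in file order. */
    public static Map<String, String> md5PerSequence(BufferedReader in)
            throws IOException, NoSuchAlgorithmException {
        Map<String, String> sums = new LinkedHashMap<>();
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        String name = null;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith(">")) {
                // Close the previous record; digest() also resets the MessageDigest.
                if (name != null) sums.put(name, hex(md5.digest()));
                // Assumption: the name is the first word of the header line.
                name = line.substring(1).trim().split("\\s+")[0];
            } else if (name != null) {
                // Assumption: soft-masked (lower-case) bases should hash like upper-case ones.
                String bases = line.trim().toUpperCase(Locale.ROOT);
                md5.update(bases.getBytes(StandardCharsets.US_ASCII));
            }
        }
        if (name != null) sums.put(name, hex(md5.digest()));
        return sums;
    }

    /** Formats a digest as a zero-padded, 32-character hexadecimal string. */
    private static String hex(byte[] digest) {
        return String.format("%032x", new BigInteger(1, digest));
    }

    /** Reads FASTA from stdin and prints "md5<TAB>name" lines. */
    public static void main(String[] args) throws Exception {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.US_ASCII))) {
            for (Map.Entry<String, String> e : md5PerSequence(in).entrySet()) {
                System.out.println(e.getValue() + "\t" + e.getKey());
            }
        }
    }
}
```

Hashing line by line keeps memory usage flat, which matters for chromosome-sized sequences.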
Here are the common chromosomes, joined on the hash-sum:
And here are the sequences that could not be paired:
I knew about the problem with chrY (http://www.biostars.org/p/58143/) but not about chr3. What is the problem with this chromosome?
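A minimal sketch of how the pairing on the hash could be done in Java, assuming each genome has already been reduced to a map of sequence name to MD5. The class `Md5Join` and its field names are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: pairs sequence names of two genomes through identical MD5 sums. */
public class Md5Join {

    /** Names paired through an identical MD5, plus the names left without a partner. */
    public static final class Result {
        public final Map<String, String> paired = new LinkedHashMap<>();
        public final List<String> unpairedLeft = new ArrayList<>();
        public final List<String> unpairedRight = new ArrayList<>();
    }

    /** Both arguments map a sequence name to its MD5 hex string. */
    public static Result join(Map<String, String> left, Map<String, String> right) {
        // Index the right-hand genome by MD5 so each lookup is constant time.
        Map<String, String> md5ToRight = new HashMap<>();
        for (Map.Entry<String, String> e : right.entrySet()) {
            md5ToRight.put(e.getValue(), e.getKey());
        }
        Result result = new Result();
        Set<String> matchedRight = new HashSet<>();
        for (Map.Entry<String, String> e : left.entrySet()) {
            String partner = md5ToRight.get(e.getValue());
            if (partner == null) {
                result.unpairedLeft.add(e.getKey());
            } else {
                result.paired.put(e.getKey(), partner);
                matchedRight.add(partner);
            }
        }
        // Whatever was never matched on the right side is unpaired too.
        for (String name : right.keySet()) {
            if (!matchedRight.contains(name)) result.unpairedRight.add(name);
        }
        return result;
    }
}
```

The `paired` map is directly usable as a name translation table, while the two unpaired lists point at the sequences that need a closer look.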
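Edit: Here are the base counts for UCSC/chr3:
{T=58760485, G=38670110, A=58713343, C=38653197, N=3225295}
and for g1kv37:
{T=58760485, G=38670110, A=58713343, R=2, C=38653197, M=1, N=3225292}
The two counts differ by only three bases: g1kv37 has two R and one M (IUPAC ambiguity codes) and three fewer N.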
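Counts like these can be obtained with a few lines of Java. The sketch below is one possible way to do it, not necessarily the code that produced the numbers above; the class name `BaseCounter`, the upper-casing of bases, and the matching on the first word of the FASTA header are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: counts each symbol of one sequence in a FASTA stream. */
public class BaseCounter {

    /** Returns symbol -> number of occurrences for the sequence named target. */
    public static Map<Character, Long> count(BufferedReader in, String target)
            throws IOException {
        long[] counts = new long[128]; // one slot per ASCII character
        boolean inTarget = false;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith(">")) {
                // Assumption: the sequence name is the first word of the header.
                String name = line.substring(1).trim().split("\\s+")[0];
                inTarget = name.equals(target);
            } else if (inTarget) {
                for (int i = 0; i < line.length(); i++) {
                    char c = Character.toUpperCase(line.charAt(i));
                    // Skip whitespace and anything outside ASCII.
                    if (c < 128 && !Character.isWhitespace(c)) counts[c]++;
                }
            }
        }
        // Keep only the symbols that were actually seen.
        Map<Character, Long> result = new TreeMap<>();
        for (int c = 0; c < counts.length; c++) {
            if (counts[c] > 0) result.put((char) c, counts[c]);
        }
        return result;
    }
}
```

Counting every symbol instead of only A, C, G, T and N is what makes unexpected codes such as R or M show up.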
That's it,
Pierre.
2 comments:
Hello Pierre, about chr3, you helped me find an answer 2 years ago ;-)
http://www.biostars.org/p/9464/
My last comment on that page shows a possible answer.
@pablo Haha :-) I didn't remember that post :-)