My data are described primarily in an XML file. It contains a description of the genome and of the tracks, the path to the FASTA sequence, etc. The FASTA sequence was provided by Dr Didier Poncet (CNRS/Gig). As far as I understand, it is not currently possible to specify that a track describes a protein.
<?xml version="1.0" encoding="UTF-8"?>
<genomeHub >
<name>Rotavirus</name>
<shortLabel>Rotavirus</shortLabel>
<longLabel>Rotavirus</longLabel>
(...)
<accessions id="set1">
<acn>GU144588</acn>
<acn source="uniprot">Q0H8C5</acn>
<acn source="uniprot">Q45UF6</acn>
(..)
<genome id="rf11">
<description>Rotavirus RF11</description>
<organism>Rotavirus</organism>
<defaultPos>RF01:1-10</defaultPos>
<scientificName>Rotavirus</scientificName>
<organism>Rotavirus</organism>
<orderKey>10970</orderKey>
<fasta>rotavirus/rf/rf.fa</fasta>
(...)
<group id="active_site"><accessions ref="set1"/><include>active site</include></group>
<group id="calcium-binding_region"><accessions ref="set1"/><include>calcium-binding region</include></group>
<group id="chain"><accessions ref="set1"/><include>chain</include></group>
(...)

This XML file is then processed with the following XSL stylesheet: https://github.com/lindenb/genomehub/blob/master/data/genomehub.xml . It generates a Makefile that translates the FASTA sequence to 2bit, creates the BED files by aligning some annotated sequences to the reference with BLAST, and converts them to bigBed.
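To make the conversion steps more concrete, here is a minimal sketch of the kind of commands such a Makefile would run for one genome directory. It is not the Makefile produced by the stylesheet: the function name `build_commands` is hypothetical, the BLAST alignment step that creates the BED files is left out, and the sort step is my assumption (bedToBigBed expects sorted input). The UCSC tools `faToTwoBit`, `twoBitInfo` and `bedToBigBed` are only named here, not executed.

```python
from pathlib import PurePosixPath


def build_commands(fasta, bed_names):
    """Return the commands (as argument lists) for one genome directory.

    Hypothetical helper: it only builds the command lines, it runs nothing.
    """
    fasta = PurePosixPath(fasta)
    workdir = fasta.parent
    two_bit = fasta.with_suffix(".2bit")
    sizes = workdir / "chrom.sizes"
    commands = [
        # FASTA -> 2bit, the sequence format read by the UCSC browser
        ["faToTwoBit", str(fasta), str(two_bit)],
        # sequence names and lengths, required later by bedToBigBed
        ["twoBitInfo", str(two_bit), str(sizes)],
    ]
    for name in bed_names:
        bed = workdir / f"{name}.bed"
        big_bed = workdir / f"{name}.bb"
        # assumption: sort in place by sequence name, then start position
        commands.append(["sort", "-k1,1", "-k2,2n", "-o", str(bed), str(bed)])
        # BED -> bigBed, the indexed binary format served to the browser
        commands.append(["bedToBigBed", str(bed), str(sizes), str(big_bed)])
    return commands


if __name__ == "__main__":
    for cmd in build_commands("rotavirus/rf/rf.fa", ["CDS", "chain"]):
        print(" ".join(cmd))
```

In a real Makefile each of these commands would be a rule with the output file as its target, so that only the out-of-date tracks are rebuilt.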
At the end, my directory contains the following files:
./data/genomehub.xml
./data/genomehub2make.xsl
./data/sequence2fasta.xsl
./data/hub.txt
./data/genomes.txt
./data/rotavirus
./data/rotavirus/rf
./data/rotavirus/rf/signal_peptide.bed
./data/rotavirus/rf/CDS.bed
./data/rotavirus/rf/turn.bb
./data/rotavirus/rf/chrom.sizes
./data/rotavirus/rf/site.bed
./data/rotavirus/rf/coiled-coil_region.bed
./data/rotavirus/rf/mutagenesis_site.bb
./data/rotavirus/rf/UTR.bed
./data/rotavirus/rf/reference.fa~
./data/rotavirus/rf/misc_feature.bed
./data/rotavirus/rf/CDS.bb
./data/rotavirus/rf/helix.bed
./data/rotavirus/rf/strand.bb
./data/rotavirus/rf/sequence_conflict.bb
./data/rotavirus/rf/modified_residue.bb
./data/rotavirus/rf/coiled-coil_region.bb
./data/rotavirus/rf/topological_domain.bb
./data/rotavirus/rf/active_site.bed
./data/rotavirus/rf/sequence_variant.bb
./data/rotavirus/rf/transmembrane_region.bb
./data/rotavirus/rf/zinc_finger_region.bed
./data/rotavirus/rf/region_of_interest.bb
./data/rotavirus/rf/glycosylation_site.bb
./data/rotavirus/rf/domain.bb
./data/rotavirus/rf/region_of_interest.bed
./data/rotavirus/rf/misc_feature.bb
./data/rotavirus/rf/topological_domain.bed
./data/rotavirus/rf/sequence_conflict.bed
./data/rotavirus/rf/UTR.bb
./data/rotavirus/rf/compositionally_biased_region.bed
./data/rotavirus/rf/chain.bed
./data/rotavirus/rf/glycosylation_site.bed
./data/rotavirus/rf/trackDb.txt
./data/rotavirus/rf/modified_residue.bed
./data/rotavirus/rf/disulfide_bond.bed
./data/rotavirus/rf/strand.bed
./data/rotavirus/rf/helix.bb
./data/rotavirus/rf/compositionally_biased_region.bb
./data/rotavirus/rf/transmembrane_region.bed
./data/rotavirus/rf/rf.fa
./data/rotavirus/rf/rf.2bit
./data/rotavirus/rf/splice_variant.bed
./data/rotavirus/rf/short_sequence_motif.bed
./data/rotavirus/rf/rf.fa.nsq
./data/rotavirus/rf/ALL.bed.blast.xml~
./data/rotavirus/rf/gene.bed
./data/rotavirus/rf/sequence_variant.bed
./data/rotavirus/rf/disulfide_bond.bb
./data/rotavirus/rf/signal_peptide.bb
./data/rotavirus/rf/rf.fa.nin
./data/rotavirus/rf/short_sequence_motif.bb
./data/rotavirus/rf/turn.bed
./data/rotavirus/rf/domain.bed
./data/rotavirus/rf/mutagenesis_site.bed
./data/rotavirus/rf/zinc_finger_region.bb
./data/rotavirus/rf/chain.bb
./data/rotavirus/rf/rf.fa.nhr
./data/rotavirus/rf/splice_variant.bb
./data/rotavirus/rf/active_site.bb
./data/rotavirus/rf/site.bb
./data/rotavirus/rf/description.html
./README.md
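Among these files, hub.txt, genomes.txt and trackDb.txt are the plain-text "key value" stanza files that the UCSC browser reads. The sketch below shows roughly how such stanzas could be rendered from the values in the XML description. It is an illustration only, not the content of the files in the repository: the function names are hypothetical, the email address is a placeholder, and only a subset of the settings an assembly hub can carry is shown.

```python
def stanza(pairs):
    """Format one 'key value' stanza, one setting per line."""
    return "\n".join(f"{key} {value}" for key, value in pairs) + "\n"


def hub_txt(name, email):
    """Top-level hub description; genomesFile is relative to hub.txt."""
    return stanza([
        ("hub", name),
        ("shortLabel", name),
        ("longLabel", name),
        ("genomesFile", "genomes.txt"),
        ("email", email),  # placeholder address, not from the original post
    ])


def genomes_txt(genome_id, directory, fasta_stem, default_pos):
    """One genome stanza pointing at its trackDb and 2bit sequence."""
    return stanza([
        ("genome", genome_id),
        ("trackDb", f"{directory}/trackDb.txt"),
        ("twoBitPath", f"{directory}/{fasta_stem}.2bit"),
        ("defaultPos", default_pos),
    ])


def track_db(group_ids):
    """One bigBed track stanza per group id, separated by blank lines."""
    stanzas = []
    for gid in group_ids:
        label = gid.replace("_", " ")  # assumption: label derived from the id
        stanzas.append(stanza([
            ("track", gid),
            ("bigDataUrl", f"{gid}.bb"),  # relative to trackDb.txt
            ("shortLabel", label),
            ("longLabel", label),
            ("type", "bigBed"),
        ]))
    return "\n".join(stanzas)


if __name__ == "__main__":
    print(hub_txt("Rotavirus", "someone@example.org"))
    print(genomes_txt("rf11", "rotavirus/rf", "rf", "RF01:1-10"))
    print(track_db(["active_site", "chain"]))
```

The paths inside each file are relative to the file itself, which is why the whole directory can be pushed as-is to any web-accessible location.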
The files required by UCSC are then pushed to GitHub, and the URL pointing to hub.txt (https://raw.github.com/lindenb/genomehub/master/data/hub.txt) is registered at http://genome.ucsc.edu/cgi-bin/hgHubConnect. And a few clicks later...
That's it,
Pierre