17 December 2008

Putting semantics in the spreadsheets.

Just a few ideas

I've recently been asked to find a way to store a set of heterogeneous files (pedigrees, linkage files, results of Unix pipelines...). My first idea was to upload each file to a wiki and tag it with some well-chosen categories so that it could easily be retrieved later. I also considered using a template to create a form where users could add some semi-structured annotations (see my test on openwetware.org here).

But, of course, users always want more. There must be a Murphy's law for this...

Now I need to create a robot that can find any file of a given type (say, a linkage file) containing a given piece of information (say, a SNP defined by its rs-id). So I've started creating two RDFS-based ontologies: one describing what a file is (e.g. File -> Plain Text -> Tab-Delimited -> Pedigree) and one describing what its columns contain (e.g. xsd:string -> biological entity -> genetic marker -> snp -> rs-id). A robot would then be able to identify and parse the files and, for example, if I ask for the columns containing a 'Genetic Marker', it would find the columns containing "SNP" or "Microsatellite".
The two drafts are available here:

  • http://code.google.com/p/fileontology/source/browse/trunk/files/ont/columns.rdf
  • http://code.google.com/p/fileontology/source/browse/trunk/files/ont/files.rdf
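To make the idea more concrete, here is a minimal Python sketch of the lookup such a robot could perform. It follows rdfs:subClassOf links transitively, so that asking for 'GeneticMarker' columns also matches columns annotated as SNP, rs-id or Microsatellite, and asking for 'Linkage' files only opens files annotated with that type (or a subtype). Everything here is an assumption for illustration: the class names are simplified stand-ins, not the URIs actually used in files.rdf and columns.rdf; the 'Linkage' class, the per-file ANNOTATIONS table and the toy CONTENTS are hypothetical. A real robot would rather load the RDF drafts with an RDF library and let an RDFS reasoner compute the subclass closure.

```python
from collections import defaultdict

# Hypothetical, simplified stand-in for the two RDFS drafts: each pair is one
# rdfs:subClassOf statement (subclass, superclass). Names are illustrative only.
SUBCLASS_OF = [
    # file types (in the spirit of files.rdf)
    ("PlainText", "File"),
    ("TabDelimited", "PlainText"),
    ("Pedigree", "TabDelimited"),
    ("Linkage", "TabDelimited"),  # hypothetical class
    # column types (in the spirit of columns.rdf)
    ("BiologicalEntity", "xsd:string"),
    ("GeneticMarker", "BiologicalEntity"),
    ("SNP", "GeneticMarker"),
    ("Microsatellite", "GeneticMarker"),
    ("RsId", "SNP"),
]

# Index: class -> set of direct superclasses.
PARENTS = defaultdict(set)
for sub, sup in SUBCLASS_OF:
    PARENTS[sub].add(sup)


def superclasses(cls):
    """Return cls and all its ancestors (subClassOf is reflexive and transitive in RDFS)."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS.get(c, ()))
    return seen


def is_a(cls, target):
    """True if cls is target or one of its subclasses."""
    return target in superclasses(cls)


# Hypothetical annotations a user would attach to each uploaded file:
# (file type, [type of column 0, type of column 1, ...]).
ANNOTATIONS = {
    "family1.ped": ("Pedigree", ["xsd:string", "xsd:string", "xsd:string"]),
    "chr1.linkage": ("Linkage", ["RsId", "Microsatellite", "xsd:string"]),
}

# Toy, made-up tab-delimited contents standing in for the stored files.
CONTENTS = {"chr1.linkage": "rs12345\tD1S243\t0.5\nrs67890\tD1S468\t1.2\n"}


def columns_of_type(target):
    """Yield (filename, column index) for every column typed as target or a subclass."""
    for name, (_, cols) in ANNOTATIONS.items():
        for i, col in enumerate(cols):
            if is_a(col, target):
                yield name, i


def find_files(file_type, column_type, value, contents):
    """Return files of file_type whose column_type columns contain value."""
    hits = []
    for name, (ftype, cols) in ANNOTATIONS.items():
        if not is_a(ftype, file_type):
            continue  # wrong kind of file: don't even open it
        wanted = [i for i, c in enumerate(cols) if is_a(c, column_type)]
        for line in contents.get(name, "").splitlines():
            fields = line.split("\t")
            if any(i < len(fields) and fields[i] == value for i in wanted):
                hits.append(name)
                break
    return hits


if __name__ == "__main__":
    print(list(columns_of_type("GeneticMarker")))  # [('chr1.linkage', 0), ('chr1.linkage', 1)]
    print(find_files("Linkage", "RsId", "rs67890", CONTENTS))  # ['chr1.linkage']
```

Keeping the subsumption logic (superclasses/is_a) apart from the parsing means that adding a new kind of genetic marker to the column ontology would not require any change to the robot itself.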

I don't know whether this idea has already been implemented elsewhere. Nevertheless, Frank Gibson suggested that I have a look at the Information-artifact-ontology: The Information Artifact Ontology (IAO) is a new ontology of information entities, originally driven by work by the OBI digital entity and realizable information entity branch. Lots of information there...

I'm still exploring this subject.

Pierre
