17 December 2008

Putting semantics in the spreadsheets.

Just a few ideas

I was recently asked to find a way to store a set of heterogeneous files (pedigrees, linkage files, results of Unix pipelines...). My first idea was to upload each file to a wiki and tag it with some well-chosen categories so that it could easily be retrieved later. I also thought of using a Template to create a form where the user would add some semi-structured annotations (see my test on openwetware.org).

But, of course, users always want more. There must be a Murphy's law for this...

Now I need to create a robot that can find any file of a given type (say, a linkage file) containing a given piece of information (say, a SNP defined by its rs-id). So I've started to create a set of two RDFS-based ontologies that could be used to describe what a file is about (e.g. File -> Plain Text -> Tab-Delimited -> Pedigree) and what its columns are about (e.g. xsd:string -> biological entity -> genetic marker -> snp -> rs-id). A robot would then be able to identify and parse the files and, for example, would find the columns containing "SNP" or "Microsatellite" if I asked for the columns containing a 'Genetic Marker'.
The two drafts are available here:

  • http://code.google.com/p/fileontology/source/browse/trunk/files/ont/columns.rdf
  • http://code.google.com/p/fileontology/source/browse/trunk/files/ont/files.rdf
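
To make the 'Genetic Marker' lookup concrete, here is a minimal Python sketch of how a robot could walk the subclass links. It is only an illustration under assumptions: the class names in SUBCLASS_OF, the column annotations, and the helpers is_a and find_columns are all hypothetical, and the hierarchy is hard-coded instead of being read from columns.rdf.

```python
# Hypothetical rdfs:subClassOf links, loosely following the examples in the
# post; the real class names live in columns.rdf and may differ.
SUBCLASS_OF = {
    "BiologicalEntity": "xsd:string",
    "GeneticMarker": "BiologicalEntity",
    "SNP": "GeneticMarker",
    "RsId": "SNP",
    "Microsatellite": "GeneticMarker",
    "Individual": "BiologicalEntity",
}


def is_a(cls, ancestor):
    """Return True if cls equals ancestor or is a transitive subclass of it."""
    seen = set()  # guards against accidental cycles in the hierarchy
    while cls is not None and cls not in seen:
        if cls == ancestor:
            return True
        seen.add(cls)
        cls = SUBCLASS_OF.get(cls)
    return False


def find_columns(column_types, wanted):
    """Return the names of the columns whose annotated class is a kind of wanted."""
    return [name for name, cls in column_types.items() if is_a(cls, wanted)]


# Hypothetical annotation of one linkage file: column name -> ontology class.
linkage_columns = {
    "marker": "RsId",
    "repeat": "Microsatellite",
    "sample": "Individual",
}

print(find_columns(linkage_columns, "GeneticMarker"))
```

Asking for "GeneticMarker" returns the "marker" and "repeat" columns but not "sample", because only the first two are annotated with a class that descends from the genetic marker class. A real robot would presumably load the subclass links from the RDF files with an RDF library instead of a hard-coded dictionary.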

I don't know if this idea has already been implemented elsewhere. Nevertheless, Frank Gibson suggested that I have a look at the Information Artifact Ontology: "The Information Artifact Ontology (IAO) is a new ontology of information entities, originally driven by work by the OBI digital entity and realizable information entity branch." Lots of information here...

I'm still exploring this subject.


