This is interesting: the CSV ontology to describe the columns of a CSV file and the file itself. I can definitely see the value in rich descriptions of CSV files, or spreadsheets in general. But I’m also really tempted to ask “if you use RDF for the file ‘header’, why not the ‘body’ too?” There are various csv2rdf tools (although I haven’t used any), but TabLinker is the only one I know my colleagues are working on 🙂
Then I thought of file size: even Turtle files can easily grow larger than CSV files containing the same values. Moreover, support for CSV is more widely available, isn’t it?
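To make the size argument a bit more concrete, here is a minimal sketch that writes the same two rows as CSV and as hand-rolled Turtle and compares the byte counts. The sample data, the ex: prefix and the example.org base are all made up for illustration, and the Turtle writer does no escaping, so treat it as a rough indication only.

```python
import csv
import io

# Made-up sample rows, purely for illustration.
rows = [
    {"id": "1", "name": "Alice", "age": "30"},
    {"id": "2", "name": "Bob", "age": "25"},
]


def to_csv(rows):
    """Serialize the rows as CSV with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_turtle(rows, base="http://example.org/"):
    """Serialize the rows as naive Turtle (assumed vocabulary, no escaping)."""
    lines = [f"@prefix ex: <{base}> .", ""]
    for row in rows:
        subject = f"ex:row{row['id']}"
        # One predicate-object pair per column, all values as plain literals.
        pairs = [f'ex:{key} "{value}"' for key, value in row.items() if key != "id"]
        lines.append(subject + " " + " ;\n    ".join(pairs) + " .")
    return "\n".join(lines) + "\n"


csv_size = len(to_csv(rows).encode("utf-8"))
turtle_size = len(to_turtle(rows).encode("utf-8"))
print(f"CSV: {csv_size} bytes, Turtle: {turtle_size} bytes")
```

The prefix declaration is a fixed cost, but the property names repeated for every row are what makes Turtle grow faster than CSV as the number of rows increases.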
The example use case (labeling and describing a file and its columns) also reminded me of ARFF, which embeds some metadata (comments on file level and field name & data type on column level) and allows sparse data, which could save bytes. But allowing only ASCII in the file makes the format pretty outdated. The XML-based XRFF allows the use of other encodings.
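As a small illustration of how sparse data could save bytes, the sketch below renders one mostly-zero row in ARFF's sparse notation, where each non-zero cell is written as a 0-based index followed by its value. The row contents and the helper name are my own invention.

```python
def to_sparse_arff(row):
    """Render one dense row of numbers in ARFF sparse notation."""
    # Indices are 0-based; zero-valued cells are simply left out.
    pairs = [f"{index} {value}" for index, value in enumerate(row) if value != 0]
    return "{" + ", ".join(pairs) + "}"


# A made-up row of 20 cells with only two non-zero values.
dense_row = [0] * 20
dense_row[3] = 5
dense_row[17] = 2

dense_text = ",".join(str(value) for value in dense_row)
sparse_text = to_sparse_arff(dense_row)
print(len(dense_text), len(sparse_text))
```

The saving only shows up when most cells are zero; for a short or densely filled row the index overhead makes the sparse form longer.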
The CSV ontology itself needs a little revision, as some of the definitions are unclear (to me, at least), and the example CSV document contains spaces after commas (according to RFC 4180, spaces are part of the field). As an example of the unclarity, the definition of the property mapsTo is “Which RDF class values in the column map to”. This may suggest that the range of the property is rdfs:Class, but the examples all have a property as the object of mapsTo. If this actually means the range is rdf:Property, and if my understanding is correct that you could create triples following the pattern <[subject]> <[column mapped property]> <[value in cell]>, it is still unclear what the subject of the triple would be. There is no definition of a property that can be used to define a column as the subject of the triple pattern. I guess it is not trivial to define.
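To show what I mean by the triple pattern, here is a minimal sketch of how such triples might be generated. The column-to-property dictionary stands in for what mapsTo seems to express. Minting the subject from a key column and an example.org base is purely my assumption, since the ontology defines nothing for that, and the sample data is made up.

```python
import csv
import io

# Hypothetical column-to-property mapping, standing in for mapsTo statements.
COLUMN_TO_PROPERTY = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "age": "http://xmlns.com/foaf/0.1/age",
}


def rows_to_triples(csv_text, key_column, base="http://example.org/row/"):
    """Turn each mapped cell into a (subject, property, value) triple.

    The subject is minted from the value in key_column; this is an
    assumption of this sketch, not something the CSV ontology defines.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = base + row[key_column]
        for column, prop in COLUMN_TO_PROPERTY.items():
            triples.append((subject, prop, row[column]))
    return triples


sample = "id,name,age\n1,Alice,30\n2,Bob,25\n"
for triple in rows_to_triples(sample, key_column="id"):
    print(triple)
```

Without something like key_column, every row would need a blank node or a row-number-based identifier as subject, which is exactly the part the ontology leaves open.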
Suddenly I’m reminded of Karma, which interactively, supported by machine learning, can create mappings for the columns of a CSV file to RDF. Wonder if its mappings can be mapped to the CSV ontology?