Short review of the CSV ontology

This is interesting: the CSV ontology describes the columns of a CSV file and the file itself. I can definitely see the value in rich descriptions of CSV files, or spreadsheets in general. But I’m also really tempted to ask “if you use RDF for the file ‘header’, why not the ‘body’ too?” There are various csv2rdf tools (although I haven’t used any), but TabLinker is the only one I know my colleagues are working on 🙂

Then I thought of file size: even Turtle files can easily grow larger than CSV files containing the same values. Moreover, support for CSV is more widely available, isn’t it?
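As a rough illustration of the size difference, the sketch below serializes the same three rows once as CSV and once as hand-written Turtle. The `ex:` prefix, the property names and the sample rows are made up for this example, and a different Turtle layout or longer values would give different numbers.

```python
import csv
import io

# Hypothetical sample data, not taken from any real dataset.
rows = [
    {"id": "1", "name": "Alice", "city": "Amsterdam"},
    {"id": "2", "name": "Bob", "city": "Berlin"},
    {"id": "3", "name": "Carol", "city": "Copenhagen"},
]


def to_csv(rows):
    """Serialize the rows as CSV with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "name", "city"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_turtle(rows):
    """Serialize the same values as Turtle, one resource per row.

    The ex: namespace and the property names are assumptions.
    """
    lines = ["@prefix ex: <http://example.org/> .", ""]
    for row in rows:
        lines.append(f'ex:row{row["id"]} ex:name "{row["name"]}" ;')
        lines.append(f'    ex:city "{row["city"]}" .')
    return "\n".join(lines) + "\n"


csv_size = len(to_csv(rows).encode("utf-8"))
turtle_size = len(to_turtle(rows).encode("utf-8"))
print("CSV bytes:", csv_size, "Turtle bytes:", turtle_size)
```

Even with a prefix declaration and the `;` shorthand, every row repeats the property names in Turtle, while CSV states them once in the header.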

The example use case (labeling and describing a file and its columns) also reminded me of ARFF, which embeds some metadata (comments on file level and field name & data type on column level) and allows sparse data, which could save bytes. But allowing only ASCII in the file makes the format pretty outdated. The XML-based XRFF allows the use of other encodings.

The CSV ontology itself needs a little revision, as some of the definitions are unclear (to me, at least), and the example CSV document contains spaces after commas (leading spaces are part of the field, according to RFC 4180). As an example of a definition that is unclear, the one for the property mapsTo reads “Which RDF class values in the column map to”. This may suggest that the range of the property is rdfs:Class, but the examples all have a property as the object of mapsTo. If the range is in fact meant to be rdf:Property, and if my understanding is correct that you could create triples following the pattern <[subject]> <[column mapped property]> <[value in cell]>, it is still unclear what the subject of the triple would be. There is no definition of a property that can be used to mark a column as the subject of the triple pattern. I guess it is not trivial to define.
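The point about leading spaces can be checked with Python’s standard csv module, which by default keeps the space after a comma as part of the field. The sample text below is made up for this illustration; it is not the example document from the ontology.

```python
import csv
import io

# Hypothetical CSV text with a space after each comma.
sample = "name, city\nAlice, Amsterdam\n"

# Default behaviour: the leading space stays in the field value.
strict = list(csv.reader(io.StringIO(sample)))

# Opt-in behaviour: whitespace directly after the delimiter is dropped.
lenient = list(csv.reader(io.StringIO(sample), skipinitialspace=True))

print(strict)
print(lenient)
```

A consumer that reads such a file without `skipinitialspace=True` (or an equivalent option in another tool) ends up with a column called " city" rather than "city", which matters once column names are used in a mapping.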

Suddenly I’m reminded of Karma, which interactively, supported by machine learning, can create mappings for the columns of a CSV file to RDF. Wonder if its mappings can be mapped to the CSV ontology?

2 replies on “Short review of the CSV ontology”

Ben, thanks for the comments! You obviously get the point of the ontology.

I see CSV/Excel as static views of data generated from dynamic resources.

I know that much data is only available in spreadsheet format and that this is the dynamic resource from which all copies are propagated; one would like to see this data as RDF, but as you rightly point out, the tools to work with spreadsheets exist already, while RDF is more difficult for a layperson to get to grips with.

Nevertheless, what we know about spreadsheet management in real life means that there is space for a documentation format 😀

I have revised the documentation according to what you have discovered, but there are still a few issues that need to be fixed. I’m afraid I should have spent more time on the comments in the ontology! It’s a work in progress, so I’m very open to suggestions.

The csv:mapsTo property had two issues: the HTML was in error (I’m still creating this manually, I’m afraid) and I had mistakenly (albeit deliberately) not set the rdfs:range of this property because I originally wanted some openness in interpretation here. The domain for csv:mapsTo is csv:Column, which entails that the subject is always a column; what the subject actually should be is “every value from column x”, which is rather difficult to express without instantiating every cell/value. I think that this approach is acceptable in lieu of a better solution (one intuitively understands what is intended, even if the semantics aren’t great; something could be done in expressing the semantics of a csv:CsvDocument/csv:Column).

Thanks again 🙂

With regard to the mapsTo property: my original comment may have been a bit unclear. If there is a column that has a mapsTo predicate, triples can be generated from the column and the values in that column serve as objects of the generated triples. But there is no specified property to select another column’s values as subjects of the generated triples. Or is there?
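To make the question concrete, here is a minimal sketch of the triple generation I have in mind. The `subject_column` setting is purely hypothetical: it stands in for the property that, as far as I can tell, the ontology does not define. The sample data, the base URI and the choice of foaf:name as the mapped property are also assumptions for illustration.

```python
import csv
import io

# Hypothetical CSV content.
sample = "id,name\n1,Alice\n2,Bob\n"

# Column name -> property URI, playing the role of mapsTo.
maps_to = {"name": "http://xmlns.com/foaf/0.1/name"}

# Assumption: some way to say which column supplies the subjects.
subject_column = "id"
base = "http://example.org/person/"


def generate_triples(text, maps_to, subject_column, base):
    """Return N-Triples-style lines, one per mapped cell.

    Cell values are written as plain literals without escaping,
    which is good enough for this simple sample only.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{base}{row[subject_column]}>"
        for column, prop in maps_to.items():
            triples.append(f'{subject} <{prop}> "{row[column]}" .')
    return triples


for line in generate_triples(sample, maps_to, subject_column, base):
    print(line)
```

Without something like `subject_column`, the values in the name column have objects and a predicate but nothing to attach them to.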
