Linked (Open) Data

Can you please remove ‘meaningful punctuation’ from field contents, librarians?

Dear Cataloguing librarians,

It is time to realise that using punctuation to mark sub-field boundaries is bad practice. You should not want to put the title and the “responsible entity” in one field and then try to split the field contents using punctuation like “ / ”. Nor should you want to use an author’s full name in reverse order + year of birth + year of death (if applicable) to identify the person – certainly not if you also allow an optional “.”.
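To show what a machine has to do with such a name string, here is a minimal Python sketch. The pattern, the `Heading` type and the `parse_heading` function are my own assumptions for illustration, not any library's or OCLC's actual logic, and the example strings are made up rather than copied from real authority records. I also assume the optional “.” comes at the very end.

```python
import re
from typing import NamedTuple, Optional


class Heading(NamedTuple):
    """Hypothetical container for the parts of a name heading."""
    family: str
    given: str
    birth: Optional[int]
    death: Optional[int]


# Assumed shape: "Family, Given[, YYYY-[YYYY]][.]"
HEADING = re.compile(
    r"^(?P<family>[^,]+),\s*(?P<given>[^,]+?)"
    r"(?:,\s*(?P<birth>\d{4})-(?P<death>\d{4})?)?"
    r"\.?$"
)


def parse_heading(text: str) -> Optional[Heading]:
    """Return the parsed heading, or None if the string does not fit the pattern."""
    m = HEADING.match(text.strip())
    if m is None:
        return None
    birth = int(m["birth"]) if m["birth"] else None
    death = int(m["death"]) if m["death"] else None
    return Heading(m["family"].strip(), m["given"].strip(), birth, death)


if __name__ == "__main__":
    for s in ["Berners-Lee, Tim, 1955-", "Berners-Lee, Tim.", "Smith, John A."]:
        print(s, "->", parse_heading(s))
```

Note what happens with the last string: the optional final “.” cannot be told apart from the full stop that belongs to the initial, so the sketch silently drops it. That is exactly the kind of guess a machine should not have to make.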

You need to understand: the machines are not smart enough yet to understand your cataloguing rules and therefore they don’t get the meaning of what you put in the fields. Even the ones at OCLC are not smart enough yet.

What drove me to write this was this example: Linked Data about Works published by OCLC. It is buzzing and – I agree with Ed Summers – pretty cool. The data structure and semantics can be improved, as Richard Wallis of OCLC says in a blog post. The example that Ed took in his blog post, “Weaving the Web” by Tim Berners-Lee, demonstrates my issue (which is not touched upon by Ed or Richard).

The work’s title is shown as:

Weaving the Web : the original design and ultimate destiny of the World Wide Web by its inventor /

Yes, Tim himself said he would have gotten rid of the two forward slashes after http: in URIs, had he had the chance to start over, but the slash at the end of the title was not Tim’s intent. I bet you put “Tim Berners-Lee” or even “by Tim Berners-Lee” after that slash in the 245 field of your MARC record.
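For the curious, this is roughly the clean-up every consumer of such data ends up writing. It is a minimal Python sketch under my own assumptions: the function names and the list of separator characters are hypothetical, and the second example string is made up.

```python
import re

# Separator characters that tend to be left dangling at the end of a title
# when the next sub-field is cut off (an assumed, incomplete list).
TRAILING_SEPARATOR = re.compile(r"\s*[/:;=,]\s*$")


def strip_trailing_isbd(title: str) -> str:
    """Remove one dangling separator from the end of a title string."""
    return TRAILING_SEPARATOR.sub("", title).strip()


def split_title(field: str):
    """Naively split 'title / responsible entity' on the first ' / '."""
    title, _sep, responsible = field.partition(" / ")
    return strip_trailing_isbd(title), (responsible.strip() or None)


if __name__ == "__main__":
    title = ("Weaving the Web : the original design and ultimate destiny "
             "of the World Wide Web by its inventor /")
    print(strip_trailing_isbd(title))
    # A made-up title that itself contains " / " breaks the naive split:
    print(split_title("Input / output / J. Doe"))
```

The first call gets rid of the slash, but the second shows why the approach is fragile: as soon as a title legitimately contains “ / ”, the split lands in the wrong place and there is nothing in the string to tell the machine so.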

Second point, from the same example, the authors. And contributors, and creators. I know the temporary URIs will be replaced by VIAF URIs, but OCLC will still need to map…

"creator" : [ "", "" ]

… to the one and only Tim Berners-Lee (who co-authored the book). In this example that should be easy, as there aren’t many people called Tim Berners-Lee on the planet and there is only one with a very strong connection to “the Web”, but the general case is not that simple. (You need context for that, and even then there is a chance that you make incorrect matches. You may find some context for this in my thesis.)

I’ll come back to you in some time to see how you’re getting on with fixing all of this. I’m counting on you!



2 replies on “Can you please remove ‘meaningful punctuation’ from field contents, librarians?”

Thanks for this post pointing out this issue with thinking about catalog records as some kind of static entity versus a record as chunks of (hopefully) usable data. Punctuation like this also plays havoc in mapping and normalization projects I work on, primarily because of a lack of consistency in the punctuation used or the lack of a reliable delimiter. Last year I wrote about a related topic from an archives perspective, namely thinking about how you are going to use your data to help form your standards for record creation, in “Toward a Less Precious Cataloging”.

That is a very nice post, thanks for sharing. Of course I wasn’t the first to point this out. And of course it has never been just about libraries. I work on data integration with archives’ data too, so I know it’s not trivial. (I just felt that if even OCLC shows the problem so prominently, it needs more attention.)

From what I’ve seen from one hardcore old-skool cataloguer on the BIBFRAME mailing list, not everyone understands yet how computers (do not) see the meanings in text that we see in records.

(I’m not arguing that the concept of a descriptive record will change or need to go away, by the way.)
