Terms of Semantics

From the “just throwing it out there” dept. in cooperation with the “I’m too lazy to do some research into existing efforts” dept.

It is well known that people rarely read the terms of use and privacy statements of every party involved in providing a service. South Park used that as a storyline in the episode “HumancentiPad”.

One of the reasons for ignoring the Terms of Use (ToU) is their excessive length and heavy use of legalese. You need a lawyer to fully understand the rights and obligations that come with the service.

But many services share many characteristics, like the definition of a jurisdiction whose laws guide the service conditions, the definition of a service provider and consumer, definitions of content, ownership and other rights to the content.

Isn’t it possible to encode the characteristics of these definitions in a standardised set of terms?

If various (similar) services provide the definitions of their services in standardised terms, they could more easily be understood and compared. It would help non-human agents to select the best services for themselves and their human controllers.
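As a rough sketch of what standardised terms might make possible, assume each service publishes its terms as a small set of key-value pairs. The keys (`jurisdiction`, `content_owner`, `licence_to_provider`), their values and the two services are all invented for illustration; no such vocabulary is implied to exist.

```python
# Hypothetical, hand-made term sets; neither the keys nor the services are real.
SERVICE_A = {
    "jurisdiction": "NL",
    "content_owner": "consumer",
    "licence_to_provider": "non-exclusive",
}
SERVICE_B = {
    "jurisdiction": "US-CA",
    "content_owner": "consumer",
    "licence_to_provider": "exclusive",
}


def differing_terms(a, b):
    """Return {term: (value_in_a, value_in_b)} for every term on which a and b differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    for term, (left, right) in differing_terms(SERVICE_A, SERVICE_B).items():
        print(f"{term}: {left} vs {right}")
```

An agent could run a comparison like this over many services and show its human controller only the terms that actually differ.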

More thought is needed.

Response to “Three reasons why the Semantic Web has failed”

Posted on http://gigaom.com/2013/11/03/three-reasons-why-the-semantic-web-has-failed/ as a comment (but at the time of posting it is still awaiting moderation).

I’d like to disagree with most of the article. Your argument “the Semantic Web has failed” does not follow from your “reasons”.
Sure, I’m pretty familiar with the Semantic Web and able to understand RDF (really, it’s not impossible to understand) and (most of) OWL, but that is not why I think a Synaptic Web can live next to a Semantic Web. To start: wouldn’t it be great for your streaming web interpreters to be presented with structured information next to unstructured text? Let it live on top of the Semantic Web (and the rest of the Web).

Do you want to exclude facts from knowledge? I, too, couldn’t care less about Leonardo da Vinci’s height, but if I see the Mona Lisa in Paris, I might want to know what else he painted and did and where I can see that. You need boring facts for that. Boring, but useful facts.
For human consumption “messages” are only part of knowledge. Take science for example. Science doesn’t only live in conversation; loads of scientific knowledge is transferred in documents.

The Semantic Web doesn’t depend on XML. Or JSON – although JSON-LD is gaining lots of ground. Human end users shouldn’t need to see raw facts in any text format, only developers. Turtle is the easiest to read and write by hand, I think, but eventually programmers will do that just as rarely as they read and write JSON.

We’re still a long way from having phones that measure brain activity to decipher our thoughts before they become pieces of knowledge consisting of concepts and, err, facts about things we do, want, and feel. For the sake of my privacy, I’d like my phone not to push my thoughts and activities to the Synaptic Web. It could ask the Web specific questions that I would like answered, but those questions are likely to be based around concepts, time and place (“what museums are open around here tomorrow?”). That almost works already, and it looks like keyword search.

I like the vision of a Synaptic Web (I heard the term for the first time today), but the claim that the Semantic Web has failed because people actually want a Synaptic Web was not proven today.

My Linked Data publishing ‘platform’

Among the goals I had in mind for Companjen.name were to publish (parts of) my family tree so that others can benefit from it (without being bound to specific collaborative genealogy websites), and to play around with linked data (i.e. having a webspace to publish my own ‘minted’ URIs with data). I believe the second goal has been completed (and that the first can be achieved using the second).

Linked Data

Linked Data is based on using Uniform Resource Identifiers (URIs) for online and offline resources. These URIs are dereferenceable via HTTP, so that if the resource itself cannot be returned, at least useful information (i.e. metadata) about it is. The machine-readable data format of choice is RDF, which should be serialized as RDF/XML (because practically every RDF parser can read that) and any other serialization I wish. For human agents it may be nice to have a data representation in HTML.

URI design

Because every URI is an identifier, we want to make sure they don’t break. I want the URIs I use to identify resources to be recognizable as such, and they need to be in my domain. Therefore I chose to have all URIs that may be used in my Linked Data start with “http://companjen.name/id/”. (Resources can have many identifiers, so I can easily add another one to resources that already have URIs.)

What comes after the namespace prefix can take many forms; I haven’t decided yet. I do think it is nice to reserve filetype extensions for the associated data representations, i.e. “.html” for HTML, “.rdf” for RDF/XML and “.ttl” for Turtle documents.
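One way to read the extension convention is as a lookup table from extension to media type. The sketch below is in Python for brevity, not the PHP used on the site, and the function name `split_request_path` is made up for this example.

```python
import posixpath

# Extensions reserved in the text, mapped to their registered media types.
FORMATS = {
    ".html": "text/html",
    ".rdf": "application/rdf+xml",
    ".ttl": "text/turtle",
}


def split_request_path(path):
    """Split '/id/BC.ttl' into ('/id/BC', 'text/turtle').

    A path without a reserved extension is returned unchanged with None,
    meaning it names the resource itself rather than one representation.
    """
    stem, ext = posixpath.splitext(path)
    if ext in FORMATS:
        return stem, FORMATS[ext]
    return path, None
```

Because only the three reserved extensions are recognised, a local name that happens to contain a dot is still treated as a resource identifier.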

How it works

My hosting provider allows me to use PHP, .htaccess files and MySQL, all of which I used to create the “platform”. It is composed of the PHP Content Negotiation library from ptlis.net, the PHP RDF library ARC2, two custom PHP scripts and a .htaccess file.

Since all URIs that I want to use share the path “/id/”, but I don’t want to keep HTML, RDF/XML and Turtle files for every resource, I wrote some RewriteRules in the .htaccess file in the document root (helped by looking at Neil Crosby’s beginner’s guide) that hand the request to a content-negotiating PHP script. That script lets the Content Negotiation library determine the best content type based on the Accept header in the HTTP request, and then sends the client to the URI appended with .rdf, .ttl or .html via HTTP 303 See Other.
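The negotiation step can be sketched roughly as follows. This is Python rather than the PHP on the site, it does not reproduce the ptlis.net library, and the Accept parsing is deliberately simplified. Falling back to HTML when nothing matches is an assumption of this sketch (a stricter server could answer 406 Not Acceptable instead), and the names `parse_accept` and `negotiate` are invented here.

```python
# Media types this sketch can serve, mapped to the extension to redirect to.
EXTENSIONS = {
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
    "text/turtle": ".ttl",
}


def parse_accept(header):
    """Return [(media_type, q)] from an Accept header, best first.

    Simplified: only the q parameter is read, and wildcards such as */*
    are not expanded. Ties keep the order used in the header.
    """
    entries = []
    for position, part in enumerate(header.split(",")):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unreadable q as "not acceptable"
        entries.append((-q, position, media_type))
    return [(media_type, -neg_q) for neg_q, _, media_type in sorted(entries)]


def negotiate(path, accept_header, default="text/html"):
    """Return (status, location) for a request to an extensionless /id/ path."""
    for media_type, q in parse_accept(accept_header):
        if q > 0 and media_type in EXTENSIONS:
            return 303, path + EXTENSIONS[media_type]
    # Assumption of this sketch: unknown or missing Accept headers get HTML.
    return 303, path + EXTENSIONS[default]
```

For example, `negotiate("/id/BC", "text/turtle;q=0.8, application/rdf+xml")` picks RDF/XML, because a media type without an explicit q value counts as q=1.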

The HTTP client will then look up the new URI. Since the requested path still contains “/id/”, mod_rewrite catches this request too, but another rule points it to a PHP script that queries the ARC triplestore and returns the result in the requested format (RDF/XML and Turtle are created by ARC itself, HTML is created by filling a template).

What you get when you look up something in the /id/ space is the result of a simple “DESCRIBE <URI>” request to the triplestore, which is somewhat limited: it only returns triples with <URI> as subject. This gives some context (one of the principles of Linked Data), but it may also be very interesting to know in which triples the resource is used as object or property (if applicable).
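To make the difference concrete, here is a sketch of the two queries as strings, built in Python. The second one is only one possible way to ask for triples with the URI in any position. It is standard SPARQL, but it has not been tried against ARC here, and the function names are invented for this example.

```python
# Characters that may not appear inside <...> in a SPARQL query.
FORBIDDEN = set('<>"{}|^`\\ ')


def _checked(uri):
    """Refuse URIs that would break out of the <...> delimiters."""
    if not uri or any(ch in FORBIDDEN or ord(ch) <= 0x20 for ch in uri):
        raise ValueError(f"not usable as an IRI in a query: {uri!r}")
    return uri


def describe_query(uri):
    """The simple query mentioned in the text."""
    return f"DESCRIBE <{_checked(uri)}>"


def any_position_query(uri):
    """CONSTRUCT all triples that use the URI as subject, property or object."""
    u = _checked(uri)
    return (
        "CONSTRUCT { ?s ?p ?o } WHERE {\n"
        f"  {{ ?s ?p ?o . FILTER(?s = <{u}>) }}\n"
        "  UNION\n"
        f"  {{ ?s ?p ?o . FILTER(?p = <{u}>) }}\n"
        "  UNION\n"
        f"  {{ ?s ?p ?o . FILTER(?o = <{u}>) }}\n"
        "}"
    )


if __name__ == "__main__":
    print(any_position_query("http://companjen.name/id/BC"))
```

Each branch of the UNION binds all three variables, so the CONSTRUCT template can always be filled in.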

Future work

Apart from making the results more interesting by returning triples that have the URI in the property or object part, there is more to do to mature the platform.

First and foremost: fill the triplestore. There are things that I’d like to publish myself, instead of giving them away to commercial parties from whom I can only access them through controlled APIs. I already mentioned my family tree, but another example is concerts I visit. Let Last.fm, Songkick, Resident Advisor get that info from my triplestore, so that I only have to create the info once and keep control over it. Or maybe the concert venue will find my data on Sindice and display my review on the concert’s page. Oh, the possibilities of the Semantic Web…

As more data becomes available in the triplestore, it makes sense to describe the different datasets using the Vocabulary of Interlinked Datasets (VoID) and put a link to the VoID document at the .well-known location. My family tree will be a nameable dataset, for example, with links to DBpedia, perhaps GeoNames and perhaps eventually online birth, marriage and death records.
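A minimal VoID description could look something like what the sketch below generates as Turtle. The dataset URI and title passed in are hypothetical, as is the function name. The terms `void:Dataset` and `void:uriSpace` and the path /.well-known/void come from the VoID vocabulary, and the rest is an assumption about how the site might use them.

```python
# Path at which a site-wide VoID description is conventionally found.
WELL_KNOWN_PATH = "/.well-known/void"


def void_description(dataset_uri, title, uri_space):
    """Return a small Turtle document describing one dataset with VoID."""
    # Escape backslashes and double quotes so the title is a valid Turtle string.
    title_literal = title.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "@prefix void: <http://rdfs.org/ns/void#> .\n"
        "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
        "\n"
        f"<{dataset_uri}> a void:Dataset ;\n"
        f'    dcterms:title "{title_literal}" ;\n'
        f'    void:uriSpace "{uri_space}" .\n'
    )


if __name__ == "__main__":
    # Hypothetical dataset URI, used only to show the shape of the output.
    print(void_description(
        "http://companjen.name/id/familytree",
        "Family tree",
        "http://companjen.name/id/",
    ))
```

Links to DBpedia or GeoNames would be described separately, for instance as VoID linksets, which this sketch leaves out.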

The current HTML template is a table with columns Subject, Property and Object. A templating engine with templates for different resource types would be a nice start, so that e.g. a person in my family tree is displayed with a photo and with birth and death dates, like genealogy websites usually do (with the customary symbols, e.g. for marriage). Maybe there are browsers/editors for linked data family trees already, but looking for them is also future work.
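Choosing a template per resource type could be as simple as looking at the rdf:type triples of the resource. In the sketch below, the template file names and the choice of foaf:Person as the type for people are assumptions, and triples are plain Python tuples rather than anything ARC returns.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

# Hypothetical template files; "table.html" stands for the current
# Subject/Property/Object table.
TEMPLATES = {FOAF_PERSON: "person.html"}
DEFAULT_TEMPLATE = "table.html"


def choose_template(triples, subject):
    """Pick a template from the first rdf:type of subject that has one."""
    for s, p, o in triples:
        if s == subject and p == RDF_TYPE and o in TEMPLATES:
            return TEMPLATES[o]
    return DEFAULT_TEMPLATE


if __name__ == "__main__":
    me = "http://companjen.name/id/BC"
    print(choose_template([(me, RDF_TYPE, FOAF_PERSON)], me))
```

Resources without a recognised type keep falling back to the generic table, so new templates can be added one type at a time.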

Now to ‘mint’ a URI for myself: http://companjen.name/id/BC. Look it up if you like!