Terms of Semantics

From the “just throwing it out there” dept. in cooperation with the “I’m too lazy to do some research into existing efforts” dept. It is generally known that people rarely read all the terms of use and privacy statements of all parties involved in providing a service. South Park used that knowledge as a storyline in “HUMANCENTiPAD”. One of the reasons for ignoring the Terms of Use (ToU) is their excessive length and use of legalese: they contain too much incomprehensible language, and you need a lawyer to fully understand the rights and obligations that come with the service. But many services share characteristics, such as the jurisdiction whose laws govern the service conditions, the definitions of service provider and consumer, and the definitions of content and of ownership and other rights to that content. Isn’t it possible to encode these common definitions in a standardised set of terms? If various (similar) services described their conditions in standardised terms, they could be understood and compared more easily. It would help non-human agents select the best services for themselves and their human controllers. More thought is needed.
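To make the idea a bit more concrete, here is a minimal Python sketch. The field names in `ServiceTerms` and the two services are hypothetical; no existing standard vocabulary is implied. Once terms are expressed as structured values instead of legalese, finding where two services differ becomes a simple comparison that an agent can do.

```python
from dataclasses import dataclass, fields


# Hypothetical vocabulary: these fields are illustrative, not an existing standard.
@dataclass(frozen=True)
class ServiceTerms:
    provider: str
    jurisdiction: str              # whose laws govern the service conditions
    user_keeps_ownership: bool     # does the consumer keep ownership of uploaded content?
    provider_may_sublicense: bool  # may the provider sublicense that content to others?


def differences(a: ServiceTerms, b: ServiceTerms) -> dict:
    """Return the terms on which two services differ, ignoring the provider name."""
    result = {}
    for f in fields(ServiceTerms):
        if f.name == "provider":
            continue
        va, vb = getattr(a, f.name), getattr(b, f.name)
        if va != vb:
            result[f.name] = (va, vb)
    return result


# Two hypothetical photo sharing services.
photo_a = ServiceTerms("PhotoShareA", "NL", True, False)
photo_b = ServiceTerms("PhotoShareB", "US-CA", True, True)
print(differences(photo_a, photo_b))
# {'jurisdiction': ('NL', 'US-CA'), 'provider_may_sublicense': (False, True)}
```

The hard part is not the code but agreeing on the vocabulary, which is exactly where more thought is needed.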

Email wish list

Things I might find interesting in an email client or environment. Raw thoughts.

  • fast search
  • search inside encrypted emails
  • get related and linked stuff easily at any time during reading
  • easily organise windows and content, e.g.
    • when you select to reply to an email, put the draft next to the original so that you don’t need to switch windows
    • use a consistent style when commenting inline
  • easily create tasks and notes from parts of emails
  • organise / order following user’s workflow
    • by people, topic, tone of content, project, task, event
    • integrate with business process(es)
  • suggest while typing
    • spelling and grammar
    • references to other emails / notes / events / conversations
    • people to include as recipient
    • named entities
    • expansions of abbreviations
    • different tone of text by using templates
    • replacement text
  • find and use embedded machine-actionable information
  • explain what the client did to shape your experience
    • log observations and the subsequent actions taken by the mail user agent
    • show the reasons for suggesting stuff
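As a small illustration of “organise by people”, here is a minimal sketch using only Python’s standard `email` package. It assumes messages are available as raw strings; the function name `group_by_person`, the `me` parameter and the sample addresses are hypothetical. Grouping by topic, tone or project would need far more than header parsing.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import getaddresses


def group_by_person(raw_messages: list, me: str) -> dict:
    """Map each correspondent's address to the subjects of the messages they appear in.

    `me` is the user's own address, which is left out because it occurs everywhere.
    """
    groups = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        # Gather the address headers; get_all returns [] when a header is absent.
        headers = msg.get_all("From", []) + msg.get_all("To", []) + msg.get_all("Cc", [])
        people = {addr.lower() for _name, addr in getaddresses(headers) if addr}
        people.discard(me.lower())
        for person in sorted(people):
            groups[person].append(msg.get("Subject", ""))
    return dict(groups)


# Hypothetical mini inbox.
inbox = [
    "From: Alice <alice@example.org>\nTo: me@example.net\nSubject: Project plan\n\nHi",
    "From: me@example.net\nTo: Alice <alice@example.org>, bob@example.com\nSubject: Re: Project plan\n\nOK",
]
print(group_by_person(inbox, me="me@example.net"))
```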

New hobby: “You’re vulnerable as in CVE-2009-3555”

I started a new hobby: pointing out vulnerabilities for a particular man-in-the-middle attack. It is described in CVE-2009-3555:

The TLS protocol, and the SSL protocol 3.0 and possibly earlier, as used in Microsoft Internet Information Services (IIS) 7.0, mod_ssl in the Apache HTTP Server 2.2.14 and earlier, OpenSSL before 0.9.8l, GnuTLS 2.8.5 and earlier, Mozilla Network Security Services (NSS) 3.12.4 and earlier, multiple Cisco products, and other products, does not properly associate renegotiation handshakes with an existing connection, which allows man-in-the-middle attackers to insert data into HTTPS sessions, and possibly other types of sessions protected by TLS or SSL, by sending an unauthenticated request that is processed retroactively by a server in a post-renegotiation context, related to a “plaintext injection” attack, aka the “Project Mogul” issue.

I bet I am not the only one practising this activity, because the vulnerability was described back in 2009, and fixes for it have been around for quite some years too. After the various fixes were released, the quest for KAMITMA (Knights Against Man-In-The-Middle Attacks) began. It is easy to become a KAMITMA in the context of CVE-2009-3555: you just need to point out this vulnerability to the owners of vulnerable TLS/SSL-protected websites and politely ask them to update their software. You might mention that this is an old vulnerability (2009 is like a century in computer development).

I started this hobby because I found that my own website (and my ownCloud and email) are vulnerable. I asked my hosting provider about it, and they responded by offering to move me to a ‘new’ hosting environment. ‘New’ does not mean the newest version of Apache httpd, but I am planning to move to this new hosting environment soon. If that doesn’t fix the vulnerability, I’ll have to move elsewhere and suggest that others do so too, because it would show that my hosting provider doesn’t care about security.

The easiest way of spotting this vulnerability is to use Firefox and make it block connections that use the old, vulnerable form of renegotiation. Open a tab in Firefox, enter about:config and press Enter. Search for security.ssl.require_safe_negotiation and double-click the row to set it to true. The next time you try to visit a vulnerable website, Firefox will refuse to connect and show an error page instead.


When you see this, it’s hobby time: send a message to the owner of the website asking to get updated and more secure.
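If you would rather check a server from the command line, OpenSSL’s `s_client` prints a line saying whether the server supports RFC 5746 secure renegotiation, which is what the Firefox setting requires. Below is a minimal Python sketch that runs it and reads that line; it assumes the `openssl` binary is on your PATH, and the function names are hypothetical. Hedges: it forces TLS 1.2 (TLS 1.3 has no renegotiation), so servers that only speak older versions will fail to connect; and OpenSSL 3.x refuses legacy servers by default, so against a vulnerable server the handshake itself may fail and the result is `None` rather than `False`.

```python
import subprocess
from typing import Optional


def parse_renegotiation(s_client_output: str) -> Optional[bool]:
    """Read the 'Secure Renegotiation' line printed by `openssl s_client`.

    Returns True if secure renegotiation is supported, False if it is not,
    and None if the line is missing (e.g. the connection or handshake failed).
    """
    for line in s_client_output.splitlines():
        line = line.strip()
        if line == "Secure Renegotiation IS supported":
            return True
        if line == "Secure Renegotiation IS NOT supported":
            return False
    return None


def supports_secure_renegotiation(host: str, port: int = 443) -> Optional[bool]:
    # -tls1_2 forces TLS 1.2, where renegotiation (and thus the RFC 5746 check) exists.
    # Empty stdin makes s_client close the connection right after the handshake.
    result = subprocess.run(
        ["openssl", "s_client", "-connect", f"{host}:{port}", "-servername", host, "-tls1_2"],
        input="",
        capture_output=True,
        text=True,
        timeout=30,
    )
    return parse_renegotiation(result.stdout)
```

Usage would be something like `supports_secure_renegotiation("example.org")`; a `False` or `None` is a reason to look closer, not a verdict.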

I found the following websites to be vulnerable, and I try to keep this list up to date when I notice that something has changed:

Addition, 2014-08-28: Tweakers.net, a news website and price comparison website for electronics, hosts its images on a different domain (tweakimg.net and its subdomain ic.tweakimg.net). These domains appear to be vulnerable, judging by the fact that all color and layout is lost when I look at the Pricewatch. The browser console shows what is going on:


Claim your prize!

Dear friend

This is a personal email directed to you. I’m Ben, my friend Ivar and I got a prize of great value, and have voluntarily decided to donate a large portion of great value to you as a part of our own charity project to improve the happiness of 60 unknown people plus 25 good friends. If you have received this email then you are one of the lucky recipients!

This is not spam! To prove this is not spam, you need to pay € 10 (= € 10) and come to the Zuiderpark in The Hague four times on Tuesday evenings in May. But that’s not all, if you forward this message to your friends, colleagues or family and they pay 10 EUR and come into the Zuiderpark, they will also receive a portion of the price of great value.
In order to bring enough value to the Zuiderpark you should sign up yourself and your friends at http://cityleague.nl/en/node/35.


Ben and Ivar

Can you please remove ‘meaningful punctuation’ from field contents, librarians?

Dear Cataloguing librarians,

It is time to realise that using punctuation as a way of marking sub-field boundaries is bad practice. You should not want to put the title and the “responsible entity” in one field and then try to split the field contents using punctuation like “ / ”. You should not want to use an author’s full name in reverse order + year of birth + year of death (if applicable) to identify the person, certainly not if you also allow an optional “.”.

You need to understand: the machines are not smart enough yet to understand your cataloguing rules and therefore they don’t get the meaning of what you put in the fields. Even the ones at OCLC are not smart enough yet.

What drove me to write this was this example: Linked Data about Works published by OCLC. It is buzzing and – I agree with Ed Summers – pretty cool. The data structure and semantics can be improved, as Richard Wallis of OCLC says in a blog post. The example that Ed used in his blog post, “Weaving the Web” by Tim Berners-Lee, demonstrates my issue (which neither Ed nor Richard touches upon).

The work’s title is shown as:

Weaving the Web : the original design and ultimate destiny of the World Wide Web by its inventor /

Yes, Tim himself said he would have gotten rid of the two forward slashes after http: in URIs, had he had the chance to start over, but the slash at the end of the title was not Tim’s intent. I bet you put “Tim Berners-Lee” or even “by Tim Berners-Lee” after that slash in the 245 field of your MARC record.
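A minimal Python sketch of why punctuation makes a poor field separator. It assumes the full title statement read “… by its inventor / Tim Berners-Lee” (the part after the slash is only a guess, as said above), and the function name is hypothetical. Splitting on “ : ” and “ / ” works for the full statement, leaves the stray slash in place for the title as published, and splits a (hypothetical) title that itself contains “ / ” in the wrong place.

```python
def naive_split_title_statement(field: str) -> dict:
    """Split a title statement on punctuation: ' : ' before the subtitle, ' / ' before the responsibility."""
    title_part, _, responsibility = field.partition(" / ")
    title, _, subtitle = title_part.partition(" : ")
    return {"title": title.strip(), "subtitle": subtitle.strip(), "responsibility": responsibility.strip()}


# Assumed full title statement; the responsibility part is a guess.
full = ("Weaving the Web : the original design and ultimate destiny of the World Wide Web "
        "by its inventor / Tim Berners-Lee")
print(naive_split_title_statement(full))

# The title as published: the statement was cut off, but its trailing ' /' was kept.
published = ("Weaving the Web : the original design and ultimate destiny of the World Wide Web "
             "by its inventor /")
print(naive_split_title_statement(published)["subtitle"])  # the subtitle still ends with '/'

# A hypothetical title that contains ' / ' itself is split in the wrong place.
print(naive_split_title_statement("Input / output : a primer"))
```

Separate, explicitly labelled fields make all of this guesswork unnecessary.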

Second point, from the same example, the authors. And contributors, and creators. I know the temporary URIs will be replaced by VIAF URIs, but OCLC will still need to map…

"creator" : [ "http://experiment.worldcat.org/entity/work/data/27331745#Person/berners_lee_tim", "http://experiment.worldcat.org/entity/work/data/27331745#Person/berners_lee_tim_1955" ]

… to the one and only Tim Berners-Lee (who co-authored the book). In this example that should be easy, as there aren’t many people called Tim Berners-Lee on the planet and there is only one with a very strong connection to “the Web”, but the general case is not that simple. (You need context for that, and even then there is a chance that you make incorrect matches. You may find some context for this in my thesis.)
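A minimal Python sketch of the easy part of that mapping. The heuristic is an assumption for illustration, not OCLC’s method: split the `#Person/` fragment into a name key and an optional four-digit year, and treat two URIs as a candidate match when the names are equal and the years do not conflict. Function names and the `smith_john` URIs are hypothetical. Note that it happily matches two different people who share a name and lack a year; that is exactly where context is needed.

```python
import re
from typing import Optional, Tuple


def split_person_uri(uri: str) -> Tuple[str, Optional[str]]:
    """Split the '#Person/...' fragment into a name key and an optional year."""
    local = uri.rsplit("#Person/", 1)[-1]
    match = re.fullmatch(r"(.+?)_(\d{4})", local)
    return (match.group(1), match.group(2)) if match else (local, None)


def may_be_same_person(a: str, b: str) -> bool:
    """Same name key and no conflicting year: a candidate match, not proof of identity."""
    name_a, year_a = split_person_uri(a)
    name_b, year_b = split_person_uri(b)
    if name_a != name_b:
        return False
    return year_a is None or year_b is None or year_a == year_b


base = "http://experiment.worldcat.org/entity/work/data/27331745#Person/"
print(may_be_same_person(base + "berners_lee_tim", base + "berners_lee_tim_1955"))  # True
```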

I’ll come back to you in some time to see how you’re getting on with fixing all of this. I’m counting on you!



The pain of plain (text emails)

Since the introduction of HTML email (which must have been before my first email encounter), and especially now that internet speeds have gone up and computers are fast enough to render email in HTML, more and more email has been sent as HTML.

Since I switched to reading only the plain text version of multipart/alternative emails, I have noticed how much HTML has become the default for corporate emails: the plain text version often looks like an afterthought.
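For reference, this is roughly what “only looking at the plain text version” means, as a minimal sketch with Python’s standard `email` package. The function name `plain_text_body` and the demo message are hypothetical; `get_body()` does the work of picking the text/plain alternative.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from typing import Optional


def plain_text_body(raw: bytes) -> Optional[str]:
    """Return the text/plain body of a message, or None if there is none."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # get_body() searches multipart/alternative (and related/mixed) structures.
    part = msg.get_body(preferencelist=("plain",))
    return None if part is None else part.get_content()


# A small multipart/alternative message to try it on.
demo = EmailMessage()
demo["Subject"] = "Your order"
demo.set_content("Thanks for your order.")
demo.add_alternative("<p>Thanks for your <b>order</b>.</p>", subtype="html")
print(plain_text_body(demo.as_bytes()))
```

Whatever the sender puts in that text/plain part is all a plain text reader gets, which is why the examples that follow hurt.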

I see HTML entities, such as &copy; and &nbsp;, for which there are Unicode code points and UTF-8 representations.
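Converting such entities before writing the plain text part takes one standard library call. A minimal Python sketch (the wrapper name is hypothetical) that also shows each character’s code point and UTF-8 bytes:

```python
import html


def entities_to_unicode(text: str) -> str:
    """Replace HTML character references with the characters they stand for."""
    return html.unescape(text)


for entity in ("&copy;", "&nbsp;"):
    char = entities_to_unicode(entity)
    # Show the Unicode code point and its UTF-8 byte sequence.
    print(entity, f"U+{ord(char):04X}", char.encode("utf-8"))
# &copy; U+00A9 b'\xc2\xa9'
# &nbsp; U+00A0 b'\xc2\xa0'
```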

I see:

Click here for more.

… without any URL that my email client could make clickable.
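Keeping the link target when generating the plain text part is not hard. A minimal sketch with Python’s standard `html.parser`; the class and function names and the example URL are hypothetical, and writing the URL in angle brackets after the link text is just one common convention. A real converter would also handle block elements, lists and whitespace.

```python
from html.parser import HTMLParser
from typing import Optional


class PlainTextLinks(HTMLParser):
    """Collect the text of an HTML fragment, appending each link's URL in angle brackets."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []
        self._href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            if self._href:
                self.parts.append(f" <{self._href}>")
            self._href = None

    def handle_data(self, data):
        self.parts.append(data)


def html_to_plain(fragment: str) -> str:
    parser = PlainTextLinks()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


print(html_to_plain('<p><a href="https://example.com/more">Click here</a> for more.</p>'))
# Click here <https://example.com/more> for more.
```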

I see the opposite:

More URL than text
More URL than text

I see CSS, with hacks:

/* Client-specific Styles */
#outlook a{padding:0;} /* Force Outlook to provide a "view in browser" button. */
body{width:100% !important;} .ReadMsgBody{width:100%;} .ExternalClass{width:100%;} /* Force Hotmail to display emails at full width */
body{-webkit-text-size-adjust:none;} /* Prevent Webkit platforms from changing default text sizes. */
/* Reset Styles */
body{margin:0; padding:0;}
img{border:0; height:auto; line-height:100%; outline:none; text-decoration:none;}
table td{border-collapse:collapse;}
#backgroundTable{height:100% !important; margin:0; padding:0; width:100% !important;}
p {margin-top: 14px; margin-bottom: 14px;}
/* ////////// STANDARD STYLING: PREHEADER ////////// */
.preheaderContent div a:link, .preheaderContent div a:visited, /* Yahoo! Mail Override */ .preheaderContent div a .yshortcuts /* Yahoo! Mail Override */{
color: #3b6e8f;
}
.mainContent a:link, a:visited{
}
/* ////////// STANDARD STYLING: FOOTER LINKS ////////// */
.footerContent div a:link, .footerContent div a:visited, /* Yahoo! Mail Override */ .footerContent div a .yshortcuts /* Yahoo! Mail Override */{
/*@editable*/ color:#336699;
/*@editable*/ font-weight:normal;
/*@editable*/ text-decoration:underline;
}

And I see HTML:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<head><title>Uw bestelling bij Ticketscript</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<table width="468">Beste Ben Companjen,

Come on, corporations, I am in business with you (these emails are the result of my being a client of your services)! Please put some effort into serving your plain text email readers.

Response to “Three reasons why the Semantic Web has failed”

Posted on http://gigaom.com/2013/11/03/three-reasons-why-the-semantic-web-has-failed/ as a comment (but at the time of posting it is still awaiting moderation).

I’d like to disagree with most of the article. Your argument “the Semantic Web has failed” does not follow from your “reasons”.
Sure, I’m pretty familiar with the Semantic Web and able to understand RDF (really, it’s not impossible to understand) and (most of) OWL, but that is not why I think a Synaptic Web can live next to a Semantic Web. To start: wouldn’t it be great for your streaming web interpreters to be presented with structured information next to unstructured text? Let it live on top of the Semantic Web (and the rest of the Web).

Do you want to exclude facts from knowledge? I, too, couldn’t care less about Leonardo da Vinci’s height, but if I see the Mona Lisa in Paris, I might want to know what else he painted and did and where I can see that. You need boring facts for that. Boring, but useful facts.
For human consumption, “messages” are only part of knowledge. Take science, for example. Science doesn’t only live in conversation; loads of scientific knowledge is transferred in documents.

The Semantic Web doesn’t depend on XML. Or JSON – although JSON-LD is gaining lots of ground. Human end users shouldn’t need to see raw facts in any text format, only developers. Turtle is the easiest to read and write by hand, I think, but eventually programmers will do that just as rarely as they read and write JSON.
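To illustrate that the facts do not depend on one syntax, here is a minimal Python sketch that writes the same two (boring, useful) facts as N-Triples and as expanded JSON-LD. The `http://example.org/` IRIs are hypothetical placeholders, not an established vocabulary, and the serialisers only handle IRI objects, not literals; in practice an RDF library would do this.

```python
import json

# Hypothetical example IRIs; a real dataset would use an established vocabulary.
EX = "http://example.org/"
triples = [
    (EX + "LeonardoDaVinci", EX + "painted", EX + "MonaLisa"),
    (EX + "MonaLisa", EX + "locatedIn", EX + "Louvre"),
]


def to_ntriples(triples) -> str:
    """One '<s> <p> <o> .' line per triple (IRI objects only)."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def to_jsonld(triples) -> str:
    """Serialise IRI-only triples as expanded JSON-LD: one node object per subject."""
    nodes = {}
    for s, p, o in triples:
        node = nodes.setdefault(s, {"@id": s})
        node.setdefault(p, []).append({"@id": o})
    return json.dumps(list(nodes.values()), indent=2)


print(to_ntriples(triples))
print(to_jsonld(triples))
```

Both outputs carry exactly the same graph; which text format a developer happens to look at is secondary.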

We’re still a long way from having phones that measure brain activity to decipher our thoughts before they become pieces of knowledge consisting of concepts and, err, facts about things we do, want, and feel. For the sake of my privacy, I’d like my phone not to push my thoughts and activities to the Synaptic Web. It could ask the Web specific questions that I would like answered, but those questions are likely to be based around concepts, time and place (“what museums are open around here tomorrow?”). That already almost works, and it looks like keyword search.

I like the vision of a Synaptic Web (I heard the term for the first time today), but the claim that the Semantic Web has failed because people actually want a Synaptic Web was not proven today.