Short review of “Spamming in Scholarly Publishing: A Case Study”

An interesting premise: a researcher, Marcin Kozak, receives a lot of unsolicited email (spam) trying to convince him to publish in a journal or with a publisher, and decides to investigate these journals and publishers.

Kozak, M., Iefremova, O. and Hartley, J. (2015), Spamming in scholarly publishing: A case study. Journal of the Association for Information Science and Technology. doi: 10.1002/asi.23521

The abstract covers it well:

Spam has become an issue of concern in almost all areas where the Internet is involved, and many people today have become victims of spam from publishers and individual journals. We studied this phenomenon in the field of scholarly publishing from the perspective of a single author. We examined 1,024 such spam e-mails received by Marcin Kozak from publishers and journals over a period of 391 days, asking him to submit an article to their journal. We collected the following information: where the request came from; publishing model applied; fees charged; inclusion or not in the Directory of Open Access Journals (DOAJ); and presence or not in Beall’s (2014) listing of dubious journals. Our research showed that most of the publishers that sent e-mails inviting manuscripts were (i) using the open access model, (ii) using article-processing charges to fund their journal’s operations; (iii) offering very short peer-review times, (iv) on Beall’s list, and (v) misrepresenting the location of their headquarters. Some years ago, a letter of invitation to submit an article to a particular journal was considered a kind of distinction. Today, e-mails inviting submissions are generally spam, something that misleads young researchers and irritates experienced ones.

Some details were missing, however. I think good methodologies for assessing a publisher’s or journal’s trustworthiness are necessary, so it would be great if people researching these methodologies got the details right.

The location of the headquarters was determined by various means, one of them being a lookup of the domain name holder’s (or registrant’s) country in a WHOIS system. The authors conclude this is not a reliable method, but do not explain why. A few sentences earlier they do suggest that the registrant’s country is the country the publisher or journal is based in, or that WHOIS shows the location of the server. Exactly which WHOIS fields were used is not described.

Another way of determining the headquarters’ location was to look up the information on the website. How the authors decided whether that information was present or missing is not mentioned.

One of the conclusions is that “the average time claimed for peer review was 4 weeks or less.” I don’t see how this follows from the summary table of claimed peer-review times: the table contains N/A values, so an average cannot simply be computed, and nearly all claimed times are 4 weeks or less anyway. The form of the statement is also wrong: an average is a single number, not a range like “4 weeks or less.”

Finally, I would have liked to see a reason for not including the dataset. I can only guess why the authors deliberately did not provide the names of journals and publishers.

I think the conclusions hold (except for the one mentioned above), and that work should be done to improve the methodology for judging journal quality. Eventually, the work could be automated and easily repeated over time. Results from such automated checks could be added to the DOAJ.
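As an illustration only (the record keys and thresholds below are my own assumptions, not the paper’s method), an automated check could combine the kinds of information the authors collected into a simple report:

```python
def journal_warnings(journal: dict) -> list[str]:
    """Flag trust signals for a journal record.

    The keys and thresholds are hypothetical; they mirror the kinds
    of data collected in the study (DOAJ listing, Beall's list,
    fees, claimed peer-review time, headquarters location).
    """
    warnings = []
    if not journal.get("in_doaj", False):
        warnings.append("not listed in DOAJ")
    if journal.get("on_bealls_list", False):
        warnings.append("publisher appears on Beall's list")
    if journal.get("claimed_review_weeks", 99) < 2:
        warnings.append("implausibly short peer review claimed")
    if journal.get("hq_country") != journal.get("whois_country"):
        warnings.append("stated headquarters does not match WHOIS country")
    return warnings

# A made-up journal record for illustration.
report = journal_warnings({
    "in_doaj": False,
    "on_bealls_list": True,
    "claimed_review_weeks": 1,
    "hq_country": "US",
    "whois_country": "IN",
})
print(report)
```

A real check would of course need verified data sources for each signal; the hard part is gathering the data, not combining it.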

Terms of Semantics

From the “just throwing it out there” dept. in cooperation with the “I’m too lazy to do some research into existing efforts” dept.

It is generally known that people rarely read all terms of use and privacy statements of all involved parties providing a service. South Park used that knowledge as a storyline in The Human Cent-iPad.

One of the reasons for ignoring Terms of Use (ToU) is their excessive length and legalese. They contain too much incomprehensible language; you need a lawyer to fully understand the rights and obligations that come with the service.

But many services share many characteristics: the definition of a jurisdiction whose laws govern the service conditions, the definition of a service provider and a consumer, and definitions of content, ownership, and other rights to the content.

Isn’t it possible to encode these characteristics and definitions in a standardised set of terms?

If various (similar) services described their conditions in such standardised terms, they could more easily be understood and compared. It would also help non-human agents select the best services for themselves and their human controllers.
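A minimal sketch of the idea, with entirely made-up term names (this is not an existing vocabulary or standard): if each service published its terms as structured data, comparing two services becomes a mechanical diff.

```python
# Hypothetical standardised terms-of-use vocabulary; the keys and
# values are invented for illustration, not taken from any standard.
terms_a = {
    "jurisdiction": "NL",
    "content_ownership": "user",
    "licence_granted_to_provider": "non-exclusive",
    "data_sold_to_third_parties": False,
}
terms_b = {
    "jurisdiction": "US-CA",
    "content_ownership": "user",
    "licence_granted_to_provider": "exclusive",
    "data_sold_to_third_parties": True,
}

def diff_terms(a: dict, b: dict) -> dict:
    """Return the terms on which two services differ."""
    return {k: (a[k], b[k]) for k in a if a[k] != b.get(k)}

print(diff_terms(terms_a, terms_b))
```

An agent (human or not) could then rank services on exactly the terms it cares about, instead of reading two walls of legalese.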

More thought is needed.

Email wish list

Things I might find interesting in an email client or environment. Raw thoughts.

  • fast search
  • search inside encrypted emails
  • get related and linked stuff easily at any time during reading
  • easily organise windows and content, e.g.
    • when you select to reply to an email, put the draft next to the original so that you don’t need to switch windows
    • use a consistent style when commenting inline
  • easily create tasks and notes from parts of emails
  • organise / order following user’s workflow
    • by people, topic, tone of content, project, task, event
    • integrate with business process(es)
  • suggest while typing
    • spelling and grammar
    • references to other emails / notes / events / conversations
    • people to include as recipient
    • named entities
    • expansions of abbreviations
    • different tone of text by using templates
    • replacement text
  • find and use embedded machine-actionable information
  • explain what the client did to your experience
    • log observations and following actions by the mail user agent
    • show the reasons for suggesting stuff
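The last two wishes (logging observations and showing reasons) could look something like this sketch, where every suggestion carries the observation that triggered it. All names and the data model are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    """A mail-client suggestion paired with its rationale."""
    text: str
    reason: str

def suggest_recipients(draft_topic: str, history: dict) -> list[Suggestion]:
    """Suggest recipients based on who was mailed about this topic before.

    `history` maps a topic to the addresses previously used for it;
    a real client would derive this from the user's mail archive.
    """
    return [
        Suggestion(text=addr, reason=f"previously mailed about '{draft_topic}'")
        for addr in history.get(draft_topic, [])
    ]

for s in suggest_recipients("budget", {"budget": ["alice@example.org"]}):
    print(f"{s.text} (because: {s.reason})")
```

The point is the shape of the data: a suggestion without its reason is exactly the kind of opaque magic the wish list complains about.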

New hobby: “You’re vulnerable as in CVE-2009-3555”

I started a new hobby: pointing out websites vulnerable to a particular man-in-the-middle attack, described in CVE-2009-3555:

The TLS protocol, and the SSL protocol 3.0 and possibly earlier, as used in Microsoft Internet Information Services (IIS) 7.0, mod_ssl in the Apache HTTP Server 2.2.14 and earlier, OpenSSL before 0.9.8l, GnuTLS 2.8.5 and earlier, Mozilla Network Security Services (NSS) 3.12.4 and earlier, multiple Cisco products, and other products, does not properly associate renegotiation handshakes with an existing connection, which allows man-in-the-middle attackers to insert data into HTTPS sessions, and possibly other types of sessions protected by TLS or SSL, by sending an unauthenticated request that is processed retroactively by a server in a post-renegotiation context, related to a “plaintext injection” attack, aka the “Project Mogul” issue.

I bet I am not the only one practising this hobby, since the vulnerability was described in 2009, and fixes have been available for quite some years too. After the releases of the various fixes, the quest for KAMITMA (Knights Against Man-In-The-Middle Attacks) began. It is easy to become a KAMITMA in the context of CVE-2009-3555: just point out the vulnerability to the owners of vulnerable TLS/SSL-protected websites and politely ask them to update their software. You might mention that this is an old vulnerability (2009 is like a century ago in computer development).

I started this hobby because I found that my own website (and ownCloud and email) is vulnerable. I asked my hosting provider, and they responded by offering to move me to a ‘new’ hosting environment. ‘New’ does not mean the newest version of Apache httpd, but I am planning to move to this new environment soon. If that doesn’t fix the vulnerability, I’ll have to move elsewhere and suggest others do the same, because it would show my hosting provider doesn’t care about security.

The easiest way to spot this vulnerability is to make Firefox block connections that use the old, vulnerable negotiation. Open a tab, enter about:config in the address bar and press Enter. Search for security.ssl.require_safe_negotiation and double-click the row to set it to true. The next time you try to visit a vulnerable website you’ll see this:


When you see this, it’s hobby time: send a message to the owner of the website, asking them to update to more secure software.
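You can also check from the command line. OpenSSL’s s_client reports whether a server supports the RFC 5746 secure-renegotiation extension; a patched server prints “Secure Renegotiation IS supported”, a vulnerable one “IS NOT supported”. The host below is a placeholder:

```shell
# Probe a TLS server for the RFC 5746 secure-renegotiation extension.
# Replace example.org with the site you want to check.
openssl s_client -connect example.org:443 </dev/null 2>/dev/null \
  | grep 'Secure Renegotiation'
```

This is a network probe, so run it only against hosts you are allowed to test.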

I found these websites, and I try to keep this list updated when I notice something changes:

Addition, 2014-08-28: a news website and price-comparison website for electronics hosts its images on a different domain and subdomain. Apparently these are vulnerable, because all colour and layout is lost when I look at the Pricewatch. The browser console shows what is going on:


Publishers of closed-access journals shouldn’t whine

Publishers who fear a lower Impact Factor should look at themselves first. Especially publishers of scholarly journals that are not Open Access, because they feed the dependence on Thomson Reuters and the Journal Citation Reports®.

Thomson Reuters explains on their website how the simple formula behind the IF works. Once you have the numbers, calculating the IF is a fill-in-the-blanks exercise. With today’s computing power it would be possible to recalculate the scores of all journals in seconds, at any moment. So it is really nonsense that we all sit waiting for The List that comes out in June. But it is those necessary numbers that are not so easy to obtain, and publishers themselves have influence over that.
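The fill-in exercise really is this small. The standard two-year Impact Factor for a year divides the citations received that year by items from the two preceding years by the number of citable items published in those two years (the numbers below are made up):

```python
def impact_factor(citations_to_prev_two_years: int,
                  citable_items_prev_two_years: int) -> float:
    """Two-year Impact Factor: citations received this year to items
    from the two preceding years, divided by the number of citable
    items published in those two years."""
    return citations_to_prev_two_years / citable_items_prev_two_years

# Made-up example: 210 citations in 2014 to 2012-2013 articles,
# 140 citable items published in 2012-2013.
print(impact_factor(210, 140))  # → 1.5
```

The division is trivial; the scarce ingredients are the citation counts, which is exactly the point of this post.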

For citation counts you need reliable reference lists of all articles in all journals. That alone is not easy; it is not as if every publisher’s website offers a list that is kept up to date. Perhaps the reference list is seen as part of the (transferred) intellectual property, and publishers fear a loss of value if the list is public. Perhaps there is no reliable list, because it is based on pattern recognition in the text, and editors or publishers(!) prescribe exactly that citation style (out of thousands) that the software does not recognise well. And then you still depend on the authors who supply the reference list in the manuscript; hopefully the authors at least use reference-management software to reduce the chance of spelling mistakes and typos.

With open reference lists, anyone could quickly see how often articles in a journal receive self-citations (and compare that with other journals, or between authors). Any niche would also become apparent quickly: the journals in a niche will cite each other relatively often. For readers and writers this also gives a quick overview of the market in a subject.

Thomson Reuters does perform some quality control of a journal before it is included in the list. That is why a journal that appeared in the JCR for years can suddenly disappear. Thomson Reuters has the data to support that decision, but does not answer questions about it. And that is perhaps the strangest part of the whole discussion around the Impact Factor. Science runs on research that must be replicable, supporting research data is therefore shared more and more often, and we know the IF is not the be-all and end-all. Yet we still happily rely on a number that is so hard to verify.