I’m about to complete my 10000th task in Todoist, yet I’m still over 8000 points of karma away from the Enlightened state…
Interesting: a researcher, Marcin Kozak, gets a lot of unsolicited email (spam) trying to convince him to publish in a journal or with a publisher, and decides to check out these journals and publishers.
Kozak, M., Iefremova, O. and Hartley, J. (2015), Spamming in scholarly publishing: A case study. Journal of the Association for Information Science and Technology. doi: 10.1002/asi.23521
The abstract covers it well:
Spam has become an issue of concern in almost all areas where the Internet is involved, and many people today have become victims of spam from publishers and individual journals. We studied this phenomenon in the field of scholarly publishing from the perspective of a single author. We examined 1,024 such spam e-mails received by Marcin Kozak from publishers and journals over a period of 391 days, asking him to submit an article to their journal. We collected the following information: where the request came from; publishing model applied; fees charged; inclusion or not in the Directory of Open Access Journals (DOAJ); and presence or not in Beall’s (2014) listing of dubious journals. Our research showed that most of the publishers that sent e-mails inviting manuscripts were (i) using the open access model, (ii) using article-processing charges to fund their journal’s operations; (iii) offering very short peer-review times, (iv) on Beall’s list, and (v) misrepresenting the location of their headquarters. Some years ago, a letter of invitation to submit an article to a particular journal was considered a kind of distinction. Today, e-mails inviting submissions are generally spam, something that misleads young researchers and irritates experienced ones.
Some details were missing, however. I think good methodologies for assessing a publisher’s or journal’s trustworthiness are necessary, so it would be great if people researching these methodologies got the details right.
The location of the headquarters was determined by various means, one of which was looking up the domain name holder’s (or registrant’s) country in a WHOIS system. The authors conclude this is not a reliable method, but do not explain why. A few sentences earlier they do suggest that the registrant’s country is the country the publisher or journal is based in, or that WHOIS shows the location of the server. Exactly which information from WHOIS was used is not described.
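For illustration, this is roughly what such a lookup involves. WHOIS is a plain-text protocol over TCP port 43 (RFC 3912); the server and the exact field label vary by registry, so the server name and the `Registrant Country:` label below are assumptions, not universal rules:

```python
import socket

def whois_query(domain, server="whois.verisign-grs.com"):
    """Send a raw WHOIS query (RFC 3912: plain text over TCP port 43)."""
    with socket.create_connection((server, 43), timeout=10) as s:
        s.sendall((domain + "\r\n").encode())
        chunks = []
        while data := s.recv(4096):
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")

def registrant_country(whois_text):
    """Extract the registrant's country from a WHOIS response.
    'Registrant Country' is a common gTLD field label, but this is
    an assumption; many registries use different labels or redact it."""
    for line in whois_text.splitlines():
        if line.strip().lower().startswith("registrant country:"):
            return line.split(":", 1)[1].strip()
    return None
```

Note that even when the field is present, it reflects whatever the registrant entered, which is exactly why its reliability is questionable.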
Another way of determining the headquarters’ location was to look up the information on the website. How they determined whether that information was present or missing is not mentioned.
One of the conclusions is that “the average time claimed for peer review was 4 weeks or less.” I don’t see how this follows from the summary table of claimed peer-review times: the table contains N/A values, and while nearly all claimed times are 4 weeks or less, that supports a statement about the individual claims, not about an average. The form of the statement is wrong.
Finally, I would have liked to see a reason for not including the dataset. I can only guess why the authors deliberately did not provide the names of journals and publishers.
I think the conclusions hold (except for the one mentioned above), and that work should be done to improve the methodology for judging journal quality. Eventually, such work could be automated and easily repeated over time. Results from such automated checks could be added to the DOAJ.
From the “just throwing it out there” dept. in cooperation with the “I’m too lazy to do some research into existing efforts” dept.
But many services share many characteristics: a jurisdiction whose laws govern the conditions of service, definitions of the service provider and the consumer, and definitions of content, ownership, and other rights to that content.
Isn’t it possible to encode the characteristics of these definitions in a standardised set of terms?
If various (similar) services provide the definitions of their services in standardised terms, they could more easily be understood and compared. It would help non-human agents to select the best services for themselves and their human controllers.
More thought is needed.
Things I might find interesting in an email client or environment. Raw thoughts.
- fast search
- search inside encrypted emails
- get related and linked stuff easily at any time during reading
- easily organise windows and content, e.g.
- when you select to reply to an email, put the draft next to the original so that you don’t need to switch windows
- use a consistent style when commenting inline
- easily create tasks and notes from parts of emails
- organise / order following user’s workflow
- by people, topic, tone of content, project, task, event
- integrate with business process(es)
- suggest while typing
- spelling and grammar
- references to other emails / notes / events / conversations
- people to include as recipient
- named entities
- expansions of abbreviations
- different tone of text by using templates
- replacement text
- find and use embedded machine-actionable information
- explain what the client did to your experience
- log observations and following actions by the mail user agent
- show the reasons for suggesting stuff
I started a new hobby: pointing out vulnerabilities for a particular man-in-the-middle attack. It is described in CVE-2009-3555:
The TLS protocol, and the SSL protocol 3.0 and possibly earlier, as used in Microsoft Internet Information Services (IIS) 7.0, mod_ssl in the Apache HTTP Server 2.2.14 and earlier, OpenSSL before 0.9.8l, GnuTLS 2.8.5 and earlier, Mozilla Network Security Services (NSS) 3.12.4 and earlier, multiple Cisco products, and other products, does not properly associate renegotiation handshakes with an existing connection, which allows man-in-the-middle attackers to insert data into HTTPS sessions, and possibly other types of sessions protected by TLS or SSL, by sending an unauthenticated request that is processed retroactively by a server in a post-renegotiation context, related to a “plaintext injection” attack, aka the “Project Mogul” issue.
I bet I am not the only one practicing this activity, because the vulnerability was described in 2009, and fixes have been available for quite some years too. After the releases of the various fixes for this vulnerability, the quest for KAMITMA (Knights Against Man-In-The-Middle Attacks) began. It is easy to become a KAMITMA in the context of CVE-2009-3555: you just need to point out this vulnerability to the owners of TLS/SSL-protected websites that are vulnerable, politely asking that they update their software. You might mention that this is an old vulnerability (2009 is like a century in computer development).
I started this hobby because I found that my own website (and ownCloud and email) is vulnerable. I asked my hosting provider, and they responded by offering a move to a ‘new’ hosting environment. ‘New’ does not mean the newest version of Apache httpd, but I am planning to move to this new hosting environment soon. If that doesn’t fix this vulnerability, I’ll have to move elsewhere and suggest others do the same, because it would show my hosting provider doesn’t care about security.
The easiest way of spotting this vulnerability is to use Firefox and make it block connections made using the old, vulnerable protocol. Open a tab in Firefox, enter `about:config` and press Enter. Search for `security.ssl.require_safe_negotiation` and double-click the row to set it to `true`. The next time you try to visit a vulnerable website you’ll see this:
When you see this, it’s hobby time: send a message to the owner of the website asking to get updated and more secure.
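The same check can be scripted instead of done in the browser. A minimal sketch that shells out to the `openssl s_client` command, which prints a “Secure Renegotiation IS supported” line in its handshake summary when the server implements the RFC 5746 fix (the exact wording of that line may differ between OpenSSL versions, and this requires the openssl CLI and network access):

```python
import subprocess

RFC5746_OK = "Secure Renegotiation IS supported"

def parse_reneg_status(s_client_output):
    """True if the s_client handshake summary reports RFC 5746
    secure renegotiation; False otherwise (the vulnerable case
    prints 'Secure Renegotiation IS NOT supported')."""
    return RFC5746_OK in s_client_output

def secure_renegotiation_supported(host, port=443):
    """Connect with `openssl s_client` and inspect its summary output."""
    out = subprocess.run(
        ["openssl", "s_client", "-connect", f"{host}:{port}"],
        input=b"", capture_output=True, timeout=30,
    ).stdout.decode(errors="replace")
    return parse_reneg_status(out)
```

A server where this returns False is a candidate for a polite KAMITMA email.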
I found these websites and try to keep this list updated when I find something changes:
Addition, 2014-08-28: Tweakers.net, a news website and price comparison website for electronics, hosts its images on a different domain (`tweakimg.net` and the subdomain `ic.tweakimg.net`). Apparently these are vulnerable, because all colour and layout is lost when I look at the Pricewatch. The browser console shows what is going on:
I originally posted this as a question in the discussion forums of the Coursera MOOC on Metadata on 2014-07-30. So far I have received no written responses at all, but I did get 5 points (upvotes) for posting. I hope someone outside the MOOC may be able to answer; please correct me if I’m wrong. If my concerns are justified, perhaps this may trigger a bit of a change in the course materials. I’m wary that in the coming video lecture series on Linked Data Dr. Pomerantz will still confuse subject and object, but I’ll wait to see if anything has changed since last year’s session.
I’m having trouble understanding why using `name` in a `<meta>` element is the correct way of refining the date element in HTML, for two reasons:
- I cannot find anything explaining it in the current specifications
- I end up using elements in someone else’s namespace that have not been defined in that namespace (or anywhere else), don’t I?
It appears that the document Element Refinement in Dublin Core Metadata (linked next to the 2-7 lecture about Qualified Dublin Core) only describes element refinement using the RDF model. I know the semantics of `rdfs:subPropertyOf` and it makes sense to me to express refinements this way.
For learning to express Dublin Core elements in HTML, the document refers to the obsolete http://dublincore.org/documents/dcq-html/, which has one reference to refining DC elements by appending `.` plus the refinement, namely in the section that describes how the spec is (in)compatible with previous and other versions.
Note that previous versions of this document (and other DCMI HTML-encoding documents) made some different recommendations to those found in this document, as follows:
Previous recommendations specified prefixing element refinements by the element being refined, for example ‘DC.Date.modified’ rather than ‘DCTERMS.modified’.
These forms of encoding are acceptable but are no longer considered the preferred form.
The document that replaces it, Expressing Dublin Core metadata using HTML/XHTML meta and link elements, has no references or examples of refined elements at all. It does make clear how to interpret the properties as URIs:
In a DC-HTML Prefixed Name, the prefix is the part of the name preceding the first period character; the remainder of the string following the period is treated as the "local name" and appended to the "namespace URI". If a DC-HTML Prefixed Name contains more than one period character, the prefix is the part preceding the first period, and the local name is the remainder of the name following the first period, and any subsequent period characters simply form part of the local name.
In the following example the DC-HTML Prefixed Name "XX.date.removed" corresponds to the URI http://example.org/terms/date.removed:
<link rel="schema.XX" href="http://example.org/terms/" >
<meta name="XX.date.removed" content="2007-05-05" >
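The expansion rule quoted above (split on the first period only; any further periods belong to the local name) is simple enough to sketch in a few lines of Python, using the spec’s own example values:

```python
def dc_html_name_to_uri(name, prefix_uris):
    """Expand a DC-HTML Prefixed Name: the prefix is everything before
    the FIRST period; the remainder (further periods included) is the
    local name, appended to the prefix's namespace URI."""
    prefix, sep, local = name.partition(".")
    if not sep:
        raise ValueError("not a prefixed name: " + name)
    return prefix_uris[prefix] + local

# Prefix mapping as declared by <link rel="schema.XX" href="...">:
prefixes = {"XX": "http://example.org/terms/"}
print(dc_html_name_to_uri("XX.date.removed", prefixes))
# -> http://example.org/terms/date.removed
```

The mechanics are trivial; the question below is about what the resulting URI actually identifies.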
If I were to apply this to a property whose namespace is the DC Elements Set namespace, I’d get a URI that is in the DC namespace, but that is not defined in that namespace.
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" >
<meta name="DC.title.serial" content="Services to Government" >
This says that the value of the property `http://purl.org/dc/elements/1.1/title.serial` of a described resource is "Services to Government" – but this property is not defined. The same goes for the examples in the lectures and homework: by just appending some string that may or may not make sense to other humans, you can’t really expect to refine the property, can you?
So I’m puzzled. What document(s) explain this way of refining (well-)defined elements?
After continuing my search: is it RFC 2731 from 1999 by Kunze, Encoding Dublin Core Metadata in HTML? Section 6 indeed explains the use of `<meta name = "PREFIX.ELEMENT_NAME.SUBELEMENT_NAME" ... >` for "qualifying" elements, but a few lines below, it says:
Note that the qualifier syntax and label suffixes (which follow an element name and a period) used in examples in this document merely reflect current trends in the HTML encoding of qualifiers. Use of this syntax and these suffixes is neither a standard nor a recommendation.
Publishers who fear a lower Impact Factor should take a good look at themselves first. This goes especially for publishers of scholarly journals that are not Open Access, because they feed the dependence on Thomson Reuters and the Journal Citation Reports®.
Thomson Reuters explains on their website how the simple formula behind the IF works. Once you have the numbers, calculating the IF is a fill-in-the-blanks exercise. With today’s computing power it would be possible to recalculate the scores of all journals in seconds, at any moment. So it is really nonsense that we are all waiting for The List that comes out in June. But it is those necessary numbers that are not so easy to obtain, and publishers themselves have influence over that.
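To underline how much of a fill-in-the-blanks exercise it is: the published two-year formula divides the citations received in year Y (to items from the two preceding years) by the number of citable items published in those two years. A sketch with made-up numbers:

```python
def impact_factor(citations_to_prev_two_years, citable_items_prev_two_years):
    """Two-year Journal Impact Factor for year Y:
    citations received in Y to items published in Y-1 and Y-2,
    divided by the number of citable items published in Y-1 and Y-2."""
    return citations_to_prev_two_years / citable_items_prev_two_years

# e.g. 210 citations in 2014 to articles from 2012-2013,
# and 70 citable items published in 2012-2013:
print(impact_factor(210, 70))  # -> 3.0
```

The arithmetic is trivial; obtaining trustworthy values for the two inputs is the hard part, which is exactly the point of this post.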
For citation counts you need reliable reference lists of all articles in all journals. That alone is not easy; it is not as if you can simply find an always up-to-date list on every publisher’s website. Perhaps the reference list is seen as part of the (transferred) intellectual property, and publishers fear a loss of value if the list is public. Perhaps there is no reliable list, because it is based on pattern recognition in the text, and editorial boards or publishers(!) prescribe exactly that citation style (out of thousands) which the software does not recognise well. And then you still depend on the authors who supply the reference list in the manuscript – hopefully the authors at least use reference management software to reduce the chance of spelling mistakes and typos.
With open reference lists, anyone could quickly see how often articles in a journal receive self-citations (and compare that with other journals, or between authors). A possible niche would also become apparent quickly: the journals in a niche will cite each other relatively often. For readers and writers this also gives a quick overview of the market in a subject.
Thomson Reuters does perform some quality control on a journal before it is included in the list. That is why a journal that appeared in the JCR for years can suddenly disappear. Thomson Reuters has the data to support that decision, but does not answer questions about it. And that is perhaps the strangest part of the whole discussion around the Impact Factor. Science runs on research that must be replicable, supporting research data is therefore shared more and more often, and we know the IF is not the be-all and end-all. Yet we happily rely on such a hard-to-verify number.