“Kennis over publiceren” converted to EPUB

At a panel discussion about publishing cultures in academia on the 18th of December 2012 (which unfortunately I didn’t attend), De Jonge Akademie published a little book on the topic [zotpressInText item=”2GMXVGV6″].

Although the book’s paper size is almost the same as the screen size of my Sony PRS-T2 e-reader, the PDF version isn’t really readable on the device. The letters are too small, even when most of the whitespace is cropped. Because I wanted to read it, preferably on my e-reader, I converted it to EPUB myself. Here are some observations about the process.

My first attempt was fully manual: I opened the PDF in PDF-XChange Viewer and copied the text from the document into Sigil. This introduced anomalies, as markup (headings, line and paragraph breaks etc.) and formatting (e.g. italic text, superscripts) were lost. Restoring them is a lot of work, even for this PDF of just 86 pages. I quit while processing the second chapter.

The second attempt still took some work, but the first step was already easier. Calibre was able to convert the PDF into an EPUB file, preserving most of the markup and formatting and even the cover.
There was a lot to tweak, though:

  • soft hyphens at the end of lines were not removed in the conversion process;
  • most of the uppercase letters were stored as (and hence copied as) lowercase, including chapter titles, quotes in ‘small caps’ and “de jonge akademie”;
  • text in footers ended up in the middle of the text (although this also happened when manually copying from the source document);
  • tables were torn apart (but this may have been an option in the conversion process that I should have turned off);
  • front and back cover were apparently stored as one image with the cutting marks in the PDF, and had to be cut out by hand, stored as separate JPEGs and linked to in the EPUB;
  • in some phrases that were in italics, each word had its own set of <i></i> tags;
  • I recreated the box around one paragraph in the introduction;
  • I added as much metadata as I could find in the original to the EPUB;
  • the interviews with members of De Jonge Akademie had no markup, just formatting – I made them ‘real’ chapters by putting the title in <h1></h1>;
  • I moved one of the interviews from the middle of a chapter to the end of that chapter, so as not to confuse the table of contents creator.
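
Two of these clean-ups are mechanical enough to script. What follows is only a minimal sketch in Python, not what I actually did. It assumes that the soft hyphens arrive as U+00AD characters (possibly followed by a leftover line break) and that the italics are plain <i> tags without attributes. The function names are made up for this illustration.

```python
import re

SOFT_HYPHEN = "\u00ad"  # assumption: the converter emits U+00AD for soft hyphens


def remove_soft_hyphens(text: str) -> str:
    """Drop each soft hyphen plus any whitespace (e.g. a line break) after it."""
    return re.sub(SOFT_HYPHEN + r"\s*", "", text)


def merge_adjacent_italics(html: str) -> str:
    """Join runs like '<i>a</i> <i>b</i>' into '<i>a b</i>'.

    Assumption: the tags are bare <i> elements without attributes.
    The whitespace between the two tags is kept.
    """
    return re.sub(r"</i>(\s*)<i>", r"\1", html)


if __name__ == "__main__":
    print(remove_soft_hyphens("publi\u00ad\nceren"))
    print(merge_adjacent_italics("<i>de</i> <i>jonge</i> <i>akademie</i>"))
```

Run as a script, this should print "publiceren" and "<i>de jonge akademie</i>". Regular expressions are fragile on real XHTML, so I would check the result in Sigil rather than trust it blindly.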

There is probably more that can be done, but this seems enough for now. I welcome suggestions for improving both the result and the process (though I probably will not do this again soon).

The resulting EPUB file can be downloaded. This derived work is available under the original licence (Creative Commons Attribution 3.0 NL), so you can, for example, improve it without asking.

Source:

[zotpressInTextBib]