{"id":17842,"date":"2020-11-05T22:59:09","date_gmt":"2020-11-05T22:59:09","guid":{"rendered":"https:\/\/davidgerard.co.uk\/blockchain\/?p=17842"},"modified":"2022-11-26T13:01:35","modified_gmt":"2022-11-26T13:01:35","slug":"calibre-epub-and-epubcheck-the-curse-of-editing-xhtml","status":"publish","type":"post","link":"https:\/\/davidgerard.co.uk\/blockchain\/2020\/11\/05\/calibre-epub-and-epubcheck-the-curse-of-editing-xhtml\/","title":{"rendered":"Calibre, ePub and epubcheck: the curse of editing XHTML"},"content":{"rendered":"<p>Calibre is an ebook management application. It comes with a nice ebook reader too, which I use all the time. [<a href=\"https:\/\/calibre-ebook.com\/\"><i>Calibre<\/i><\/a>]<\/p>\n<p>Calibre is also the most common ePub generator. Its format converters are robust and battle-hardened.<\/p>\n<p>This post is a record of what I actually did to make the ePub for <a href=\"https:\/\/davidgerard.co.uk\/blockchain\/libra\/\"><i>Libra Shrugged<\/i><\/a>. There&#8217;s almost certainly bits I could have done better some other way, and a lot of bits where I got way too deep into technical twiddling just because I could.<\/p>\n<p>(Comments suggesting using L<sup>A<\/sup>T<sub>E<\/sub>X instead will get you slapped over the Internet.)<\/p>\n<p>If you don&#8217;t understand any of the technical detail here, don&#8217;t worry about it \u2014 there must surely be better ways than this.<\/p>\n<p>&nbsp;<\/p>\n<p><a href=\"https:\/\/davidgerard.co.uk\/blockchain\/2020\/06\/03\/guest-post-the-last-last-word-on-bitcoins-horrifying-energy-consumption\/dancing-in-flames\/\" rel=\"attachment wp-att-16456\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-16456\" src=\"https:\/\/davidgerard.co.uk\/blockchain\/wp-content\/uploads\/2020\/06\/dancing-in-flames.gif\" alt=\"\" width=\"480\" height=\"268\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3>The horror<\/h3>\n<p>Since this is on computers, there are some gotchas \u2014 specifically, that 
ePub is an absolute shower of a format, and you <i>will<\/i> be editing XHTML by hand if you want to get onto Apple Books and the other minor ebook stores.<\/p>\n<p>The good news is that Calibre has a pretty good XHTML editor \u2014 right-click on book title, &#8220;Edit book&#8221;. The bad news is that you&#8217;ll need it.<\/p>\n<p>If you&#8217;ve lived your life wrong enough that you&#8217;re hand-editing ePub XHTML, you should probably install epubcheck. There&#8217;s an online version, but I just installed the Java <tt>.jar<\/tt> file locally \u2014 it&#8217;s much faster. [<a href=\"http:\/\/validator.idpf.org\/\"><i>EPub Validator<\/i><\/a><i>; <\/i><a href=\"https:\/\/github.com\/w3c\/epubcheck\/releases\"><i>Github<\/i><\/a>]<\/p>\n<p>The developer of Calibre considers epubcheck broken, which it is, and wrong about the ePub specification, which it is. <i>Unfortunately<\/i>, Apple Books requires your book to pass epubcheck anyway, with no errors or warnings. [<a href=\"https:\/\/itunespartner.apple.com\/books\/articles\/how-to-proofread-your-ebook-2715\"><em>Apple<\/em><\/a>]<\/p>\n<p>At one point I unzipped the ePub into separate XHTML files. This let me hand-tweak the files directly in vim, then add them back to the ePub using <tt>zip -f<\/tt> (freshen). Nobody who isn&#8217;t me should expect to have to do this sort of thing, but I&#8217;m a control addict.<\/p>\n<p>(I\u2019m using <tt>zip -f<\/tt> and not just making a zip file of the separate files because that way, the <tt>mimetype<\/tt> file stays both uncompressed and first in the zip file \u2014 if it isn&#8217;t, epubcheck complains. ePub is weird and annoying.)<\/p>\n<h3>Getting Calibre<\/h3>\n<p>If you have Windows or Mac, just download the latest version (5.4.2 as I write this) from Calibre and use that. [<a href=\"https:\/\/calibre-ebook.com\/download\"><i>Calibre<\/i><\/a>]<\/p>\n<p><strong>Update 2022:<\/strong> Ubuntu 22.04&#8217;s distro version of Calibre works fine. 
<tt>sudo apt install calibre<\/tt> and ignore the rest of this section, you lucky person.<\/p>\n<p>I use Xubuntu. Unfortunately, Ubuntu 20.04 has a broken version of Calibre that can&#8217;t possibly start or work \u2014 Ubuntu pulled a development version from Debian, nobody noticed before release time that it literally didn&#8217;t work at all, and now the broken version&#8217;s stuck in place for the next five years. [<a href=\"https:\/\/bugs.launchpad.net\/calibre\/+bug\/1877180\"><i>Launchpad<\/i><\/a><i>; <\/i><a href=\"https:\/\/lists.ubuntu.com\/archives\/ubuntu-devel\/2020-October\/041228.html\"><i>ubuntu-devel mailing list<\/i><\/a>]<\/p>\n<p>(The broken version still has a functional ebook-viewer.)<\/p>\n<p>If you&#8217;re like me and insist on using Linux, you can follow the Linux install instructions <i>without<\/i> running the installer as root. I did the isolated install per the Linux download page: [<a href=\"https:\/\/calibre-ebook.com\/download_linux\"><i>Calibre<\/i><\/a>]<\/p>\n<pre>wget -nv -O- https:\/\/download.calibre-ebook.com\/linux-installer.sh | sh \/dev\/stdin install_dir=~\/calibre-bin isolated=y<\/pre>\n<p>(Calibre doesn&#8217;t offer distro packages, because its developer has had so many bug reports from broken distro versions that he tells users to get the official binary instead.)<\/p>\n<p>After installing Calibre to my home directory in this way, I start it from a terminal.<\/p>\n<h3>Convert your DOCX in Calibre<\/h3>\n<p>I wrote both books in LibreOffice in its native ODT format. Calibre&#8217;s conversion of LibreOffice ODT files is much better in 2020 than it was for <a href=\"https:\/\/davidgerard.co.uk\/blockchain\/book\/\"><i>Attack of the 50 Foot Blockchain<\/i><\/a> in 2017.<\/p>\n<p>But I wanted clickable indexes \u2014 so I saved the book file in LibreOffice as DOCX, and sent that to Calibre for conversion.<\/p>\n<p>This is the easy part. 
You click the &#8220;Add book&#8221; button to import the DOCX, you right-click on the book name and go Convert books\u2192Convert individually.<\/p>\n<p>Choose the following options:<\/p>\n<ul>\n<li><b>Metadata: <\/b>Output format: EPUB. Enter Title, Author(s), Tags. Add the <a href=\"https:\/\/davidgerard.co.uk\/blockchain\/wp-content\/uploads\/2020\/10\/Libra-Shrugged-Ebook-Front-Cover-Final.jpg\">cover image<\/a>.<\/li>\n<li><b>Page setup:<\/b> Output profile: Generic e-ink HD. Input profile: Default profile.<\/li>\n<li><b>DOCX input: <\/b>Do not add a page after every endnote.<\/li>\n<li><b>EPUB output: <\/b>Flatten EPUB file structure; ePub version 3<\/li>\n<\/ul>\n<p>Then click &#8220;OK&#8221; to generate your ePub!<\/p>\n<h3>Back cover<\/h3>\n<p>I wanted the ePub to finish with the <a href=\"https:\/\/davidgerard.co.uk\/blockchain\/wp-content\/uploads\/2020\/10\/Libra-Shrugged-PB-Cover-front-back.jpg\">back cover image<\/a> from the paperback. So I added the image in the Calibre editor, which called it <tt>images\/image.jpeg<\/tt>, and added the following code near the end of the final XHTML file:<\/p>\n<pre>&lt;div&gt;&lt;svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" version=\"1.1\" width=\"100%\" height=\"100%\" viewBox=\"0 0 1640 2550\"&gt;&lt;image width=\"1640\" height=\"2550\" xlink:href=\"images\/image.jpeg\"\/&gt;&lt;\/svg&gt;&lt;\/div&gt;<\/pre>\n<p>I also needed to declare my use of SVG in <tt>content.opf<\/tt> by adding <tt>properties=\"svg\"<\/tt> for the XHTML file it was in:<\/p>\n<pre>&lt;item id=\"id11\" href=\"index_split_031.xhtml\" media-type=\"application\/xhtml+xml\" properties=\"svg\"\/&gt;<\/pre>\n<h3>Cross-references<\/h3>\n<p>In LibreOffice and Word, if you want multiple references to a single footnote or endnote, you create the footnote or endnote and then you add cross-references to it. 
These are clunky and inconvenient, but they work fine in the original program, and the links work in a PDF.<\/p>\n<p>If you convert DOCX to ePub in Calibre, the cross-reference becomes a link \u2014 not to the desired footnote, but to the spot in the text where the footnote or endnote of that number is.<\/p>\n<p>If you have both footnotes and endnotes, this may not even be the right footnote or endnote, because Calibre mixes your footnotes and endnotes together into a single list \u2014 then doesn&#8217;t renumber cross-references to them.<\/p>\n<p>If you convert ODT to ePub in Calibre, the cross-reference becomes just plain text with the wrong footnote or endnote number.<\/p>\n<p>The answer is to edit the XHTML. I had to go through my fifteen cross-references and cut-and-paste in the XHTML from the correct endnote in place of the erroneous cross-reference anchor.<\/p>\n<p>Calibre claims to do cross-references correctly, from both ODT and DOCX, but I can&#8217;t say it&#8217;s ever worked properly for me. [<a href=\"https:\/\/www.mobileread.com\/forums\/showthread.php?t=239066\"><i>MobileRead forum<\/i><\/a><i>; <\/i><a href=\"https:\/\/bugs.launchpad.net\/calibre\/+bug\/1429271\"><i>Launchpad<\/i><\/a>]<\/p>\n<h3>Indexes<\/h3>\n<p>Alphabetical index entries are plain text, not clickable links, in both LibreOffice and Word. This is obviously silly, which is why making index entries into clickable links is an open feature request in LibreOffice. [<a href=\"https:\/\/bugs.documentfoundation.org\/show_bug.cgi?id=71385\"><i>Document Foundation<\/i><\/a>]<\/p>\n<p>Q. What&#8217;s duller than indexing?<br \/>\nA. Indexing a second time, to get the ePub index right.<\/p>\n<p>You might correctly note that indexes are superfluous in ebooks, which have a search function \u2014 but professionally-published nonfiction ePubs tend to have indexes with page numbers and links. And having an index does look professional as hell. 
(And in self-publishing, you need every advantage you can get.)<\/p>\n<p>Calibre will import an index from ODT as &#8230; plain text. It shows page numbers, without hyperlinks \u2014 which is doubly useless. So don&#8217;t import from ODT if you want an index.<\/p>\n<p>Calibre will import an index from DOCX, and construct hyperlinks from it! It&#8217;ll use its own linking, not the page numbers. The links all work, but the result looks visually like an HTML conversion error.<\/p>\n<p>So if you want page numbers, but you also want working links: import your book as DOCX, and you get to edit the XHTML directly again. You&#8217;ll need a copy of the index with page numbers, &#8216;cos you&#8217;re going to need to put every single page number into your XHTML by hand.<\/p>\n<p>This will also force you to closely proofread your index, so &#8230; good?<\/p>\n<h3>XHTML filenames<\/h3>\n<p>Calibre creates ePub 3.2 books with .html filenames, but epubcheck requires .xhtml filenames. I fixed this with shell scripts applied to the unzipped files.<\/p>\n<p>(If you just cut&#8217;n&#8217;paste these lines without understanding what I did here, you may wreck your book files, and have to start over with exporting to ePub.)<\/p>\n<pre>for j in `seq -f %03g 0 31`; do for i in `seq -f %03g 0 31`; do sed -i s\/index_split_$i.html\/index_split_$i.xhtml\/g .\/index_split_$j.html ; done; done\r\nfor j in toc.ncx nav.xhtml content.opf; do for i in `seq -f %03g 0 31`; do sed -i s\/index_split_$i.html\/index_split_$i.xhtml\/g $j ; done ; done<\/pre>\n<p>Then <tt>zip -f<\/tt> to freshen the files into the ePub.<\/p>\n<h3>&lt;li&gt; in headings<\/h3>\n<p>Calibre adds an <tt>&lt;ol&gt;&lt;li&gt;&lt;\/li&gt;&lt;\/ol&gt;<\/tt> to every heading and subheading. 
Every ePub reader seems to handle this fine \u2014 except FBReader, my favoured ebook reader on Android, which displays a &#8220;1.&#8221; before each header.<\/p>\n<p>The actual XHTML looks something like:<\/p>\n<pre>&lt;ol class=\"list_\"&gt; &lt;li id=\"id_RefHeading___Toc28800_897132658\" value=\"2\" class=\"block_10\"&gt;&lt;b class=\"calibre5\"&gt;Introduction: Taking over the money&lt;\/b&gt;&lt;\/li&gt;&lt;\/ol&gt;<\/pre>\n<p><b>Solution:<\/b> after you&#8217;ve unzipped the files, go through and remove every <tt>&lt;ol&gt;&lt;\/ol&gt;<\/tt>, convert the <tt>&lt;li&gt;&lt;\/li&gt;<\/tt> to <tt>&lt;p&gt;&lt;\/p&gt;<\/tt> and remove the <tt>value=<\/tt> attribute from the <tt>&lt;p&gt;<\/tt> or else epubcheck complains.<\/p>\n<p><b>Alternate or additional solution:<\/b> check <tt>stylesheet.css<\/tt> for <tt>display: list-item;<\/tt> on styles that shouldn&#8217;t have it, and replace those with <tt>display: block;<\/tt> .<\/p>\n<h3>Font troubles<\/h3>\n<p>If you wrote the book in a particular font, the index generated from a DOCX may be in whatever Calibre thinks is a good default font \u2014 and this default font may show up elsewhere. The quickest way to fix this is to edit <tt>stylesheet.css<\/tt> and remove the wrong font.<\/p>\n<h3>Remove back-arrows from footnotes<\/h3>\n<p>Calibre puts a back arrow \u2190 character on every footnote or endnote. This renders fine on <i>most<\/i> ePub readers, but fails on some old ones. I removed it entirely from the file containing the endnotes. I think the endnotes also look better without the arrows.<\/p>\n<h3>Delete calibre_bookmarks.txt<\/h3>\n<p>If you use Calibre&#8217;s ebook-viewer, it&#8217;ll add a file called <tt>META-INF\/calibre_bookmarks.txt<\/tt> to your ePub. Remove this or epubcheck will complain.<\/p>\n<h3>Kindle Previewer<\/h3>\n<p>Amazon provides Kindle Previewer for Windows or Mac. 
It doesn&#8217;t work in Wine, so I put it in my Windows 10 VM under VirtualBox \u2014 you can just download Windows 10 and run it unactivated. [<i><a href=\"https:\/\/kdp.amazon.com\/en_US\/help\/topic\/G202131170\">Amazon<\/a>; <a href=\"https:\/\/www.microsoft.com\/en-us\/software-download\/windows10ISO\">Microsoft Windows 10<\/a><\/i>]<\/p>\n<p>Look over every page of your ePub extremely carefully \u2014 this is precisely what Amazon will make of your ePub.<\/p>\n<p>I also checked in ebook-viewer and FBReader. You should check in whichever ePub readers you personally use.<\/p>\n<h3>Content issues<\/h3>\n<p>Draft2Digital requires that you not have the following:<\/p>\n<ul>\n<li>&#8220;Competitor Links: The content contains links to sales channels that are in direct competition with the chosen sales channels.&#8221;<\/li>\n<li>&#8220;Competitor Reference: The content contains references to sales channels that are in direct competition with the chosen sales channels.&#8221;<\/li>\n<\/ul>\n<p>This means that Apple doesn&#8217;t like links to Amazon, or even mentioning it. The only such link was in the bit at the end advertising <i>Attack<\/i>, so, fine \u2014 I removed that line.<\/p>\n<p>Smashwords didn&#8217;t want page numbers on the table of contents, so I removed those for the Smashwords upload. They didn&#8217;t fuss about links to Amazon, though.<\/p>\n<h3>Why should I bother to do all of this?<\/h3>\n<p><i>(You probably shouldn&#8217;t. I&#8217;m just like this.)<\/i><\/p>\n<p>An ePub that passes epubcheck with no errors or warnings is a thing of joy! Probably.<\/p>\n<p>A more robust file will work on more ebook readers, and your customers will be happier.<\/p>\n<p>But mostly, you&#8217;ll bother doing this if you can tap your inner reserves of extreme fussiness and perfectionism and wanting to make your beautiful literary baby as well-presented as possible. That works too.<\/p>\n<p>Also, you probably have to be a huge nerd. 
But at least the book will be pretty and work everywhere.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>What I actually did to make the ePub for Libra Shrugged. There&#8217;s almost certainly bits I could have done better some other way.<\/p>\n","protected":false},"author":1,"featured_media":17474,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[1],"tags":[12,2027,2043,2011,322,51],"class_list":["post-17842","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorised","tag-attack-of-the-50-foot-blockchain","tag-calibre","tag-epubcheck","tag-libra-shrugged","tag-libreoffice","tag-self-publishing"],"jetpack_featured_media_url":"https:\/\/davidgerard.co.uk\/blockchain\/wp-content\/uploads\/2020\/06\/dancing-in-flames.jpg","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/davidgerard.co.uk\/blockchain\/wp-json\/wp\/v2\/posts\/17842","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/davidgerard.co.uk\/blockchain\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/davidgerard.co.uk\/blockchain\/wp-json\/wp\/v2\/types\/post"}],"aut
hor":[{"embeddable":true,"href":"https:\/\/davidgerard.co.uk\/blockchain\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/davidgerard.co.uk\/blockchain\/wp-json\/wp\/v2\/comments?post=17842"}],"version-history":[{"count":61,"href":"https:\/\/davidgerard.co.uk\/blockchain\/wp-json\/wp\/v2\/posts\/17842\/revisions"}],"predecessor-version":[{"id":24241,"href":"https:\/\/davidgerard.co.uk\/blockchain\/wp-json\/wp\/v2\/posts\/17842\/revisions\/24241"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/davidgerard.co.uk\/blockchain\/wp-json\/wp\/v2\/media\/17474"}],"wp:attachment":[{"href":"https:\/\/davidgerard.co.uk\/blockchain\/wp-json\/wp\/v2\/media?parent=17842"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/davidgerard.co.uk\/blockchain\/wp-json\/wp\/v2\/categories?post=17842"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/davidgerard.co.uk\/blockchain\/wp-json\/wp\/v2\/tags?post=17842"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}