Tuesday, August 10, 2010

Four Fiendish Files and a Header

Alas, nothing is as easy as it seems, and when it comes to ePUB, nothing even seems easy. Even though the HTML file that you created in the last lesson is a valid Web page, the Open eBook specification requires a header at the top of the file that tells devices how to interpret it, plus four separate files that must be created before the book can be packaged for reading. In this lesson we'll look at those small adjustments.

The XHTML header


This header information should be the same for all the content files in your document. It defines the document as an XHTML document and describes the character set and language that will be used. The code is very simple and should just be copied and pasted in place of the <HTML> tag in your content file.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

Anytime you create a new content file for your book, paste this code at the top, then continue with the same HTML as previously discussed. That's the easy part. Now on to the four fiendish files.
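To see where the header sits, here is a minimal sketch of a complete content file. The three header lines are the ones given above; the title, heading, and paragraph are placeholders I made up for illustration, so substitute your own text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<!-- Everything below this line is placeholder content for illustration. -->
<head>
<title>BookName</title>
</head>
<body>
<h1>Chapter Title</h1>
<p>The text of your chapter goes here.</p>
</body>
</html>
```

Note that the XML declaration has to be the very first thing in the file, with nothing before it, not even a blank line.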

File 1: mimetype


When an eReader encounters a book with the .ePUB extension, the first thing it checks for is whether it is actually an eBook. Many things could be packaged into an eBook format, but they won't read without a mimetype file. This is one short line of text in a text document with no extension. Simply open your text editor and in a new file type:
application/epub+zip

Nothing else goes in this file, not even a space or a blank line after the text. Save the file as "mimetype" and, after you have closed it, edit the file name in your file explorer to remove the extension. You will get a warning message that changing the file extension might make the file unreadable, but that is okay. Just delete the four characters of the extension (including the period) and save the changes.

File 2: container.xml


This is also a very short text document with the .XML file extension. Only one part of this file will change for each eBook you create: the name of the third fiendish file. Copy and paste the following into a text document.
<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
<rootfiles>
<rootfile media-type="application/oebps-package+xml"
full-path="oebps/BookName.opf" />
</rootfiles>
</container>

The only part of this file that ever changes is the "BookName," which is the name of the .OPF file that defines your book. The container file tells the reader where the packaged eBook files are located. Save this text file as "container" and change the extension to .XML, so that the full name is container.xml. Copy and paste the file into every eBook you create and just change the BookName to the current project.
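As an illustration, here is roughly how the container file might look for a specific project. The name "AesopsFables" is an assumption on my part, chosen only for the example; use whatever you actually name your own .OPF file.

```xml
<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
<rootfiles>
<!-- "AesopsFables" is a made-up name; put the name of your own .OPF file here. -->
<rootfile media-type="application/oebps-package+xml"
full-path="oebps/AesopsFables.opf" />
</rootfiles>
</container>
```

The name in full-path has to match the .OPF file name exactly, including capitalization.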

File 3: The .OPF package file


The .OPF file is the one that defines what your book is, where all the pieces of it are located, and any information about the book (metadata) that you would like people to know. Metadata could include the ISBN, price, category, and a host of other information. For our purposes, we are going to create an .OPF with the absolute minimum information that must be included in order for the book to be considered a valid .ePUB file. This is a text file that contains XML elements that are defined by the Open eBook specification. The elements included here are the minimum set that is required. Once again, the term BookName is a placeholder used to define the specific files in your eBook.
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:title>BookName</dc:title>
<dc:creator opf:role="aut">Author</dc:creator>
<dc:language>en</dc:language>
<dc:identifier id="bookid">BookName001</dc:identifier>
</metadata>
<manifest>
<item id="content" href="content.html" media-type="application/xhtml+xml"/>
<item id="toc" href="BookName.ncx" media-type="application/x-dtbncx+xml"/>
</manifest>
<spine toc="toc">
<itemref idref="content"/>
</spine>
</package>

There are at least three sections in the package file: the metadata, the manifest, and the spine.
  1. In the metadata you should include at least a title, creator, language, and unique identifier. (Strictly speaking, the specification requires only the title, language, and identifier, but every real book should name its creator.) Title is pretty obvious. Creator usually starts with the role of author. Other roles may also be defined, but we will not deal with those until a much later lesson. For our purposes, we are doing these lessons in English, therefore the language code is "en". We will look at other languages in the future. Finally, every book needs a unique identifier. It is declared in the <dc:identifier> element of the metadata and pointed to from the opening <package> tag. This is supposed to be a combination of letters and numbers that distinguishes this eBook from every other eBook that could ever be created. In some instances, the ISBN may be used. In other cases, commercial software will generate a random code for the book. For now, we will use the BookName and three digits. You can change the numbers for each version of the file you create.
  2. In the manifest, you will list every file that is part of your eBook. At minimum, the manifest will include the content file(s) for your eBook and the fourth fiendish file, which will be discussed next. If your eBook contains multiple content files, graphics, fonts, or any other content, it will all need to be listed in this section.
  3. The spine lists the content files in the order in which they are to be read. If you create a file for each chapter in the book, for example, each of those files will be listed in the spine, one after another.
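For a book with more than one content file, the manifest and spine might look something like the sketch below. The file names (chapter1.html, chapter2.html, cover.jpg) and their ids are hypothetical, and only these two sections are shown; the rest of the package file stays as it is.

```xml
<!-- Hypothetical file names; only the manifest and spine sections are shown. -->
<manifest>
<item id="chapter1" href="chapter1.html" media-type="application/xhtml+xml"/>
<item id="chapter2" href="chapter2.html" media-type="application/xhtml+xml"/>
<item id="cover-image" href="cover.jpg" media-type="image/jpeg"/>
<item id="toc" href="BookName.ncx" media-type="application/x-dtbncx+xml"/>
</manifest>
<!-- The spine names content files by their manifest id, in reading order. -->
<spine toc="toc">
<itemref idref="chapter1"/>
<itemref idref="chapter2"/>
</spine>
```

Notice that the image is listed in the manifest but not in the spine: the spine holds only the content files a reader pages through, and each idref must match the id of an item in the manifest.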

The .OPF is both the most complicated file in the eBook package and in many ways the most important. Save the text file and change the extension to .OPF.

File 4: The .NCX Table of Contents


The .NCX file provides the reading system with navigation points in your eBook and is required for a conforming eBook. It has a few header items and then a table of contents listing the names of the files (or locations within files) and the display name for each. It is a text file with the extension changed to .NCX.
<?xml version="1.0" encoding="UTF-8"?>
<ncx version="2005-1" xml:lang="en" xmlns="http://www.daisy.org/z3986/2005/ncx/">
<head>
<meta content="BookName001" name="dtb:uid"/>
</head>
<docTitle><text>Table of Contents</text></docTitle>
<navMap>
<navPoint id="navpoint-1" playOrder="1">
<navLabel>
<text>BookName</text>
</navLabel>
<content src="content.html"/>
</navPoint>
</navMap>
</ncx>

This information provides navigation points for sidebar navigation in various eBook readers and for assistive technologies. It is a required file for your conforming ePUB eBook. Save the file as a text file and then change the extension to .NCX.
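If your book has several chapters, the navigation map simply gets one entry per destination. Here is a sketch with made-up chapter titles and file names, including one entry that points to a location inside a file; it assumes that chapter2.html contains an element with id="section1". Only the navMap section is shown.

```xml
<!-- Hypothetical titles and file names; this stands in for the <navMap> section only. -->
<navMap>
<navPoint id="navpoint-1" playOrder="1">
<navLabel>
<text>Chapter One</text>
</navLabel>
<content src="chapter1.html"/>
</navPoint>
<navPoint id="navpoint-2" playOrder="2">
<navLabel>
<text>Chapter Two</text>
</navLabel>
<content src="chapter2.html"/>
</navPoint>
<!-- The part after the # is the id of an element inside chapter2.html. -->
<navPoint id="navpoint-3" playOrder="3">
<navLabel>
<text>A Section Inside Chapter Two</text>
</navLabel>
<content src="chapter2.html#section1"/>
</navPoint>
</navMap>
```

Each navPoint needs its own id, and the playOrder numbers count up in the order the entries should be visited.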

Those are the four fiendish files that are required in every properly formed ePUB eBook. If you are not sure you've followed everything, I've created a small ZIP file of Aesop's Fables that includes all the pieces you see in this post. You can download it at NWE Signatures eBook Samples where I'll continue to post samples from these exercises.

In the next exercise, we'll work on properly organizing and packaging the eBook.
