Friday, May 30, 2008

Pages and Page Numbers.......

Many digital edition platforms ignore and eliminate traditional pagination. They create a 'reflowable' text whose loose format adjusts its shape to the device on which the text is displayed. Exact Editions (along with Google Book Search and most of the PDF-based digital magazine systems) is firmly page-centric. And we actually use the pages, by making the Tables and Indices live resources.

So we hit problems when publishers play fast and loose with page numbers. We met this problem today with two very different publications. The first is a distinguished and intellectual magazine which has a lot of ads occurring in an unpredictable pattern throughout the magazine. So page 8 in the table of contents may really be the 18th page and page 10 the 23rd. Only the editorial matter is paginated. Such an arrangement is easy enough for a human to navigate but it gives our algorithms indigestion (hiccoughs?). The only acceptable solutions we can think of are to suggest to the publisher that they impose a traditional (i.e. normal) pagination, or that they supply a PDF where all the ads are collected at the back. There may be publishing objections to these solutions, so it is not certain that we can help them with a digital edition.
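To make the mismatch concrete, here is a minimal sketch in Python of the lookup a page-centric system would need if only the editorial pages carry printed numbers. It is an illustration under stated assumptions, not a description of how our system works: the function name `build_page_map` and the sample layout in `labels` are invented for the example, and the sketch assumes that somebody (or something) has already worked out which physical pages carry which printed number.

```python
from typing import Dict, List, Optional


def build_page_map(printed_labels: List[Optional[int]]) -> Dict[int, int]:
    """Map each printed page number to its 1-based physical position.

    printed_labels[i] is the number printed on physical page i + 1,
    or None when that page is an unnumbered advertisement.
    """
    page_map: Dict[int, int] = {}
    for position, label in enumerate(printed_labels, start=1):
        if label is not None:
            # Keep the first occurrence if a printed number is repeated.
            page_map.setdefault(label, position)
    return page_map


# Hypothetical 23-page issue: None marks an unnumbered ad page.
labels = [
    None, None, 1, 2, None, None, None, 3, 4, None, 5, None,
    None, 6, None, None, 7, 8, 9, None, None, None, 10,
]

page_map = build_page_map(labels)
print(page_map[8])   # 18: printed page 8 is the 18th physical page
print(page_map[10])  # 23: printed page 10 is the 23rd physical page
```

The lookup itself is trivial. The hard part is producing `labels` reliably when the ads fall in an unpredictable pattern, which is why a conventional pagination or ads collected at the back would be so much easier to handle.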

The second problem today was a book (I look forward to seeing it since it covers the best pubs in the UK). Awkwardly for us, the index is based on the numerical order in which the pubs appear in the book, rather than on simple pagination. As it happens, I have just bought another weighty tome (letters from and to Wittgenstein) in which the indices are 'entry' ordered rather than page-derived. Putting all the correspondence with Wittgenstein in date order, and then using the numerical order of the letters rather than the pagination for the scholarly apparatus, makes clear editorial sense. I am pleased to say that our algorithms can probably deal with the pubs, so Wittgenstein's letters would be a comparative breeze. If Wiley/Blackwell are looking for a new digital platform we can help out.....
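An entry-ordered index needs one extra hop before it can become a live link: entry number to page, then page to link. The sketch below shows that hop in Python. It is only an illustration; the function name `resolve_index`, the pub names and all the numbers are made up, and it assumes we already know the page on which each numbered entry starts.

```python
from typing import Dict, List


def resolve_index(
    index: Dict[str, List[int]],
    entry_start_page: Dict[int, int],
) -> Dict[str, List[int]]:
    """Turn an entry-ordered index into a page-ordered one.

    index maps an index term to the entry numbers it cites.
    entry_start_page maps an entry number to the page where it begins.
    Entry numbers with no known page are skipped rather than guessed.
    """
    resolved: Dict[str, List[int]] = {}
    for term, entries in index.items():
        pages = {entry_start_page[n] for n in entries if n in entry_start_page}
        resolved[term] = sorted(pages)
    return resolved


# Invented data: four numbered entries and where each one starts.
entry_start_page = {1: 12, 2: 12, 3: 13, 4: 15}
index = {"Example Arms": [2], "Sample Tavern": [3, 4]}

links = resolve_index(index, entry_start_page)
print(links)  # {'Example Arms': [12], 'Sample Tavern': [13, 15]}
```

The same two-step lookup would serve for numbered letters in a correspondence volume, since only the table of entry start pages changes.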

Loose-leaf publishing is another matter. We have wondered about it, but for the moment we shall walk by on the other side.

3 comments:

Anonymous said...

Thanks, Adam, for the examples, which come in handy in the current discussions about 'e-book' standards:
editorial decisions have to be made, together with authors and editors, to define what counts as a 'fair' design that respects the meaning of the work (and, consequently, to define when the reader should assume responsibility for reading in sub-optimal conditions).
It would be detrimental to accept as a standard that any work can be displayed in any reflowable format, without at least a warning about the loss of meaning due to reflow.
Conversely, it is detrimental to forbid reflow to (human) readers, as long as they wish it.
Consequently, I'd say that a good reader standard should allow authors/editors to format their work for optimal rendering on different displays and to specify these limits, while still allowing display on 'emergency' devices or for 'emergency' situations or purposes.

Adam Hodgkin said...

Alain: Maybe the publisher is in a position to 'forbid' or 'encourage' reflow, but the technology platform probably doesn't have this legal power; its system may facilitate reflowable solutions, or it may not. The GBS (Google) and the Exact Editions systems do not facilitate reflowable, downloadable file formats. I suspect that if they were to do so, it would be better to go back to the PDF/scan from which the database was built (or an XML source). That again looks like a 'publisher' decision.

I am looking at another source of improvements for readability. 'Reflow' without damage to the layout and textual arrangement can be achieved by other means. Better reading solutions and infinitely scalable resolutions are coming to us all via the more flexible browser. This is why the iPhone is such a landmark device (the scalability of the image, but also the 'pinch' and 'nudge' interface and the potential for 'image projection', are all forms of 'reflow'). But aside from the iPhone, all the browsers (Opera was first) are moving to device-independent scaling. That way the designer's/editor's/bibliographer's input to the original book can be preserved without the loss that comes from 'crude' and 'lossy' reflow formats. I say we need reflow whilst keeping the immense value of pagination!

Anonymous said...

Hello Adam,

Have you seen the Fuji Xerox Palo Alto research project CBAZ, Content Based Automatic Zooming?

Browser solutions and multitouch interfaces can work for bitmap images, under the same conditions as the zooms currently available on eBook reader devices; but for text they do need access to a digital representation of the text, a typesetting routine and fonts, in order to fit a readable composition on the display according to the level of zoom.

Our eyes and brains do not tolerate the same rates of enlarging/reducing for images and for text.

The same issue has been solved by cartographers to display captions on maps, where, through a lot of preparation in the GIS, data can be dynamically sorted and displayed. Again this requires a significant amount of computing power somewhere, and at some stage, the necessary parameters for the display engine to make the right decisions.