Sunday, September 16, 2007

File conversion or interoperating services?

A few years ago, the standard view was that magazines and books needed a suitable digital file format in which they could be stored (something which might become an industry standard, as MP3 was for music). The internet was the powerful new way in which publications would be shipped. To this way of thinking, digitization was all about electronic delivery. It seemed kind of obvious that this was the road ahead, and most of our competitors that offer a digital magazine platform have this kind of technology. It usually means a proprietary and non-standard file format, often a modified form of PDF.

There is a huge amount of baggage that comes with this way of looking at things. First, it seemed that publishers would only work with this sort of system if there was a workable technology for DRM (Digital Rights Management), and there were lessons here from the music industry. Second, once we see the web as a way of providing users with issues as downloadable files, we have a lot of problems with making magazine archives searchable, and with co-ordinating the usage of copies of material that ought to be handled as a network resource. Third, there was a big problem in deciding whether digital books or magazines should be adapted and deformed so that their format in a digital version differed from what it was in print.

All these issues are being chased to ground by a consortium which works with book publishers: the International Digital Publishing Forum, with its newly proposed Open eBook Publication Structure Specification.

These standards may serve a useful purpose, but it could all be a real diversion. The Google Book Search way of handling digitization may be the way forward, in which case DRM was always a bad idea. Print will always be page-based and digital print will respect the conventions of pagination. Archives should only be searchable via a network service. References and citations should always target the source. Search services matter and distribution services are beside the point. And file formats need be of no concern to users. Access management is key, just as digital rights management is otiose.

In this new world the standards that matter will be to do with making different web services inter-operable, not with harmonising the formats in which texts are held.


Alain Pierrot said...

"In this new world the standards that matter will be to do with making different web services inter-operable, not with harmonising the formats in which texts are held."

I guess I agree with you on all points (inter-operability of web services without any doubt) but I wonder whether the phrase "formats in which texts are held" couldn't lead to misunderstandings.

There are two ways of displaying digital text:

• one is reformatting it according to the display device (hopefully under control by the human reader, but...); Mobipocket, now an Amazon subsidiary, is rather good at that,

• the other one is to display it as controlled by the editor and some lay-out and typographic skill and expertise — PDF is rather good at that.

If "harmonising formats" means telling publishers (and their editors) you should opt for either way and drop the other, I'm in full agreement.

If it means telling hardware and software manufacturers "let's work together so that we can have our say about the rendering of our books, while making it easier to have interoperability between competing platforms and channels of distribution", I'd advise defining such a format (i.e. build your hardware and write your software according to some minimal specs, based on what our experience of editing and publishing has shown to be useful in ensuring "readability", or "affordance").

I'm afraid most companies dream about the idea of capturing a nice share of readership through technology, proprietary formats and control of the final links in the distribution chain, without any scruple about customers or providers.

Adam Hodgkin said...

I guess we DO agree, but I am not sure, because I am not convinced by the first of the options that you offer. I am not yet enthused by the idea that printed text ought to be able to flow gracefully into any number of formats/paginations (perhaps because that way it would be easier to display print on many different types of screen). I think that way of thinking, where print becomes fungible and fluid, tends to bring all the problems previously mentioned (how do we cite texts or share them if they are not in some canonical form?). I would put it more strongly than you: PDF has been very, very, very good at what it does best. It does it superlatively and is probably now the only format in which the traditional print 'entity' will ever be finalised for print. So long as we have printing machines we will stay with PDF (my conjecture). There must be a good chance that PDF will be with us generations after .doc has disappeared.

It may be that publishers and publisher technology companies will compete more on the quality and the standards they can achieve in allowing text to grow and expand through the PDF framework to become more {structured, searchable, semantic, linkable etc}. PDF equivalence, or an exact representation of the print, is a minimum, but stronger publishing platforms will reach through to pull out e.g. chemical structure information, or GIS information (Google is already doing that), or the Silverlight style of embedded depth in a zoomable context. So I agree companies will strive to create a competitive edge in the way stuff is rendered and accessed.

Alain Pierrot said...

Given your comments, we DO agree on the principles and conclusions, and publishers should insist that e-reader providers make use of PDF.
They should support Adobe's intention to make the PDF 1.7 specs a fully fledged ISO standard.
Cf. press release.

In my opinion, they should also contribute further to defining e-book/e-reader standards, with the Open eBook Publication Structure Specification and/or elsewhere:
• PDF should be more structured in order to allow, for instance, easier full-text indexing and easier linking between table of contents, index and page numbers; this is an urgent need if any new title is to be made available with "Amazon Search Inside"-like features.
• Obviously, some devices will require a repurposing of text and lay-out which couldn't easily be computed from the initial lay-out recorded in the print-oriented PDF. It would be a good idea for publishers, graphic designers and hardware/software manufacturers to sort out what is the editor's responsibility, what is device dependent and what is user dependent.
A discussion on O'Reilly's Radar about Context Aware Image Re-Sizing is raising interesting related questions:
• smart image re-sizing is such a nice feature for improving thumbnail images that it is bound to become a standard feature;
• however, it can modify the composition of the original photo (or artwork) in unwanted ways, against common sense or simply against the author's will.
Will the feature be embedded in a display device as an option (or, worse, as the only way of re-sizing)?
Will it be possible to create images with a parameter telling whether context-aware re-sizing is allowed or unwise?
Obviously, the latter would be the publishers' and authors' option; it would need standardisation of image encoding, display features, ...
The same kind of issue exists with many features of text navigation tools.

Alain Pierrot said...

About repurposing the initial format, the French company Ganaxa is announcing the launch (Montreal, September 25th) of a publishing platform dedicated to e-paper, used for the French project of Les Echos.

About DRM issues, it will be worth having a look at the results of the ACAP project (Automated Content Access Protocol):
"ACAP will enable the providers of all types of content published on the World Wide Web to communicate permissions information (relating to access and use of that content) in a form that can be automatically recognized and interpreted, so that business partners can systematically comply with the publishers' policies. In the first instance, ACAP will provide a framework that will allow any publisher, large or small, to express access and use policies in a language that search engines' robot "spiders" can be taught to understand. It is anticipated that, in future, the scope of ACAP will be extended to other business relationships and other media types."
Final conference to be held in New York, November 29th.
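
To make the idea of machine-readable permissions concrete, here is a minimal sketch of a crawler checking a publisher's policy before fetching a page. It does not use ACAP's own syntax, which is not given here; as a stand-in it uses the existing robots.txt convention and Python's standard urllib.robotparser module. The policy text, the agent name "examplebot" and the example.com URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical publisher policy written in plain robots.txt syntax
# (an assumption used as a stand-in; this is NOT ACAP syntax).
POLICY = """\
User-agent: *
Disallow: /archive/
Allow: /
"""


def may_fetch(policy_text: str, agent: str, url: str) -> bool:
    """Return True if the policy lets `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the policy as a list of lines, so no network access is needed.
    parser.parse(policy_text.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # "examplebot" and the URLs below are illustrative names only.
    for url in ("https://example.com/archive/2007-09/",
                "https://example.com/current/"):
        print(url, may_fetch(POLICY, "examplebot", url))
```

A policy like this can only say "fetch" or "don't fetch"; the ACAP description quoted above aims at richer access and use permissions, which this sketch does not attempt to model.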

Madge said...

Good post.