Friday, May 30, 2008

Pages and Page Numbers.......

Many digital edition platforms ignore and eliminate traditional pagination. They create a 'reflowable' text which has a loose format which adjusts its shape to the device on which the text is displayed. Exact Editions (along with Google Book Search and most of the PDF-based digital magazine systems) is firmly page-centric. And we actually use the pages, by making the Tables and Indices live resources.

So we hit problems when publishers play fast and loose with page numbers. We met this problem today with two very different publications. The first is a distinguished and intellectual magazine which has a lot of ads occurring in an unpredictable pattern throughout the whole magazine. So page 8 in the table of contents may really be the 18th page and page 10 the 23rd. Only the editorial matter is paginated. Such an arrangement is easy enough for a human to navigate but it gives our algorithms indigestion (hiccoughs?). The only acceptable solutions we can think of are to suggest to the publisher that they impose a traditional (ie normal) pagination, or that they supply a PDF where all the ads are collected at the back. There may be publishing objections to these solutions, so it is not certain that we can help them with a digital edition.

The second problem today was a book (I look forward to seeing it since it covers the best pubs in the UK), but awkwardly for us the index is based on a numerical ordering in which the pubs appear in the book, rather than simple pagination. As it happens, I have just bought another and weighty tome (letters from and to Wittgenstein) in which the indices are 'entry' ordered rather than page-derived. Putting all the correspondence with Wittgenstein in a date order and then using the numerical order of the letters for a scholarly apparatus rather than the pagination, makes clear editorial sense. I am pleased to say that our algorithms can probably deal with the pubs, so Wittgenstein's letters would be a comparative breeze. If Wiley/Blackwell are looking for a new digital platform we can help out.....

Loose-leaf publishing is another matter. We have wondered about it, but for the moment we shall walk by on the other side.

Thursday, May 29, 2008

More new Stuff: Petticoats and Widgets

The widget is the easier of the two to explain, and we will ruffle the petticoats in a minute. You can kick the tires of this widget immediately by clicking on the front cover of New Humanist in the right-hand column. That is a front cover image of the monthly publication. The humble New Humanist widget keeps track of the front cover of the current edition. A widget that guards a monthly periodical is going to have less to do than a widget that tracks the progress of a weekly; and when we distribute a daily publication, the front cover widget is going to be quite busy. The widget is simply a short chunk of HTML which you can copy and paste into any blog or web page, so it is also a convenient way of providing some viral promotion.
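To make the idea concrete, here is a minimal sketch of how such an embed chunk could be generated. The URL paths, the `/preview/` and `/covers/` routes, and the function name are all hypothetical illustrations, not Exact Editions' actual markup or API:

```python
def cover_widget(publication_slug, base_url="https://www.exacteditions.com"):
    """Return an HTML chunk a blogger could paste into a page or sidebar.

    The image URL points at a (hypothetical) server-side redirect that always
    serves the current issue's front cover, so the pasted chunk never goes
    stale as new issues are published.
    """
    return (
        f'<a href="{base_url}/preview/{publication_slug}">'
        f'<img src="{base_url}/covers/{publication_slug}/current.jpg" '
        f'alt="Current issue of {publication_slug}"/></a>'
    )

print(cover_widget("newhumanist"))
```

The point of the design is that the pasted HTML is static while the image behind it changes, which is what lets a daily publication's widget be "busy" without the blogger ever touching the embed code again.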

Now if you click on our New Humanist widget, what happens?

If you have done that, you will have plunged straight into the petticoat version of the magazine in a new window. This is a minimal view of the current issue of the magazine which allows you to 'search' it in full, to see search results and to navigate the thumbnail-image overview of the full magazine. Try a search for 'dawkins' to see what you get.

So the idea is that a magazine publisher can promote the current issue of the magazine with this widget, and the potential audience will get some awareness of the contents but they will only be able to go so far, and they will have a reminder that they can go further by subscribing to the magazine (either in its print or in its digital edition). The appropriate call to action can be inserted as a link in the panel that politely informs the audience that 'This preview only allows you to see thumbnails of the pages....'.

Petticoats, because we used this analogy when we were discussing the idea and considering how we could provide access to full versions of content from current issues whilst not giving the whole show away. We thought it was a bit like the can-can dancer in Montmartre: she can reveal a lot of leg confidently because the petticoats provide adequate cover (this is our thumbnails-only view); or she can reveal all in a barely readable form (this would be our double-page view, and is a pretty generous offer); or, at the extreme, she can provide complete open access. That means taking off all restraints and is quite hard to do if you also plan to sell something to the paying customer....

Wednesday, May 28, 2008

Geo-links from Dive

Here is a page in an open free sample issue of Dive magazine which shows our post code links.

If you click on the post code, TR27 4HN, for Gulfstream Scuba Ltd, you get straight to this Google map of the shop's location.



A passing observation: I love the way that Google Maps now gives you some photographs of places close to the postcode or location that you have given for a map request. If you click on the photo the precise location on the map comes up with a larger version of the photo tethered to it.

Green



Welcome to a new magazine in our Australian shop. This was our quickest magazine into the shop: only one day from test files to release version. The publisher could work very fast (I think he may have been up all night) because he had been let down by another digital system at the last minute and had already promised his audience a digital edition.

Darnton on Google and Libraries

There is an entertaining and instructive piece about libraries and their exciting future, The Library in the New Age, from Robert Darnton (distinguished historian of print and librarian at Harvard) in the current issue of the New York Review of Books. A lot of his focus is on Google and Google Book Search, but the conclusions of the article are surprisingly conservative: "Meanwhile, I say: shore up the library. Stock it with printed matter." It is as though Darnton is reluctant to risk a political or philosophical view on the way the digital library should evolve: as though it were not a historian's job to make that risky judgement.

Arguably this caution comes from a proper historical modesty, but Darnton recognises the importance of the digital turn for libraries, and big decisions will come his way this year and next. He is, after all, the director of Harvard's library, which is one of Google's founding partners and the richest academic library in the world, so Harvard will be setting standards and should be blazing trails. Perhaps he will be bolder in a digital vein when he orchestrates policy for his institution.

Tuesday, May 27, 2008

Books and will they always be Printed on Paper?

Richard Charkin (Executive Director at Bloomsbury and - to declare an interest - a friend of many years' standing) is quoted in the Guardian on the permanence of books:

'There will continue to be a market for printed books for a very long time. I believe the bulk of people will still prefer to hold, feel, treasure, give, receive, display and read a printed book.'
In some moods I am inclined to agree with Charkin, but what if he is whistling in the breeze? Here are five reasons why print books may (mostly) disappear from the publishing scene (I agree that there will still be a market for second-hand printed books even when most people prefer to buy a digital book):
  1. Moore's law. If/when an acceptable and popular form of digital book arrives, the digital channel will benefit from Moore's law: in this context this means that digital will become more attractive with respect to printed books at a rate approaching 50% per annum. Just before the Charkin comment, Gail Rebuck is cited to the effect that 'e-book competitors will not kill the book but happily co-exist with it in a bright new bi-literary environment.' (It's not a direct quote). But that surely will not happen, because a digital solution will be getting better so much faster. How publishers can respond to a distribution channel that gets better (cheaper, more profitable, more capacious, better value) at 50% per annum is another matter.... but it will make it very difficult for printed books to be in a steady state of peaceful co-existence, as it were 'always with us' like hardback and paperback editions.
  2. As more of our cultural environment migrates to the web (photos have gone with a flicker, music is going with an iTune, radio is on its last FM and TV is on the way via YouTube; film will certainly go digital), do we think that books alone of our mass culture formats will remain primarily analog in print? On the contrary, books will be and are being sucked on to the web because those who live and work in a web environment need digital books to be on the web.
  3. Energy. Books are heavy on energy. Are we sure that printed books will still be so popular when they cost £50/$80, or £15/$17.50 for a mass-market paperback? That may happen if oil goes to $300 a barrel.
  4. Digital editions will at some point begin to be perceived as better/more useful than print books. At that point, publishers, authors and designers will invest a great deal of effort in making them even better, in providing functions that print books cannot. And at that point Moore's law (or perhaps it's Metcalfe's law) will come in with a vengeance.
  5. Libraries are going digital with enthusiasm and digital libraries will be much better than we can currently envisage. Digital literature will be the golden age of the library and we will all use digital library services.
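The compounding claim in point 1 is worth making concrete: a channel improving at 50% per annum does not creep ahead of a static print baseline, it runs away from it. A quick sketch of the arithmetic:

```python
# Compounding a 50% per-annum improvement: how much better the digital
# channel gets, relative to a static print baseline, after n years.
factor = 1.0
for year in range(1, 11):
    factor *= 1.5
    print(f"after {year:2d} years: {factor:6.1f}x")
```

After a decade the multiplier is roughly 58x, which is why "peaceful co-existence" on the hardback/paperback model looks so implausible.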
At the moment CEOs and captains of publishing houses feel the need to be cautious and to reassure their markets and their audience that change will be gradual and not disruptive. But if the change is disruptive and imminent, the publishing houses which have already geared up for digital distribution and marketing will be at an advantage. I think most CEOs in the business know that, and they also know that they are not too well prepared for it.

A lot of this is generational. I also like printed books and I am sure that I will still be reading them in 10 years' time, but I suspect that by then most of my purchasing will be of digital books and my children will think it a bit odd that I still like reading from print editions.

Making a Live Post Code

The Exact Editions import process now makes post codes live, clickable, resources. We have been doing this for a week. It is not easy to predict all the doors that this might open for our publishing partners. But it is very clear that it makes advertisements more useful and more interesting. Take a simple classified ad in the Quaker weekly The Friend, which we distribute every Friday. The Penn Club has a regular ad in the magazine:




That clipping shows you the ad, but it does not show you the live links (post code, email and url), for that you need to have a subscription to our service. If you were a subscriber you would note that the post code was highlighted, and that the link takes you to an optimal view in Google Maps. If you subscribe to one of our magazines you can state on your preferences page which map system you want to use (Multimap and Street Map UK are also supported). We have a number of other resolvers in hand...... and will gradually add post code systems from other countries.
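The mechanics of this kind of postcode linking are easy to sketch: find the postcodes in the text with a pattern match, then wrap each one in a link to the reader's preferred map resolver. This is an illustrative toy, not Exact Editions' actual import code, and the regular expression below is a deliberately simplified approximation of the UK postcode format:

```python
import re
from urllib.parse import quote_plus

# Simplified UK postcode pattern: outward code (e.g. TR27) + inward code (4HN).
POSTCODE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s+(\d[A-Z]{2})\b")

def linkify_postcodes(text, resolver="https://maps.google.com/maps?q="):
    """Wrap each UK postcode in the text with a link to a map resolver.

    The resolver URL is swappable, which is how a per-user preference
    (Google Maps, Multimap, Street Map UK...) could be honoured.
    """
    def repl(match):
        postcode = f"{match.group(1)} {match.group(2)}"
        return f'<a href="{resolver}{quote_plus(postcode)}">{postcode}</a>'
    return POSTCODE.sub(repl, text)

print(linkify_postcodes("Gulfstream Scuba Ltd, TR27 4HN"))
```

Making the resolver a parameter is the part that matters: the text only records that a postcode exists, and the subscriber's preferences page decides where the click goes.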

Google has for some time provided live geo-links from some of its books. But their approach is based on selecting books with a strong geo-interest (for example this travel guide to Ecuador) and then providing a constructed map view of the places mentioned in the book (probably more difficult, but less scaleable than our zip-code resuscitation method). So far as I know, neither Google, nor any other digital edition platform has yet done automated linking from post codes to a mapping system. But, of course I could be wrong about this and will print a correction if someone can produce the counter-example(s).

There is also the important difference that the Google system is really producing an annotated map from a book, whereas our system is providing navigation links from explicit items within the text. One might say that if a post code deserves to be printed, it merits being made into a navigation link. It's simply more useful and more valuable that way.

Friday, May 23, 2008

Going Local

Yesterday's FT had a piece about mapping as an interface to the web. This is one view on why this change is important:

Erik Jorgensen, a senior executive in Microsoft’s online operations, says the software company is building a “digital representation of the globe to a high degree of accuracy” that will bring about “a change in how you think about the internet”. He adds: “We’re very much betting on a paradigm shift. We believe it will be a way that people can socialise, shop and share information.” 'Way to Go? Mapping to be the Web's next Big Thing', Financial Times, 21 May 08

Google, Nokia and others are investing in parallel projects. The article speculates that controlling the geo-interface may put one company in a dominant position. But perhaps that will not happen, in part because there is an open source foundation under construction in OpenStreetMap. Its coverage is improving in Wikipedian fashion (getting better all the time). The current view of Florence is good on the railways and autostrada, but lacking in detail.

As it happens we have started adding geo-tags to our data this week (so we can now render post codes mentioned in text or advertisements as live links). We will blog about this shortly. As a side note: one guesses that geo-coding will become important to us all for one reason not mentioned in the FT's article yesterday, but headline news on the front page: oil goes to $135 a barrel. It is not really a paradox to suggest that we may care more about exactly where we are, as we learn to travel less.

Wednesday, May 21, 2008

Will Digital Books and Magazines Have Skype Conversations in Them?

Sure, it is already happening. Today I followed a link from Om Malik, where he was talking about movie clips popping up within Skype conversations (apparently that is coming -- and I totally agree that Skype video calls work very well, so why not include video in the conversation?). Anyway, Om was citing the way that TV shows are now using Skype interviews; here is a link to Oprah doing it. Well that is interesting, and the Skype conversation banner is carrying an ad for Borders, and when I take a closer look at the TV show, I realise that it is actually being replayed on the People magazine web site. So here we have a Skype conversation, carrying a Borders ad, taking place on a TV show, with a recording of a live interview promoting the sale of a book on spiritual well being, distributed on a magazine's web site. And not forgetting that we accessed all that from a blog.

I lost track of the number of media channels covered by that description, but does anybody doubt that the web is leading us to multimedia engagement with our audiences?

Tuesday, May 20, 2008

When are e-Books coming?

For years I have been on the Liblicense list, which is widely read by university librarians and academic publishers. It is a big list with several thousand adherents, but librarians are not particularly vocal (that comes with needing to be quiet in the library -- yeah, I know, very feeble joke) and many publishers sign up to the list but keep their heads down (because they don't want to be exposed as money-grasping scoundrels -- even more feeble joke). So the list is quite controversial but not that busy in view of the emotions it sometimes engenders.

One of the regular communicators is Chuck Hamaker (of the University of North Carolina, Charlotte). Today he commends a recent newsletter from the Association of American University Presses and in particular some promising signs of innovation from MIT Press. But he also includes an injunction to publishers to get a move on with digital books:

Come on, get to it--make e-books practical and workable, please!
I suspect that a lot of publishers feel as though they are stampeding into an e-books future, but to the university librarian it looks as though the industry is slow off the mark. In one way Chuck is clearly right. Academic research journals have been digital for years, and academic books are by contrast scarcely available in digital form. Time to hurry up!

Heisenberg's Uncertainty Principle and Google's Algorithms

We recently discovered that Google was no longer finding the home page of one of our partner publishers because the description of the magazine Quest Bulgaria on their home page was pretty much identical to the description on our system. Our derivative entry knocked them out, rather than the other way round, because our site is busier than theirs.....so given more weight by Google.

This was a puzzling and unwanted result so the publisher quickly changed the description on our system (our publishers can do this in real time through a form in which they edit the blurb), and Google is now finding Quest Bulgaria again (at the moment we come in a respectable third on the Google search). It was not difficult to make the changes and to invite the Google spider to return, but as one of my colleagues observed: Google is finding that it is not possible to be an accurate measure of the web because the way in which it maps and cadastrates the web is itself changing and deforming the natural shape of the web. My colleague finds Heisenberg's uncertainty principle at work here, but I am not so sure about that; it may simply be a lack of competition which is allowing the Google algorithms to become over-bossy and over-fussy. Would web spam be just as bad if there were three broadly competitive search engines at work? And if web spam were reduced would Google get subtler at discriminating between content which has a difference of function even though little linguistic difference on the page?

Monday, May 19, 2008

Where Google got the idea.....

Google's Book Search project is possibly their most ambitious undertaking. From one point of view it is an attempt to reverse engineer a proposal entertained by Alan Turing 60 years ago. He was wondering how to design a computer which would have a very large, efficient and affordable digital memory. As a thought experiment he considered the potential for using books (a library) as a system of machine memory:

We may say that storage on tape and papyrus scrolls is somewhat inaccessible. It takes a considerable time to find a given entry. Memory in book form is a good deal better, and is certainly highly suitable when it is to be read by the human eye. We could even imagine a computing machine that was made to work with a memory based on books. It would not be very easy but would be immensely preferable to this single long tape. Let us for the sake of argument suppose that the difficulties involved in using books as memory were overcome, that is to say that mechanical devices for finding the right book and opening it at the right page, etc. etc. had been developed, imitating the use of human hands and eyes. The information contained in the books would still be rather inaccessible because of the time involved in mechanical motions. One cannot turn a page over very quickly without tearing it, and if one were to do much book transportation, and to do it fast, the energy involved would be very great. Thus if we could move one book every millisecond and each were moved ten metres and weighed 200 grams, and if the kinetic energy were wasted each time, we would consume 10¹⁰ watts, about half the country’s power consumption. If we are to have a really fast machine then we must have our information, or at any rate a part of it, in a more accessible form than can be obtained with books. (a lecture to the London Mathematical Society in Feb 1947, quoted by Hodges: Alan Turing -- the Enigma, p 319)
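Turing's back-of-envelope figure is easy to verify from the numbers in the quoted passage: one 200-gram book moved ten metres every millisecond, with the kinetic energy thrown away on each move.

```python
# Checking Turing's estimate: one book per millisecond, 10 m per move,
# 200 g per book, kinetic energy wasted each time.
mass = 0.2          # kg
distance = 10.0     # m
interval = 1e-3     # s between moves

velocity = distance / interval            # 10,000 m/s (treating speed as uniform)
energy_per_move = 0.5 * mass * velocity ** 2   # ½mv² = 1e7 joules per book
power = energy_per_move / interval             # watts, since one move per interval

print(f"{power:.1e} W")
```

The result is 10^10 watts, exactly the figure Turing gives, and indeed of the order of half the UK's electricity consumption at mid-century.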
Turing emphasizes the crucial importance of referential transparency in designing book-based machines ("...right book and opening it at the right page, etc. etc...."). There is no point in having a digital book unless the system can locate each and every constituent element within each and every book efficiently. File-based e-book systems have papyrus-like referential opacity. Google Book Search is certainly not making this mistake. Efficient search, random access, referential precision and interoperability will work together in the digital library.

I do not seriously suggest that Google took the idea for their great project from Turing, but it is remarkable that Turing's modest proposal is being captured in ways that he could not have imagined, but of which he would surely have approved.

Thursday, May 15, 2008

To Reflow or to Cite?

The Association of American Publishers have produced a letter in support of the IDPF's EPUB standard. There are so many things wrong with this approach that it is hard to know where to start. This quotation is representative of the substance of the letter:

....For books with text that can be reflowed, many publishers would like to create and deliver to retailers and/or wholesalers EPUB files. If a proprietary e-book format is then needed, it is expected that the retailer and/or wholesaler will take on the effort to convert the EPUB file in a scalable, high fidelity way that either preserves the layout and design of the original or otherwise delivers the content in a rendering acceptable to the publisher.
First, let us grant that EPUB has a role to play as a safe and neutral file format for the various proprietary eBook standards to aim at as a conversion bridge. But it is really a very small role, and in my view PDF will be a much more important archival and preservative file format than the EPUB specification. Second, of course we are in favour of standards and different sectors of the industry collaborating to support them. But nearly everything else about the AAP's letter is off-base or highly debatable.
  • It is not clear that digital books really have to have a file format. Thinking of books as digital text files is not the way that Google Book Search works (nor is it the way we think at Exact Editions). Books are the building blocks of Google Book Search, but they are not necessarily or primarily files. The GBS view has them as collections of web pages (managed by a scaleable database that hosts many books).
  • Then there is this new 'reflow' concept. There is a growing general presumption that reflowable books are desirable. Transitive activity verbs tend to have a positive connotation. It is better that you have a book that you can {copy, lend, read, sell, flow, reflow} right? Well maybe, but a reflowable book is a book that you cannot cite, that you probably cannot bookmark, that a search engine will not be able to directly search...... From many standpoints reflowable books/texts are a second best idea. Do we hear librarians, historians and curators calling for reflowable books: with tables and indexes which lose their bearings, pages that cannot be cited and typography that is messed up?
  • If you decide on a distribution channel that permits 'reflow', your book in that format will not have determinate page references and citations. That is such a big loss that every book which is 'reflowable' may need to have a referentially stable primary edition.
  • What on earth can the AAP mean by expressing the hope that the industry will have 'completed' the transition to the EPUB standard by October 2008? Completed what?
It is a mostly waffly and empty letter and will not carry weight in the tussle between Google (which should have minimal need for the EPUB format) and Amazon, which is broadly on the books-are-a-file side of the fence and ought to be using EPUB for its Kindle, but is not. Whether digital books are citeable and searchable, page-fixed, digital resources, or electronic texts within a Kindle/Sony/Iliad reader, will be clearer in a year or two. I doubt that it will be settled by October of this year.

Final irony. Reflowable and easily copyable texts have their purposes. One of them would be to make it easier for people to copy statements put out on the web. The AAP letter is such a circumstance, but so far from being in a reflowable or easily copied format, their letter has been put up as a simple JPEG and I had perforce to retype the passage quoted above (any errors of transcription are mine).

Wednesday, May 14, 2008

Sara Lloyd's manifesto and captcha finally gotcha

This looks pretty interesting (and it looks like the first installment of a multi-part manifesto). I particularly liked her way of putting things here:

The publishing model has evolved over history in a very slow, organic fashion. The sedate pace of change has suited publishers. Stated simply, the journey of a text from author to reader has been a linear one, with publishers traditionally fulfilling the intermediary roles of arbiter, filter, custodian, marketer and distributor. There has been some blurring at the edges, some tinkering with the process, but little radical change. In the literary world, agents have, at least partially, usurped the arbiter and filter roles. Retailers have become, to some extent, marketers and, occasionally, have even become publishers themselves. However, by and large, the stages in the process have been clearly delineated and the role of the publisher clearly defined. From a print perspective at least, publishers have offered one key, relatively unique set of abilities: to produce, store and distribute the product to the market. The rise and rise of the Internet has begun to disrupt this linear structure and to introduce the circularity of a network. More challengingly, perhaps, it has raised the distinct possibility of publisher disintermediation by more or less removing as an obstacle the one critical offering previously unique to publishers - distribution. (the complete article will appear in Library Trends)
Read the piece.

The reading/writing process is indeed, now, rather circular. One of its circularities is that when writing a blog one has to 'read' some 'machine unreadable' text in order to prove that we are not machines, but truly human. The captchas on Blogger are getting really tricky.






I am sure that the captchas are more iffy than they were a month ago. Does this mean that computers are getting better at pattern recognition and the randomised but possibly still human-readable images need to be even more convoluted? Or is it that my brain is getting fuzzier and my cerebellum more Turing-mechanical and senescent? Well you can be the judge of the matter. But if this blog suddenly stops with no explanation, and there is a prolonged silence for a matter of weeks, there is no need to send out search parties, no point in sending me belated fan-mail, you can assume that the captchas have ratcheted up to another level of difficulty and I have been completely defeated, silenced, by their 'disrupted linear structure' (to use Sara Lloyd's apposite phrase).

Friday, May 09, 2008

Incremental Improvements

Web software has the massive advantage that you can make incremental improvements to a steadily improving service (and web services do seem to steadily improve -- we trust that Exact Editions does). There was a small new release for our service today, and users will not notice anything.

The main change is that it makes it easier for us to set up 'shortcuts' for content that we are hosting for our clients. It logs the user into the right account and takes her straight to the right place in the information space. We put this switchboard into immediate effect for 3 catalogues for A&C Black, which are here:

http://www.exacteditions.com/acblack/andrewbrodie

http://www.exacteditions.com/acblack/music
http://www.exacteditions.com/acblack/children

Or see them all here

http://www.exacteditions.com/acblack

Our publishing partners will now be able to direct their customers directly to individual catalogues, if needed to a particular page in the catalogue, or to a page from whence the whole collection can be searched. It is quite difficult to do this sort of operation with PDF files (that was a British 'quite' which means 'probably very'). We needed to get our ontology straight before we built this 'switchboard'. I am told it works a bit like Clapham Junction.
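The switchboard's job can be sketched as a small routing table: a short public path resolves to the account and publication it should open, and a bare publisher path resolves to the searchable collection page. This is a toy illustration; the slugs and field names are hypothetical, not our actual schema:

```python
# Toy 'switchboard': resolve short public paths to a destination.
# Keys and destination fields are illustrative, not the real schema.
SHORTCUTS = {
    ("acblack", "andrewbrodie"): {"account": "acblack", "publication": "andrew-brodie-catalogue"},
    ("acblack", "music"):        {"account": "acblack", "publication": "music-catalogue"},
    ("acblack", "children"):     {"account": "acblack", "publication": "childrens-catalogue"},
}

def route(path):
    """Resolve /publisher/catalogue to a destination; a bare /publisher
    path resolves to the publisher's searchable collection page."""
    parts = tuple(p for p in path.strip("/").split("/") if p)
    if len(parts) == 1:
        return {"account": parts[0], "publication": None}  # whole collection
    return SHORTCUTS.get(parts)  # None for unknown shortcuts

print(route("/acblack/music"))
```

The useful property is that one URL does two jobs: it logs the visitor into the right account and lands them at the right place in the information space, which is exactly what is awkward to arrange with a pile of PDF files.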

Tips For Dealing With Information Overload

Philippe Lenssen at Google Blogoscoped asked 14 talented people how they cope with the digital onrush. There are some helpful suggestions (and don't we all need them). But I was gobsmacked to see that he had sought and actually elicited advice on this matter from Noam Chomsky:

«I wish I could answer sensibly. I just can’t. You should see the room in which I’m working. Piles of books, clippings, manuscripts, notes,... All sorts of lost treasures buried in them.
That doesn't help me much. But I am massively impressed that Philippe can obtain advice on organisation from one of our intellectual giants. (Philippe --- that really was the Noam Chomsky?). Do you think Sean Connery would have any advice for me on how to keep my desk tidy? Or would Sophia Loren deign to advise me on improvements for our garden?

I think Matt Cutts's simple suggestion is the one that I will take away:
At the beginning of the day, write down the 1-2 things you really want to accomplish that day. That will help keep you on track.
The thing that most helps me to stay on track, hour by hour, is our home-grown customer relations wiki-database-blog. For some reason it is called Crumb. I am hoping that one day some Ruby ace will enhance Crumb so it can answer my emails for me.

The Future of Search and the Future of Magazines

John Battelle and Danny Sullivan have been sponsored by Thomson Reuters to write some pieces on the future of search. They are two of the shrewdest commentators on internet search so the essays will be worth reading. John Battelle has an exceptional feel for the overall commercial space in which search operates. Danny Sullivan has a terrier-like persistence which means that when he has really researched a topic, you are unlikely to find a better or a more judicious summary of it anywhere else. These guys are definitely worth reading.....

John Battelle's first piece works over some ideas that he has been poking around for some time. Searching on the go, with interaction between the web and the environment in which you move. His is an example of geo-vino-price-sensitive searching for the best deal on a bottle of Stag's Leap Cabernet as he hurries through the aisles of a supermarket pointing his phone/camera at the labels on the wines he passes (this all seems a bit furtive to me and I wonder whether John really does the shopping in his household?). My own geo preference is for a similarly priced bottle of Castello di Ama and I am not going to nickel and dime the enoteca over the last €1.50; but de gustibus non est disputandum.

John Battelle also blogs yesterday about the future of magazines (zero/niente/nil future, sooner rather than later, is my summary of his view). He is far too gloomy about that. This kind of woe/weltschmerz tends to hit magazine people who have really migrated to the web (John was a founding editor of Wired, which is not to be confused with our wonderful music magazine The Wire); they tend to lose sight of the potential for magazines to be reborn digitally on the web and for subscribers to enjoy them. Some magazines have more or less given up editorially in the face of the web (has this happened to Newsweek or Time?) -- whilst others, such as the Economist and the Scientific American, just keep on getting better.

In one respect the post-Battelle retail future is bright for magazines: digital ordering and digital delivery is a breeze. Magazines and books are one of the few product categories that have well organised UPCs (universal product codes, ISSNs and ISBNs) and they can easily become digital, so the magazine publishers will be doing OK when bookstore and newsstand browsers realize that they can point their iPhone at the UPC on the back of the book/mag and order a digital subscription rather than lug the pile home. 'Sale or return' is going to be a real disaster when this starts happening and the kiosk owner is going to have a struggle. Furthermore with a decent digital magazine you get access to the archive (the vintage numbers). You can't do that with a bottle of Stag's Leap: you can't track back through the vintages or order digital delivery (yet).

Come to think of it, if you could do one you could do the other. I quite fancy the idea of subscribing to a digital wine with archived and digitally 'remastered' vintages.......

Thursday, May 08, 2008

A Publishing Ontology

Yesterday I was listening to two of my colleagues discussing our platform and what should be done with a catalogue, when I realised that I did not have a clue as to what was going on. When geek-talk overwhelms me I tend to reach back for philosophical roots.
"Hang on a minute," I interjected, "you are talking about our ontology. I didn't realise that we had an ontology."

Well, it turns out that we do, and I am beginning to get the hang of it. There are four important types of entity in the Exact Editions universe. (1) Publishers, who come at the top of the tree (of course): they control access and deliver content; they may need branding; they may get subscriptions and revenue; and they will expect usage statistics. Then there are (2) Publications, which may be of various subtypes (eg magazines, brochures, books, catalogues etc); publications have their own 'entry point/home page', and our usage statistics are aggregated for the individual publication. Some special publications have peculiar characteristics, for example they may have earlier or later issues: (3) Issues. Searches can be aggregated across issues of a single publication. At any rate, yesterday's geek-speak was hovering over the question of whether publishers' catalogues have issues or not, since they can certainly have (4) Supplements. We decided that some catalogues are issue-like: publishers' seasonal lists have a periodical frequency which makes them a bit like issues of a periodical. At any rate, our ontology now allows for the possibility.
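For the code-minded, the four entity types and their relationships can be sketched as a toy data model. This is my own illustration, not the actual Exact Editions schema; every class name, field name and example value below is an assumption made for the sketch:

```python
from dataclasses import dataclass, field

@dataclass
class Issue:
    """A dated instance of a periodical-like publication."""
    label: str                     # e.g. "May 2008"

@dataclass
class Supplement:
    """Extra material attached to a publication."""
    label: str

@dataclass
class Publication:
    """Magazine, book, brochure, catalogue... Each has its own entry
    point/home page; usage statistics aggregate at this level, and
    searches can run across all of a publication's issues."""
    title: str
    kind: str                      # "magazine", "book", "catalogue", ...
    issues: list[Issue] = field(default_factory=list)
    supplements: list[Supplement] = field(default_factory=list)

@dataclass
class Publisher:
    """Top of the tree: controls access, delivers content, may need
    branding, gets subscriptions, revenue and usage statistics."""
    name: str
    publications: list[Publication] = field(default_factory=list)

# A seasonal catalogue that turns out to be "issue-like":
catalogue = Publication(title="Spring List", kind="catalogue",
                        issues=[Issue("Spring 2008")])
press = Publisher(name="Some University Press",
                  publications=[catalogue])
```

The interesting design decision is visible in the last few lines: a catalogue is just a Publication, so nothing stops it having issues, which is exactly the flexibility the geek-talk was about.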

In fact we are still only scratching the surface. If you want to get tied in absurdly complex knots we have to introduce you to the topic of loose-leaf-publications........It will make Being and Nothingness feel like child's play.

Wednesday, May 07, 2008

Amazingly Complicated Viewability Restrictions

One hesitates to recommend a 50 minute podcast, but this chat at Talis's The Library 2.0 Gang had some interesting comments. The focus of the discussion was the recently released Google Book Search Viewability API, and there seemed to be fairly general agreement that it was a step in the right direction but not yet enough.

Google needs to loosen up a bit and open up some more to enable some really interesting literary mashups to take hold. There were some particularly interesting contributions from Frances Haugen, a Google Book Search Product Manager. She spoke passionately and idealistically about the aims of the Google Book Search project. She agreed that an API which allowed some server-side interactions would be a good idea. But in passing she noted that there were legal issues and limitations. I was particularly struck by her comment that the Google rules on access limitations on international viewability are 'amazingly complicated'.
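To make the discussion concrete: the viewability API is queried with a bibliographic key (typically an ISBN) and answers, per book, whether Google can show nothing, a partial preview, or the full text. The sketch below classifies a response of the general shape the API returns; I have hardcoded a sample rather than making a live call, and the field names and values are my reading of the API as published, so they should be checked against Google's current documentation:

```python
# A sample response of the general shape the viewability API returns
# for a bib key. Treat the field names as assumptions for this sketch.
sample_response = {
    "ISBN:9780596000271": {
        "bib_key": "ISBN:9780596000271",
        "info_url": "http://books.google.com/books?id=example",
        "preview_url": "http://books.google.com/books?id=example",
        "thumbnail_url": "http://books.google.com/books?id=example&img=1",
        "preview": "partial",   # one of "noview", "partial", "full"
    }
}

def viewability(response: dict, bib_key: str) -> str:
    """Return 'noview', 'partial' or 'full' for a bib key, defaulting
    to 'noview' when Google reports nothing about the book."""
    record = response.get(bib_key)
    if record is None:
        return "noview"
    return record.get("preview", "noview")

print(viewability(sample_response, "ISBN:9780596000271"))  # partial
print(viewability(sample_response, "ISBN:0000000000"))     # noview
```

Note that this is all client-side classification of whatever Google chooses to disclose; the 'amazingly complicated' part, which territory sees which answer, stays hidden on Google's servers.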

Google's lawyers are being strict about the extent to which works that may not be in the public domain in other countries can be accessed/viewed outside the US (though the majority almost certainly are, in most places). It is not surprising that such a set of house rules limits the extent to which a useful API can be defined. The problem is not so much copyright as the differing terms of copyright in different jurisdictions and the penumbra of uncertainty about who has what.

Google Book Search will work better for Google if they can outsource the business of establishing who has clear title in a text and where. That could mean negotiating with publishers before digitising the text. It may come to that, and Google Book Search will be more comprehensive and more accessible when it does so.

Tuesday, May 06, 2008

Google Catalogs Again

Perhaps I should have mentioned in yesterday's blog that there is a sentimental interest in Google Catalogs from the Exact Editions side. When we were planning our platform in early 2005 we decided that the minimum level of functionality for a digital magazines service, as we conceived of it, was to be as good as Google Catalogs. I am not quite sure why we picked on Google Catalogs as our benchmark, rather than Google Books (which was above the parapet as Google Print when we were prototyping), but I guess that it was partly that the Catalogs service included the double-page view which seems to be essential for magazines. And there may have been other reasons that I cannot now recall. So that is why we noted yesterday with mild tristesse that Google Catalogs seems to be dormant. Our benchmark is fading....

It is ironic that this decay of Google Catalogs should be happening just as we are finding that book publishers' catalog(ue)s work well in our platform. But the Exact Editions service is very different in being designed primarily for publishers, and paid for by them (the Google Catalogs service kept the vendors at arm's length and was free). We are not trying to aggregate catalogues in one repository, but to supply a service to independent publishers' web sites, the more the merrier. It is, of course, vastly too small and specific a service opportunity to be of any commercial interest to Google. There is also a very specific reason why book catalogues can be more valuable as digital editions than apparel catalogs: books are really completely exceptional in having a universal and widely used product identifier, the ISBN. If there were ISANs (International Standard Apparel Numbers), Google Catalogs would have linked to them and Google would have become a close ally of all Catalog vendors.

Google Book Search is a completely different kettle of fish. Unlike the Google Catalogs system, it is already beginning to connect with the publishing and selling opportunities of publishers (see the way that all (?) of CUP's current output, today 35,227 titles, can now be searched with Google Book Search). GBS will indeed be an enormous success, it already has the critical mass to succeed, but it does not follow that it will inevitably lead to a Google monopoly for digital books. There will always be scope for independent technical initiatives (for some books the Google system is not a good solution), and publishers are much more likely to be squashed by Amazon's terms of trade than by Google's. Google is becoming a significant ally for the independent publisher, and we doubt that it will buy Ingram/Lightning Source (Jassin's suggestion), a company which already has a significant collaboration with Microsoft.

Monday, May 05, 2008

Google Catalogs is in Limbo

Google Catalogs seems to be neglected. I checked it out earlier today and could not find a single Catalog with a 2008 publication date. I did find one catalogue with £ prices, and I had not realised that the Catalogs service ever included any British catalogue companies (we have catalog vendors and catalogue companies depending on where we are?).

Google Catalogs was launched in 2001, and uses rather similar technology to Google Book Search. I doubt that it was a deliberate dummy run for the books project (originally Google Print), but I am sure that useful lessons were learned. Google's books project began to see the light of day in 2004, first at Frankfurt for the world of publishing and then with various library partnerships. Mind you, the Google Book Search History page, from which I have checked these dates, could do with an update. A lot has happened since 2006: Michigan did tell us in February that they had got through 1 million titles (most, but not all, from the Google partnership). The Google project is coming on apace, and some fresh initiatives will confirm the seriousness of their intent.

Thumbnails

Thumbnails are useful. I hope that the institutions that are taking out site licenses to our magazines will include thumbnails of their front cover in their OPACs (see the source code of this page for the relevant HTML). Sometimes a front cover is worth 10,000 words.
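For what it's worth, the markup an OPAC record needs is tiny: a linked image. A hedged sketch that builds such a fragment as a string; all the URLs and the title here are placeholders of my own, not the actual Exact Editions addresses or HTML:

```python
from html import escape

def cover_thumbnail_html(magazine_url: str, thumb_url: str,
                         title: str) -> str:
    """An <a>-wrapped <img> pointing an OPAC user at a digital
    edition. Values are escaped so they are safe inside attributes."""
    return (f'<a href="{escape(magazine_url)}">'
            f'<img src="{escape(thumb_url)}" '
            f'alt="{escape(title)} front cover"/></a>')

print(cover_thumbnail_html(
    "https://example.com/magazine",          # placeholder link
    "https://example.com/cover-thumb.jpg",   # placeholder image
    "The Wire"))
```

A librarian could paste the resulting fragment straight into a catalogue record; the alt text keeps the record useful even where images are suppressed.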

Maybe we should offer subscribing institutions a free widget which will keep itself up to date and carries the front cover of the current issue and then links the student straight through to the full content (within the limits of the IP range of course). Is that a good idea, or tiresome?