It has been fairly clear to the library community for a while now that the Google Book Search project is not going to deliver the quality needed to assure 'preservation'. There is now a rather detailed critique at First Monday, from Paul Duguid. His essay (noted via Peter Brantley) focuses on the several editions of Sterne's bizarre novel, Tristram Shandy, that are included in GBS. His conclusion:
The Google Books Project is no doubt an important, in many ways invaluable, project. It is also, on the brief evidence given here, a highly problematic one. Relying on the power of its search tools, Google has ignored elemental metadata, such as volume numbers. The quality of its scanning (and so we may presume its searching) is at times completely inadequate [14]. The editions offered (by search or by sale) are, at best, regrettable. Curiously, this suggests to me that it may be Google’s technicians, and not librarians, who are the great romanticisers of the book. Google Books takes books as a storehouse of wisdom to be opened up with new tools. They fail to see what librarians know: books can be obtuse, obdurate, even obnoxious things. As a group, they don’t submit equally to a standard shelf, a standard scanner, or a standard ontology. Nor are their constraints overcome by scraping the text and developing search algorithms.
When I mentioned the article to a friend, he said that it was possibly a little unfair. But I guess that is the issue Google has to confront. If Google is going to assume the responsibility of scanning these texts and, to speak plainly, the responsibility of establishing them, it will attract the highest standards of scholarly nitpicking, which are often and notoriously unfair. That, after all, is why professors study the early editions of Tristram Shandy: they are professional and unrelenting pickers of nits. Companies such as ProQuest are used to collecting and aggregating materials with careful and scholarly procedures. They know that they will be pilloried if and when their scanning is unreliable or their selections are unwarranted.
I think that Dr Duguid has some good points, but there is perhaps more of a case to be made for Google than he allows. After all, his paper is a very good example of how easy it is to cite and use the material that Google is assembling. He clips and shows the messy pages he has found. Scholars will like that (though those in Europe will dislike the fact that, for strange copyright-related reasons, the Google citations do not work there. In the US that link will give you the first, or possibly the second, page of the Harvard edition; in Europe nothing is visible).
But I also wonder about Google's methodology. Why should they ignore the way that librarians and scholars have assessed this material in the past? Not recording volume numbers seems like a laughable error. On the day on which the New York Times reports Google's and Microsoft's urgent drives to capture and utilise health records, we may wonder whether the medical services that Google develops could possibly be as haphazard as the Book Search record appears to be.