Sunday, September 14, 2008

Google's Newspaper Project

Barbara Quint (always worth reading), in Information Today, has some interesting comments as she reports on the massive Google Newspaper archiving project.

Google's efforts in this space are undoubtedly impressive. But the readability of old newspapers is inevitably poor. Take a look at this 1944 issue of the St Petersburg Times. Google can break the newspaper up into articles and can find the headline "Russians Nearing Minsk" -- this is not easy and totally cool -- but a facsimile of a 1944 newspaper is always going to be hard to read. I also find it a trifle intriguing that searching on the phrase "Russians Nearing Minsk" in the Google archive, or indeed on the complete Google web index, finds no results at all. How can that be? Did I misread the headline?

There is a real question in my mind as to why Google is doing this. I take it that their process is entirely (OK, 99.99%) automated; there would be no justification for doing it if it cost them significant man-hours. I also take it that they are in principle willing to digitise every newspaper. Google does not usually bother to 'negotiate' about what content should be put into its system: anything that can be scanned and that comes from one of their partners goes into the maw. Google does not do things by halves: see the correspondingly outlandish projection from Chad Hurley that Google's YouTube appetite will lead to exponential growth of video on the web and in the cloud.

It is highly intriguing how Google's confidence that every newspaper edition is worth digitising contrasts with the widespread gloom in the mainstream newspaper business, especially among the local papers, that it fundamentally has no basis for a profitable future. It is very puzzling that very few newspapers have made proper efforts to sell digital editions to their current subscribers, and all the more puzzling given Google's appreciation of the value of archival databases.
