There has been a lot of controversy in the past few days about a (somewhat shapeless) article in the UK Times reporting that Google was going to provide ebooks for download; no, er, pay-per-view books for viewing. Among the follow-on articles, by far the best discussion of Google's near-term plans is Ars Technica's review, "Buy your books by the chapter," posted a couple of days later and clearly benefiting from more thoughtful analysis.
Almost certainly Google's near-term plans involve pay-per-view online access to books provided by publishers through the Partners Program. It is premature to speculate about their longer-term priorities for content delivery.
However, I think one thing we can agree on is that Google's current model for content creation for publisher-provided content is getting to be (in the words of a friend) a bit long in the tooth. As with Microsoft's Live Book Search program, my understanding is that publishers presently ship books to Google, which then destructively scans and OCRs them.
That can't last; it just doesn't make sense. As a recently elected board member of the International Digital Publishing Forum, which is responsible in part for creating standards for ebooks, I find it clear that it would be far more efficient for publishers to transmit native XML files to Google for ingest. The advantages would be enormous. Obviously, it side-steps one of the requirements for OCR, which is translating images-of-words into the texts-of-words. One might or might not need to regain positional information (i.e., where the words are on a page), depending on the markup used; I would surmise that any acceptable XML specification would at least preserve page breaks and original notations. This would be intrinsic to the file contents if the XML were a source, and not a transformation, in the publisher's production stream.
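To make the page-break point concrete, here is a minimal Python sketch. The element names (`book`, `chapter`, `p`, `pb`) are hypothetical, invented for illustration rather than drawn from any IDPF or Google specification. It only shows that when page breaks are marked up in the source, page-level text can be read straight from the file with no OCR step.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the element names (book, chapter, p, pb) are
# assumptions for illustration, not part of any real ebook specification.
SAMPLE = """<book>
  <chapter title="One">
    <pb n="1"/>
    <p>It was a dark and stormy night.</p>
    <p>The rain fell in torrents.</p>
    <pb n="2"/>
    <p>Morning came slowly.</p>
  </chapter>
</book>"""


def text_by_page(xml_source):
    """Group paragraph text under the most recent page-break marker."""
    pages = {}
    current = None
    # iter() walks the tree in document order, so each <p> is assigned
    # to the last <pb> seen before it.
    for element in ET.fromstring(xml_source).iter():
        if element.tag == "pb":
            current = element.get("n")
            pages[current] = []
        elif element.tag == "p" and current is not None:
            pages[current].append((element.text or "").strip())
    return pages


pages = text_by_page(SAMPLE)
for number, paragraphs in pages.items():
    print(number, paragraphs)
```

A real production file would of course carry far richer markup; the point is only that the pagination travels with the text.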
Regardless of whether they used IDPF-sanctioned ebook container or specification standards, a regularized XML format would permit Google to produce a wide range of products: online viewing, print on demand, and a variety of downloadable objects, either transformed into PDF or managed as discrete digital XML objects, with or without DRM. XML source files place Google in a far more central position in the publishing stream.
Although not many publishers can lay claim to full-bore XML production streams, an increasing number can, including some very large ones indeed. XML-based submission does not necessarily give publishers greater overarching control of the distribution of their content, and in fact it might arguably lessen it, but it does have several other advantages for them. Not the least of these, ultimately, is that standardizing on XML formats permits publishers to develop sophisticated, tiered licensing regimes for digital books for libraries and other consumers. Libraries will have to start thinking about how they are going to handle collections management of digital works to which they may have only licensed, and not permanent, access.
Where have libraries been in these issues? Nowhere. But it is critical that we place ourselves front and center within them, because these are not inconsequential debates between publishers and the new online distributors, wholesalers, and retailers such as Amazon and Google. We need to care not only about the formal specification of XML standards (ebooks have suffered from a wondrous multiplicity of standards, far more than most areas), but also about the much more basic issues of digital asset management, rights, licensing, and distribution that will come with XML-formatted and distributed books.
There is certainly a lot of room for librarians to engage themselves in the IDPF and other forums to comment on standards specifications, and more importantly, to begin together to establish expectations for a world in which all forms of published content reside more or less permanently within native digital streams.