
Microsoft abandons digitization



Microsoft announced today that it was ceasing book digitization and the Live Academic and Live Books products; existing digitized material would be integrated into their main index, but no new content would be produced. Publishers involved in their Books program (in copyright) would be directed to their relationship with Ingram, with whom Microsoft had established a digitizing operation.

Some of the relevant sections of the Microsoft announcement include:

Book search winding down

Today we informed our partners that we are ending the Live Search Books and Live Search Academic projects and that both sites will be taken down next week. Books and scholarly publications will continue to be integrated into our Search results, but not through separate indexes.

This also means that we are winding down our digitization initiatives, including our library scanning and our in-copyright book programs. We recognize that this decision comes as disappointing news to our partners, the publishing and academic communities, and Live Search users. ...

Based on our experience, we foresee that the best way for a search engine to make book content available will be by crawling content repositories created by book publishers and libraries. With our investments, the technology to create these repositories is now available at lower costs for those with the commercial interest or public mandate to digitize book content. We will continue to track the evolution of the industry and evaluate future opportunities.

As we wind down Live Search Books, we are reaching out to participating publishers and libraries. We are encouraging libraries to build on the platform we developed with Kirtas, the Internet Archive, CCS, and others to create digital archives available to library users and search engines.

From my conversations with publishers, what seems most likely is that Microsoft will support the book discovery standards being developed within the Book Industry Study Group. In this scenario, Microsoft will serve as a discovery engine for in-copyright, publisher-provided materials, and then deliver the user experience through the publisher's supported repository service. This is a very different path, philosophically, than the traditional aims of Google Book Search, which seeks to guarantee a superlative user experience through more on-site control.

For libraries, the impact of Microsoft's sudden announcement is likely significant. Even under the best of contractual terms, digitizing is never wholly free: libraries have to remove books, ship books, scan/image books, and accommodate digital content workflows and record keeping. It is a significant amount of work, more or less subsidized by the commercial scanning partner (depending on the specific agreement).

Many institutions -- most institutions -- would be hard pressed to conduct self-initiated scanning operations at any significant scale; IA-assisted Open Content Alliance pricing is still fairly high (most often quoted at 10 cents/page); with Microsoft's gift of scanning facilities, this cost might decline. Labor is likely to remain the highest expense.

Another disadvantage with self-initiated scanning is that aggregation is more difficult; to search effectively, one must search either against very attractive deep-niche content, or search on a large scale. Microsoft helped obtain that scale for libraries; Google attains that scale for itself.

Microsoft's library-supported scanning not only generally came with good contract terms, permitting a relatively wide array of re-use, but it was also very appealing. Material that was delivered from Live Books to the OCA was of consistently high quality, possessing good metadata and striking visual fidelity.

There is also an obvious fallout on a possible settlement between Google, the AAP, and the AG: if Google becomes a licensor for the class of material under contention, and MSFT is no longer in the digitized book market, then Google winds up being a more privileged, and potentially sole, provider for this content. While I'd love to work at Google in that scenario, I can't help but think that it has less certain benefits for libraries and the public sector.

As some of my colleagues have noted, there have been disappointments with Live Books; very little content was ever directly available, and from Microsoft's perspective, why live with being No. 2 (or No. 3, if one accommodates the presence of Amazon's impact on publishing) in the domain of digital books -- why not focus on areas where one might stand a chance of success?

I have absolutely no disaffection with Microsoft in their decision; I probably would have come to the same conclusion, if I were in their Live group. I was pleased that they are pledging to release their partner libraries from any use restrictions for the public domain material. I've also had some preliminary conversations involving the Library of Congress, which I hope -- based on my conversations with Jay Girotto and Cliff Guren of Microsoft -- will eventually serve as an additional archive for the public domain material alongside the Internet Archive.

I've also received indication that Microsoft will be willing to share what they've learned about building a large academic journals and book database with the DLF community at our next forum in Providence. I also inquired of Microsoft whether there might be code available for re-use, but not surprisingly, they indicated that it was too tightly bound into their repository architecture.

Brewster Kahle has released a note indicating a willingness to move ahead in the community of interest to continue advancing the Open Content Alliance; certainly any coordinated efforts to advance public sector digitization should be supported, and new sustainable conceptions for advancing will be welcomed by IA, the DLF, and many others.

Speculatively, Ingram -- with the backdrop of its involvement in Microsoft-supported digitization, and its experience in serving the publishing digital supply chain -- could be an interested party. Over a year ago, as I was preparing an article on print-on-demand, I interviewed Kirby Best, head of Ingram's Lightning Source division:

Lightning Source is pursuing the addition of high quality and high value works to their digital repository; they clearly appreciate the partnerships they have been able to achieve with universities for rare and scarce materials. This kind of long tail material, if aggregated by Lightning Source, could be one critical source for a high quality repository of public domain works of great scholarly or even leisure value.

Given this level of interest, I asked Kirby [Best] if Lightning Source would be willing to fund digitization projects at university libraries which were focussed on obtaining high enough quality scans and images to power print on demand (as well as other digital delivery) solutions. He indicated that they were willing to consider such proposals, and indeed were working with one small pilot already. I inquired as to whether, in such arrangements, Lightning Source was willing to contractually commit that partner libraries were able to retain the source high quality images produced, and Kirby responded in the affirmative. (I also confirmed that Lightning was willing for this information to be public. Evidently, they had simply not been asked if they were willing to entertain such collaborations).

A little creative thinking on the part of libraries can take us far. Time to put on our thinking caps.

 

May 23, 2008 | Categories: MassBooks, DLF, Publishing, BookRights, Search | pbrantley

1 comment

It's sad to hear initiatives like this go by the wayside. Luckily I have a feeling that projects like Google's BookSearch and HP's BookPrep will continue to soldier forward for a cause that is truly noble and important.
06/23/08 @ 09:43

This is the personal blog of Peter Brantley, and the opinions expressed here are his own and are not reflective of any of his employers in the continuum of history, or the University of California, which provides support for this blog.
