Google does a lot of things that don't seem to offer any way to make money, but they do them anyway and worry about the money later.
I often use Google Books even when I have the book, because it's easier to search the online edition than to use a book's index or try to remember where I saw something.
Completely and utterly agree with your thoughts. Transparency is an absolute must if they are going down the route of covering their costs before distributing what’s left to charity. As you say I can’t see anything being left which is why they want to hide the numbers.
Secondly, why should they prosper on the back of an anonymous author? I do think they should cover their costs but that’s it.
Pensioner
I wanted to respond to your question on the authentication mechanism planned for the institutional subscription. We have responded to this question in a number of forums, including the discussion at ALA and the DLF discussion. Our plan is to use industry-standard authentication mechanisms for the institutional subscription. Some institutions use Shibboleth and we will support this, although most institutions prefer IP authentication. The phrase you are extracting from the settlement agreement refers to the very same standard procedure, common in the industry, that you mention earlier in the post. Universities and libraries have standard requirements for subscriptions that they license for their students, and Google will need to accommodate these requirements, as other products in the market do, to offer a successful product.
A very well written article and a very interesting subject.
The web has indeed opened up the world of information at your fingertips. However, maybe I am just old fashioned, but I still prefer reading a good book. I find it difficult to become enveloped in the words when they are displayed on screen, and I can't see me being curled up in front of the fire with a good laptop.
Finding or creating the tools that would allow one to feel as comfortable with a PC as one is intimate with a book will be quite a task.
Knowledge is complicated sometimes. We shouldn't avoid organizing information in complicated ways if they're called for by the content. It's not that I'm in favor of our catalogs and other info portals being arcane and mysterious. The goal is learning and understanding, not showing off our knowledge. But the point is, if we're not going to do some of that complex work, the users will have to do it. And it will make it harder for them to find what they're looking for. Also, if we don't do anything complex, it seems there will be a question about why our profession should exist at all. I think our profession is important.
Regarding public domain books - Google has publicly announced that they have already identified 1.5 million books out of the 7M+ books they have scanned as being in the public domain (http://booksearch.blogspot.com/2009/02/15-million-books-in-your-pocket.html). James Grimmelmann has pointed out that many US government office works that were scanned have still not been identified as public domain works on Google Books (http://laboratorium.net/archive/2009/01/05/why_is_google_restricting_access_to_government_boo). Marybeth Peters also noted that few of the books published before 1964 had their copyright renewed (which was required to retain copyright protection), so the majority will be in the public domain. Looking at books on Google Books published between 1923 and 1964 (http://books.google.com/books?q=+subject:%22Science%22&lr=&as_drrb_is=b&as_minm_is=1&as_miny_is=1923&as_maxm_is=12&as_maxy_is=1964&as_brr=0&as_pt=ALLTYPES) shows that the majority are "snippet view" or "no preview", suggesting they have not yet been identified as public domain books. I expect the number of public domain books included in Google Books to rise significantly in the future.
Unfortunately, Orwant has asked O'Reilly/TOC to take slides down, so they aren't on their site. But we still have some of them at http://blog.lib.uiowa.edu/hardinmd/2009/02/23/jon-orwant-on-google-book-search-at-toc-slides-with-data/#comment-579
It seems to me that Amazon, with the Kindle, believes the route to surviving the move to e-books is to control the platform for distribution and consumption - as Apple has done for digital music with its combination of the iPod and iTunes.
What you don't include in this picture is the question of the peer-to-peer sharing of e-books - this has been one of the key features of the development of the digital music marketplace, and we are already seeing this start in at least the academic environment (see http://dev8d.jiscinvolve.org/2009/02/10/uber-users-tom-morris-and-mike-green/ for a story of how a student got a digital copy of a paper via Twitter, plus comments on Napster and BitTorrent).
Thanks, everyone, for these comments. They are all somewhat similar, and I take heed of their message. I was using "API" rather loosely -- too loosely -- and should have been more precise. I did not intend API to necessarily imply a custom/baroque interface; API can be implemented simply and in an easily generalizable and well-described fashion; of course, the prevailing theme of just pushing, or exposing data, publicly in a well understood format is the best means of inducing the kind of interoperability that I am seeking. For example, the Guardian UK's new Open Platform API enables text search and data presentation through XML, JSON, and the ATOM format (specifically).
This kind of data-first principle works well when institutions feel comfortable with exposing either direct content or descriptive metadata sufficient to enable new service definitions. Of course not all content is fully open, and we will sometimes need to embed logic relating to permissible actions within metadata, or provide an explicit guiding API with a published specification.
I agree that gesturing at things is not enough, but sometimes it is a necessary start. (As in, "Yo, over *There*!")
While I agree in principle with this notion of an OpenStack, I think that the Brooklyn Museum API is really just an example of yet-another web2.0 API.
Don't get me wrong: web2.0 APIs are great, especially for making data assets available to interested parties. But the reality is that there are similar APIs, all subtly different, all over the web. How is a tool like Zotero, or Fedora, DSpace, etc., going to interoperate with them all? The answer is they can't, unless they deal with each on a case-by-case basis.
Is OpenStack an attempt to name a particular pattern of API use on the web? For example, why didn't Brooklyn Museum implement OpenSearch? It would've been a whole lot easier than divining their own API. If they had, users could've added a search box to their Firefox or IE browser, without having to know anything about the API.
Personally I would have preferred to see Brooklyn Museum do some simple things, like using the rel="tag" microformat on their item tags, exposing the JSON/XML records for items directly in the item displays using link elements, using OAuth for the authorization, and implementing OpenSearch.
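To make the OpenSearch suggestion concrete, here is a minimal sketch in Python of the kind of description document a site would publish so that browsers can offer a search box. The site name, description, and search URL are hypothetical placeholders, not anything Brooklyn Museum actually exposes; only the element names and the `{searchTerms}` template parameter come from the OpenSearch 1.1 format.

```python
import xml.etree.ElementTree as ET

# Namespace of OpenSearch 1.1 description documents.
OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"


def build_opensearch_description(short_name, description, search_url_template):
    """Return an OpenSearch 1.1 description document as a string."""
    # Serialize with a default namespace instead of an "ns0:" prefix.
    ET.register_namespace("", OPENSEARCH_NS)
    root = ET.Element(f"{{{OPENSEARCH_NS}}}OpenSearchDescription")
    ET.SubElement(root, f"{{{OPENSEARCH_NS}}}ShortName").text = short_name
    ET.SubElement(root, f"{{{OPENSEARCH_NS}}}Description").text = description
    # The client substitutes the user's query for {searchTerms}.
    ET.SubElement(
        root,
        f"{{{OPENSEARCH_NS}}}Url",
        {"type": "text/html", "template": search_url_template},
    )
    return ET.tostring(root, encoding="unicode")


# Hypothetical site details, for illustration only.
doc = build_opensearch_description(
    "Example Museum",
    "Search the example collection",
    "https://museum.example.org/search?q={searchTerms}",
)
print(doc)
```

A page would then point at the published file with a link element of type `application/opensearchdescription+xml`, which is how browsers discover the search box without knowing anything else about the site's API.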
On the other hand, their use of Creative Commons is a good example of something that is important, I think: following an existing pattern of usage on the world-wide-web.
I think it is really important that the Library/Archives/Museum community identify these patterns of use on the larger web, instead of recreating them and ending up in a niche/ghetto. I guess that's what you are saying we need? I don't think that gesturing at something and calling it OpenStack is enough though.
Great to see this posting -- I think this is exactly the direction we should be moving in.
I would make two points, though. One, I think that the library world must come to grips with the fact that its problem space and use cases are not unique in the world of distributed information systems. It follows that the best resources/approaches for distributed systems will likewise come from outside the library community.
My second point follows on the first, and that is that for systems deployed on the web using HTTP (I assume that's what we are talking about), the architectural principles articulated in RESTful design are essential. I fear that REST is considered "one approach among many options," when in fact it is simply an articulation of the forces at play when building distributed systems on the web. RESTfully designed systems will be more scalable, flexible and (most importantly here) interoperable than systems not adhering to the principles.
REST is meant to be "simple" but by no means "easy." It's a hard-won simplicity (especially since so many systems now in use are inherently not RESTful) that requires very thorough and careful planning and design. Folks in the REST world are becoming wary of the term "API" since it suggests (and often signals) a design based on unique formats, documented URL construction, etc. Good interoperable REST-based services will be characterized by a reliance on standard, well-documented media types (mime-types) and on well-understood link relationships. Services that go beyond that risk being less RESTful (and thus less scalable, flexible, and interoperable).
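As a rough illustration of relying on link relationships rather than documented URL construction, here is a small Python sketch of a client that pages through a collection by following `rel="next"` links. The in-memory `FAKE_WEB` dictionary stands in for real HTTP responses, and every URL and field name in it is a hypothetical assumption; only the `next` relation itself is a standard, registered link relation.

```python
# A tiny in-memory stand-in for a web service: URL -> parsed representation.
# All URLs and field names here are hypothetical.
FAKE_WEB = {
    "https://example.org/items": {
        "entries": ["item-1", "item-2"],
        "links": [{"rel": "next", "href": "https://example.org/items?page=2"}],
    },
    "https://example.org/items?page=2": {
        "entries": ["item-3"],
        "links": [],
    },
}


def fetch(url):
    """Pretend to GET a URL; a real client would issue an HTTP request."""
    return FAKE_WEB[url]


def find_link(representation, rel):
    """Return the href of the first link with the given relation, or None."""
    for link in representation["links"]:
        if link["rel"] == rel:
            return link["href"]
    return None


def collect_entries(start_url):
    """Gather entries from every page by following rel="next" links.

    The client never builds a URL itself; it only follows the links
    that each representation hands back.
    """
    entries = []
    url = start_url
    while url is not None:
        representation = fetch(url)
        entries.extend(representation["entries"])
        url = find_link(representation, "next")
    return entries


all_entries = collect_entries("https://example.org/items")
print(all_entries)  # ['item-1', 'item-2', 'item-3']
```

Because the client knows only the starting URL and the meaning of `next`, the server is free to change how it pages its results without breaking anything.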
Two v. useful blog postings:
http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven
http://www.infoq.com/articles/subbu-allamaraju-rest
I think you can go even more radically lightweight than this. No APIs: just put your data in publicly accessible webspace. Push for some loose consensus around naming (a la your /public proposal) and formats: SQL dumps as lowest common denominator, then Atom XML, then RDF Linked Data. Bulk download and local caching: look at the success Wikipedia, Geonames, etc. have had with this model. No new APIs to learn, no arguing over how to model REST resources, just download and go... you can always add on APIs later.
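A minimal sketch of the "download and go" idea in Python: fetch a published dump once and reuse the local copy afterwards. The dump URL is a hypothetical example of a file under a /public path, and the demonstration uses a fake downloader so it runs without network access; by default the function would use the standard library's `urllib.request.urlretrieve`.

```python
import os
import tempfile
import urllib.request


def cached_download(url, cache_dir, downloader=urllib.request.urlretrieve):
    """Download url into cache_dir once; later calls reuse the local file."""
    os.makedirs(cache_dir, exist_ok=True)
    local_path = os.path.join(cache_dir, os.path.basename(url))
    if not os.path.exists(local_path):
        downloader(url, local_path)
    return local_path


# Demonstration with a fake downloader (no network needed).
calls = []


def fake_downloader(url, path):
    """Record the request and write a placeholder file."""
    calls.append(url)
    with open(path, "w") as handle:
        handle.write("-- pretend SQL dump\n")


cache_dir = tempfile.mkdtemp()
dump_url = "https://data.example.org/public/catalog.sql"  # hypothetical URL
first = cached_download(dump_url, cache_dir, downloader=fake_downloader)
second = cached_download(dump_url, cache_dir, downloader=fake_downloader)
print(first == second, len(calls))  # True 1
```

The second call never touches the downloader, which is the whole point: the consumer needs nothing beyond a stable URL and a well-known file format.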
Very well said. We need more librarians like you. I am a journalist in Chile, and the public libraries, especially the National Public Library, work more like museums. You need an entire afternoon to check just a couple of old documents.
Amazon to disable the text-to-speech function on its new Kindle 2 ebook reader.
Amazon will probably not sell as many units, since it has announced its intention to disable the text-to-speech function on its new Kindle 2 ebook reader.
This is a bad move on their part.
The publishers and writers can disable Kindle 2's read-aloud feature.
What's up with that?
www.net-ebooks.com
www.ebooks-downloads.com
Here are slides from Jon Orwant's talk at TOC, with some numbers:
http://blog.lib.uiowa.edu/hardinmd/2009/02/23/jon-orwant-on-google-book-search-at-toc-slides-with-data/
While OCLC's main asset may currently be its data, this isn't going to be a long-term sustainable 'business model', and also does not serve the interests of the cooperative members of OCLC whose interests are OCLC's mission.
OCLC is going to have to move to a business model where it makes money by providing _services_, not data. Ironically, Karen Calhoun has made statements to this effect too. So some within OCLC realize the direction of the future. But getting there isn't easy. It's in OCLC members' interests that they DO get there, and don't fail, but it's also in OCLC members' interests that they get there SOON, so OCLC's current attempts to establish a monopoly on library data don't continue to be a drag on innovation.
OCLC offers services that indeed are valuable, and that to be sustainable need to be paid for. It is currently aggressively trying to expand such services, with worldcat.org, the WorldCat APIs, the WorldCat Registry, etc.
One way or another this is the future--paying for services, not paying for data whose copyright holders (if any) have no desire to keep that data from being open. (I am reminded of the analogy of those who make money selling book covers to libraries -- when the actual rightsholders of cover images have no desire to limit their use, and in fact are happy for them to be used and shared openly. This is not a sustainable business model.)
It's just a question of how long it takes to get there, and whether OCLC will still be alive when we do. It's in our interests to get there quickly, and it is also in our interests that OCLC still be there when we do. But it might be a somewhat smaller, nimbler OCLC than we currently have -- and no organization wants to downsize itself.
This is the personal blog of Peter Brantley, and the opinions expressed here are his own and are not reflective of any of his employers in the continuum of history, or the University of California, which provides support for this blog.