
Advertising Google Book Search




On my last trip to New York, for the Tools of Change conference, I spoke with a small publisher who had been thinking off and on about the proposed Google Book Search settlement. She made an interesting observation about the settlement's strategy for monetizing the content: it relies on advertising, individual purchase, and institutional licensing. In other words, it does not rely solely on advertising, which is how Google obtains the overwhelming majority of its revenue.

That made me pause, because I had never really thought about that option: I had always assumed that digital books would move toward a licensing arrangement, in part because I had presumed that individual publishers would be the ones doing the licensing.  (Fail!)

My friend, who is being squeezed betwixt Google and Amazon, was right to observe instead that Google could easily become the dominant distributor for online literature through licensing. This is a novel burden for Google, one with which it has no prior experience, and one that places it between rightsholders and consumers (never a pretty place).

Although I meekly admit to not having thought this through deeply, it seems to me that the reliance on licensing (and individual sales) in addition to advertising might mean one of the following things:

1) The rightsholders who crafted the settlement language together with Google mandated a licensing scheme because it was the model they knew best, and because it seemed to guarantee them a revenue floor.

Or:

2) Google has analyzed the traffic through its Google Book Search site; it is rather phenomenal, as Dan Clancy of Google revealed at the last DLF Forum, and as Jon Orwant of Google discussed in more detail at ToC. Basically, every book in GBS gets viewed, and viewed a lot. Even so, perhaps Google determined that GBS traffic does not generate enough advertising revenue through click-throughs to recoup the more than $200 million Google will spend under the settlement, plus its scanning and ingest costs, while also providing enough income for the rightsholders to make it worth anyone's while (particularly in the near term, when people want to see returns relatively quickly).

And finally, of course, there is:

3) Google makes more money by using as many revenue streams as possible. This is appealing in its simplicity; however, the organizational and technical infrastructure needed to support institutional sales, even through a third party (which the settlement permits), is a significant new cost for Google. I have heard that even individual sales are difficult to enable for Partner Program (frontlist) books, and have required an ongoing rewrite of the Google Checkout payments system. In other words, I suspect Google would not take up new forms of revenue-generating distribution unless it absolutely had to.

I have no idea which of these postulates is correct, whether elements of all three are true, or whether all are wrong. But consider (2) for a moment, because it is the most fascinating: it would mean that contextual advertising against book content is genuinely difficult, because the content comes in much bigger chunks, without significant link networks to provide external relevance signals, and often with notably less internally provided context than web pages. In other words, the diversity of web pages and the heterogeneity of their citation graphs produce more robust evaluations for ranking. It is hard to rank large items that are highly internally consistent.

By extension, this suggests why it is difficult to integrate book content into the main index for discovery. Jon Orwant briefly discussed this challenge at ToC 2009: the GBS team wants to “earn” inclusion in main index search results by demonstrating relevance to main search engine queries. As he pointed out, some searches merit GBS inclusion relatively easily, such as "Tennyson poetry"; for others, such as "irrigation acequias", it might be far more difficult for GBS to deliver results that score high enough to justify inclusion (these are my examples; Jon used better ones).

One of the concerns some librarians have had with Google and GBS is the opacity of ranking; there have been very pointed critiques of how books are ranked within GBS. When one essentially combines different algorithmic regimes (e.g., those for the web, Books, and Google Scholar) to produce integrated results, the number of potentially fatal permutations, and the risk of local (rather than global) optimization, must be very significant. Google might argue that this is why it does not expose its calculations; certainly it must be changing them constantly. But one could argue that this is precisely where Google should expose its work.

It would be interesting to think about a Netflix-like challenge for integrating Google Book Search content with main web results. Netflix did not expose its own algorithms when soliciting that work, but the competition clearly exercised public knowledge of which approaches might work best for the computational challenge of making recommendations from a diverse but homologous media repository.

The other sign of maturity I would like to see from Google is a willingness to expose a few knobs and levers in a GBS Advanced Search option, giving users control over boosts or weighting factors when they want to conduct more sophisticated searches against GBS content. Currently, GBS "Advanced Search" only lets users specify core metadata fields such as author and title.
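To make the idea of user-exposed "boosts" a little more concrete, here is a minimal Python sketch. It assumes a toy scoring model in which each field's query-term matches are multiplied by a weight, and a user can override those weights. The `Book` record, the field names, `DEFAULT_BOOSTS`, and the `score`/`search` functions are all hypothetical illustrations; they are not GBS's actual ranking and not any Google API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Book:
    # Hypothetical record: metadata field name -> text of that field.
    fields: dict[str, str]


# Hypothetical default weights; a real engine would tune these itself.
DEFAULT_BOOSTS = {"title": 3.0, "author": 2.0, "subject": 1.5, "full_text": 1.0}


def score(book: Book, terms: list[str], boosts: dict[str, float]) -> float:
    """Sum each field's query-term matches, multiplied by that field's boost."""
    total = 0.0
    for name, text in book.fields.items():
        tokens = text.lower().split()
        matches = sum(tokens.count(term.lower()) for term in terms)
        total += boosts.get(name, 0.0) * matches  # unknown fields count for nothing
    return total


def search(books: list[Book], terms: list[str],
           user_boosts: dict[str, float] | None = None) -> list[Book]:
    """Rank books by score; user-supplied boosts override the defaults."""
    boosts = {**DEFAULT_BOOSTS, **(user_boosts or {})}
    return sorted(books, key=lambda b: score(b, terms, boosts), reverse=True)


books = [
    Book({"title": "Irrigation Handbook", "full_text": "canals and pumps"}),
    Book({"title": "New Mexico Water Law",
          "full_text": "acequias acequias irrigation acequias"}),
]
query = ["irrigation", "acequias"]

# Default weights favor the book whose full text mentions the terms often.
print([b.fields["title"] for b in search(books, query)])
# → ['New Mexico Water Law', 'Irrigation Handbook']

# Turning down the full-text knob lets the title match win instead.
print([b.fields["title"] for b in search(books, query, {"full_text": 0.5})])
# → ['Irrigation Handbook', 'New Mexico Water Law']
```

The point of the sketch is only that a single exposed weight can reorder results. A production engine would use far richer signals than raw term counts.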

Both of these would be new signs of openness for GBS, and would be warmly welcomed by many communities.

Feb 13, 2009 | Categories: MassBooks, Universities, BookRights, Publishers | pbrantley

6 comments

Comment from: MikeShatzkin [Visitor]
I "know" nothing, but my hunch about why licensing and sales are included is your point number 1: publishers insisted on it. (And why would Google object? While it may be true that Google will have great difficulty recapturing its investment in book digitization through advertising, that is STILL the business they are in and are dedicated to.)

Publishers, on the other hand, have already learned that the ad revenue from their books is not particularly robust, and they all think of a "sell the content" model: they're used to it, and many think things are changing toward it (look at all the recent advice to the NY Times to start charging for web content).

It seems likely that it will take a long time for Google to aggregate a lot of CURRENT content for sale through their channel. What BRR is really about is liberating orphan works, which it will do to everybody's benefit. There are already robust operations across publishing digitally licensing current content; those are revenue streams that will be protected for quite some time. That will keep the most licensable content away from the Google sales effort.
02/13/09 @ 06:51
Comment from: Jonathan Rochkind [Visitor] · http://bibwild.wordpress.com
So you may remember the guy (whose name I forget) who represented GBS at the last DLF Forum; he was very smart and well-spoken.

He actually made the point that revenue from actually selling content (rather than advertising) was a new thing for Google, and something they were doing because it seemed to be the only way to reach a settlement that made sense, not something they were doing because they actually were originally interested in it. He made the point in the context of saying it might take Google a while to work out some kinks, since it wasn't a business model they were used to.

Of course, this is just my possibly faulty recollection of a possibly misinterpreted statement. :)

But I can easily believe that Google didn't move toward this model because they actually _wanted_ to sell content, but simply because they wanted to have the content. Since the rightsholders didn't want them giving the content away for free, the only way they could provide it was to sell it, and if they were selling it, it only made sense to take a % cut (rather than provide a distribution/sales service to rightsholders completely for free).

Makes sense to me, actually.

It still may very well result in Google having a monopoly on the sales of certain content, which is troubling.
02/13/09 @ 09:13
Comment from: Eric Rumsey [Visitor] · http://blog.lib.uiowa.edu/hardinmd/
"Google has analyzed the traffic through its Google Book Search site; it is rather phenomenal, as Dan Clancy of Google revealed at the last DLF Forum, and as Jon Orwant of Google discussed in more detail at ToC. Basically, every book in GBS gets viewed, and viewed a lot. ..."

Did Orwant give any numbers on GBS traffic? As far as I can find, the only hint of numeric data is Clancy's brief mention of it in the NYT article that I discussed: http://blog.lib.uiowa.edu/hardinmd/2009/02/06/google-books-and-the-long-tail/
02/16/09 @ 11:00
Comment from: peter brantley [Visitor]
Jonathan, that was Dan Clancy, the senior engineer for Google Book Search.

Eric, Jon did give numbers. I suspect that he will not be formally sharing his slides; you can try searching Twitter for the tag #toc and looking for Jon's name, or Google, or GBS. However, I've also asked Dan and Jon directly whether they would be willing to share these stats. I'll post them here or in a new post, whichever is most appropriate.
02/16/09 @ 12:39
Comment from: Eric Rumsey [Visitor] · http://blog.lib.uiowa.edu/hardinmd/
Here are slides from Jon Orwant's talk at TOC, with some numbers:
http://blog.lib.uiowa.edu/hardinmd/2009/02/23/jon-orwant-on-google-book-search-at-toc-slides-with-data/
02/24/09 @ 07:29
Comment from: Eric Rumsey [Visitor] · http://blog.lib.uiowa.edu/hardinmd/
Unfortunately, Orwant has asked O'Reilly/TOC to take slides down, so they aren't on their site. But we still have some of them at http://blog.lib.uiowa.edu/hardinmd/2009/02/23/jon-orwant-on-google-book-search-at-toc-slides-with-data/#comment-579
03/11/09 @ 11:52

This is the personal blog of Peter Brantley, and the opinions expressed here are his own and are not reflective of any of his employers in the continuum of history, or the University of California, which provides support for this blog.
