
The Orphan Monopoly


Last Friday, I was able to attend a very interesting meeting at Columbia Law School on the long-term ramifications of the Google Book Search settlement.  Some of what was discussed will be drawn out over future posts, here or elsewhere.  The conference was covered on Twitter at #gbslaw.

There is a lot to ponder.  This is arguably a massive rewriting of copyright for books without any legislative input; Marybeth Peters (MBP), the U.S. Register of Copyrights, observed that the settlement essentially proposes a private agreement for compulsory licensing between a large class of IP holders and the world’s largest search engine.  The potential scope and policy ramifications are significant.  MBP mentioned that there might be treaty implications under international conventions.  Despite that, one of her most shocking statements was that the Copyright Office has not received a single inquiry from any of the 535 elected representatives of the people of the United States.  Not. One.

Orphan works

What I want to discuss in this post is a persistent theme that ranged across the panels and discussions: concern with the status of orphan works in the settlement proposal.  Only a subset of the works covered by the settlement will actually be orphan: some of the works will have identifiable rights holders, and many new rights holders will come forward.  Indeed, the settlement offers to change the rights status of a great number of works, which is by and large a useful clarification.  However, there will be a tremendous number of works for which the rights status is murky at best: they may be likely in-copyright, but with no identified rightsholder, or they might be likely out of copyright, but no one can easily verify this to be the case.

An indirect indication of the magnitude of this body of unclaimed books is given by Google’s set-aside of $45 million to compensate rightsholders (RHs) for already digitized works.  There are differing payments for books and inserts, but let’s assume all works with newly identified rightsholders are books, which carry the maximum payout ($60/title).  Dividing $45 million by $60 gives a count of roughly 750,000 titles expected to receive compensation.  The settlement does note that $45 million might not be enough to cover claims and that Google might be required to add more funds, but this must nonetheless be a rough best guess on the part of Google, the publishers, and the authors.

There are rough estimates of around 7 million digitized volumes in GBS; subtracting 750,000 newly identified works gives us 6.25 million.  Let’s take a guess that there are maybe 1.5 million public domain works (this is not entirely out of the blue, but informed by earlier orphan works studies and reports), leaving 4.75 million titles.   That’s a lot of books – about 2/3 of the total.   It might be more, it might be less; it is a big number.
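For readers who want to check the arithmetic, here is a minimal sketch of the back-of-envelope estimate above. The figures are the ones used in this post (the 1.5 million public domain count is a guess, not a measured value), and the function and variable names are purely illustrative.

```python
# Back-of-envelope estimate of unclaimed titles, using the post's rough figures.
SET_ASIDE_USD = 45_000_000        # Google's set-aside for already digitized works
PAYOUT_PER_BOOK_USD = 60          # per-title payment, assuming every claim is a book
TOTAL_DIGITIZED = 7_000_000       # rough estimate of volumes in GBS
PUBLIC_DOMAIN_GUESS = 1_500_000   # assumption: the post's guess at public domain works


def estimate_unclaimed(set_aside, payout, total, public_domain):
    """Return (compensated titles, remaining titles, remaining share of total)."""
    compensated = set_aside // payout
    remaining = total - compensated - public_domain
    return compensated, remaining, remaining / total


compensated, remaining, share = estimate_unclaimed(
    SET_ASIDE_USD, PAYOUT_PER_BOOK_USD, TOTAL_DIGITIZED, PUBLIC_DOMAIN_GUESS
)
print(f"compensated titles: {compensated:,}")
print(f"remaining titles:   {remaining:,} ({share:.0%} of the total)")
```

Changing any one input (for example, a larger public domain share) shows how sensitive the "about 2/3" figure is to these guesses.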

This is not inexplicable.  There are a large number of ways that books might fall into orphan status.  A quick consultation of Peter Hirtle’s copyright table at Cornell Univ. allows us to see how easy this is.   The impact of foreign rights is fiendishly complicated, and even the rules for U.S. publications are baroque; for older works it is a crafty rightsholder indeed who can figure out whether they might retain a claim.  As Peter Hirtle observed to me in an email, “The lengthening copyright terms and the gradual removal of formalities (especially the automatic renewal of works published since 1963) means that works that would have passed into the public domain in the past because the rights owners weren't concerned are still protected.  The chances that the rights holders are either unidentifiable or not locatable also goes up.”

Further, many Copyright Office records have not yet been digitized and require manual examination.  A very high portion of these records are dirty: missing metadata (including basic information such as Title or Author), obviously incorrect metadata (e.g. misspellings), transposed metadata fields, updated records with no explicit connection to superseded records, and so on.  (In other words, they are a real mess.)  There have been several efforts to digitize these data, with varying success and rigor.  The most active rights identification efforts currently are those at Google and the University of Michigan.

A large number of these orphans are going to be truly public domain books, just like pre-1923 works.  However, we may never know that they actually have public domain status due to historically incomplete record keeping, and the lack of a national rights tracking and notification infrastructure.

Additionally, unlike the proposed orphan works legislation which nearly passed through the House and Senate last year, the rights claiming process is opt-in.  This simplifies things considerably for Google and the Book Rights Registry (BRR) because, unlike under the proposed legislation, the BRR is not required at any point to undertake a “diligent search” for the rightsholder(s) of works on an item-by-item basis.  This puts the burden on the RHs to come forward to make their claims.  The settlement parties are correct to observe that the agreement engendered perhaps the single largest class notification program in the history of class action settlements in the United States.  Despite its thoroughness, though, it is just not going to reach everyone who might have a stake in the suit (e.g., classic lineage problems such as the daughter of the niece of the co-author who is the last surviving heir, and who doesn’t even know there were transmitted rights).

An entire group of authors that the notification will not reach are “non-active” authors of orphan works, who do not realize that they may have rights to titles digitized by Google under the proposed settlement.  Orphan works authors and rightsholders won’t opt out of the settlement, nor will they opt in; by definition they are not aware they have a right to file claims.  This raises troubling questions about the representative completeness of the author sub-class in the settlement.

Monetization of Orphans

At Columbia Law on Friday, the most vexing issue for orphans was the distribution of income from their monetization by Google for the benefit of BRR, Google, and the Class parties (authors and publishers of books, as identified in the proposal).  The distribution of income differs considerably depending on whether it is derived from non-subscription sales (mostly individual purchases or licensed uses) or from institutional sales to libraries and related, approved organizations.

In the rough, the non-subscription sale income goes first to the BRR for operational assistance, and to fund a reserve endowing support for future BRR programs.  In the improbable event that there are leftover funds, they are apportioned to RHs until they have received 70 percent of the gross revenues for each book, and then (finally) any remaining funds go to not-for-profits supporting reading, literacy, libraries, etc.  That trickle-down is not likely to generate much dew on the thirsty gardens of the public sector.
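The ordering described above can be sketched as a simple three-step waterfall. This is only an illustration under stated assumptions: it treats one pool of unclaimed money for a single book, ignores the proportional allocation across rightsholders, and every dollar figure and name in it is hypothetical rather than drawn from the settlement.

```python
def distribute_unclaimed(unclaimed, registry_costs, gross_revenue, already_paid):
    """Split unclaimed non-subscription funds in the order described in the post.

    All amounts are whole dollars. The 70 percent cap is the only number
    taken from the settlement text; everything else is a hypothetical input.
    """
    # Step 1: the Registry's operating expenses and reserves come first.
    to_registry = min(unclaimed, registry_costs)
    left = unclaimed - to_registry

    # Step 2: registered rightsholders are topped up to 70% of gross revenue.
    cap = gross_revenue * 70 // 100
    headroom = max(0, cap - already_paid)
    to_rightsholders = min(left, headroom)
    left -= to_rightsholders

    # Step 3: whatever remains goes to not-for-profit entities.
    return {
        "registry": to_registry,
        "rightsholders": to_rightsholders,
        "nonprofits": left,
    }


# Hypothetical figures, chosen only to show each step taking a share.
example = distribute_unclaimed(
    unclaimed=500, registry_costs=350, gross_revenue=1000, already_paid=600
)
print(example)
```

With these made-up inputs the not-for-profit share is whatever survives the first two steps, which is the point of the "trickle-down" remark: raise the Registry's costs or the rightsholder headroom and that last share shrinks to zero.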

This distribution is likely to generate an appreciable percentage of the total income for the BRR, a complex entity with many diverse goals, including policy, arbitration, distribution, and rights maintenance, in addition to its own internal administration.   (Even with these funds, it seems worthwhile to question whether the BRR can support itself as an independent concern without additional on-going subsidy.)

For subscription sales, which might well ultimately be the most significant source of income, the revenue is apportioned straight to the rightsholders by the BRR.  (I’ve appended the relevant settlement language, in its entirety, at the end of this post.)

The essential problem is that the settlement parties have a vested interest in maintaining a monopoly over access to orphan books. Marybeth Peters speculated that the resolve of settlement participants to support future orphan works legislation might be weakened, regardless of their zeal for such clarification in the past.  As the Chicago Law professor Randal Picker noted at the meeting [slides here], there is a built-in incentive for licensing associations to protect guaranteed income sources from external claimants: the settling parties want to maintain the property status of orphans as copyrighted works against outsiders.

This is wrong on the face of it; it is an abrogation of the public’s right of access that there is no structural incentive to identify public domain works within the corpus of orphans, and that the largest share of revenue generated from their digitization goes to RHs who have, by definition, no right to that income.  Randal Picker suggested that creating a more symmetric most-favored-nation (MFN) status for commercial exploitation of the works covered by the settlement, such as unbundling orphan works by opening them up to exploitation by non-profits, might be a useful attenuation of this inherent danger.

There is a further problem.  In addition to the income from settlement-proposed schemes, Google uniquely will be able to generate income from not-covered uses, such as integrating the content with web, dataset, and news data to build more robust discovery services.  The advertising revenue against this aggregation will be uniquely Google’s to reap.

As Jule Sigall (formerly Copyright Office, now Microsoft) and Jane Ginsburg (Columbia Law) wryly noted at the Columbia Law meeting, it is as if Google has managed to maneuver itself to the verge of a court-sanctioned release of potential liability covering the exploitation of orphan books, for the benefit of a single commercial actor.

If this is the best train coming down the tracks, it might be time to throw a red light.

 


Settlement:

6.3 (a) Unclaimed Funds
(i) Unclaimed Funds-Non-Subscription Revenue Models. Any revenues paid to the Registry and due to Rightsholders of Books under Sections 4.2 (Consumer Purchases), 4.4 (Advertising Revenue Model), 4.8(a)(ii) (Printing), and, if agreed, 4.7(a) – (c) (Print on Demand, Custom Publishing and PDF Download, respectively), but that are unclaimed by such Rightsholders within five (5) years of the last date of the reporting period in which the Books earned such revenues (“Unclaimed Funds – Non-Subscription”), will be distributed by the Registry in accordance with the Plan of Allocation as soon as practicable following the end of such five (5)-year period as follows: (1) first, to defray reasonable and necessary operational expenses of the Registry that are related to its performance, on behalf of the Rightsholders, of the functions described in Section 6.1 (Functions) and, as determined by the Board of Directors of the Registry in the exercise of its fiduciary duties, maintain reserves for such expenses on a proportional revenue basis with respect to revenue from licensees of the Registry other than Google, (2) then, any remaining Unclaimed Funds will be paid on a proportional basis to the Registered Rightsholders until all such Rightsholders of a Book have received, in the aggregate, together with all amounts paid to such Rightsholders under Section 4.5(a) (Obligation to Pay Revenue Share), seventy percent (70%) of the Gross Revenues received by Google for such Book, and (3) then, for any Unclaimed Funds remaining thereafter, to not-for-profit entities described in Section 501(c)(3) of the Internal Revenue Code chosen by the Registry after consultation with Google and, acting through the Designated Representative, the Participating Libraries and the Cooperating Libraries.
The Registry shall choose not-for-profit entities described in Section 501(c)(3) of the Internal Revenue Code that directly or indirectly benefit the Rightsholders and the reading public, and will include entities that advance literacy, freedom of expression, and/or education, and, for avoidance of doubt, will not include the Authors Guild, the Association of American Publishers or other trade organizations. “Gross Revenues” means all of the revenues received by Google from the Revenue Models identified in this Section 6.3(a) (Unclaimed Funds), and only such Revenue Models.

(ii) Unclaimed Funds-Subscription Revenue Models. Any revenues paid to the Registry and due to Rightsholders of Books under Section 4.1 (Institutional Subscriptions) and, if agreed, Section 4.7(d) (Consumer Subscription Models), but that are unclaimed by such Rightsholders within five (5) years of the last date of the reporting period in which the Books earned such revenues (“Unclaimed Funds-Subscription”), will be distributed by the Registry as soon as practicable in accordance with the Plan of Allocation following the end of such five (5)-year period.

Mar 15, 2009 | Categories: MassBooks, DigLibs, eBooks | pbrantley

2 comments

Comment from: Randy Picker [Visitor] · http://picker.uchicago.edu/PickerGBSTalk.ppt
The slides for my talk are at the website link listed if you are interested for more detail.
03/15/09 @ 09:23
Comment from: Hilary [Visitor]
Regarding public domain books - Google has publicly announced that they have already identified 1.5 million books out of the 7M+ books they have scanned as being in the public domain: http://booksearch.blogspot.com/2009/02/15-million-books-in-your-pocket.html James Grimmelmann has pointed out that many US government office works that were scanned have still not been identified as public domain works on Google Books (http://laboratorium.net/archive/2009/01/05/why_is_google_restricting_access_to_government_boo). Marybeth Peters also noted that few of the books published before 1964 had their copyright renewed (which was required to retain copyright protection), so the majority will be in the public domain. Looking at books on Google Books published between 1923 and 1964 (http://books.google.com/books?q=+subject:%22Science%22&lr=&as_drrb_is=b&as_minm_is=1&as_miny_is=1923&as_maxm_is=12&as_maxy_is=1964&as_brr=0&as_pt=ALLTYPES) shows that the majority are "snippet view" or "no preview", suggesting they have not yet been identified as public domain books. I expect the number of public domain books included in Google Books to rise significantly in the future.
03/17/09 @ 12:33

This is the personal blog of Peter Brantley, and the opinions expressed here are his own and are not reflective of any of his employers in the continuum of history, or the University of California, which provides support for this blog.
