I have been re-reading the Google Book Search class action settlement proposal that would, if approved, permit Google to sell access to the Books covered under the terms of the settlement.
In addition to individual (consumer) sales, Google can sell institutional subscriptions that let users within an institution view the full text of all the books in the "Institutional Subscription Database" (ISD). Access to the ISD is not perpetual; it lasts only for the duration of the subscription. Pricing for institutional subscriptions is based on the category of the institution and its size (measured predominantly by the FTE count of the student body).
As with most licensed content subscriptions, access is limited (in educational institutions) to faculty, students, researchers, staff members, librarians, other personnel, business invitees, and walk-in users from the general public.
One thing that struck me as a notable presumption in the institutional subscription terms relates to Google's desire to manage inappropriate behavior. That's an understandable position, but how can Google achieve that goal?
In the STM content market that university libraries know best, a licensor monitors usage and asks the licensee to intervene if it notices that (for example) the last five volumes have been downloaded in their entirety over a period of five days from a single IP address. In the usual course, that IP address is denied access to the resource, and notice is sent to the licensee's responsible party, often someone in the library or consortial office, who then rectifies the situation according to a well-practiced script.
Since Google is set to establish institutional licenses along the same lines (and over the last few months it has been steadily interviewing organizations that already license content about best practices), you might think it would use similar arrangements. But that's not the case.
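The licensor-side check described above can be sketched in a few lines. This is a minimal illustration, not any publisher's actual system: it assumes a hypothetical log of complete-volume downloads (the `FullDownload` record, with days as plain integers) and uses the post's example thresholds of five volumes within five days from one IP address. It only flags addresses; blocking access and notifying the licensee's responsible party would happen outside this code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FullDownload:
    """Hypothetical log record: one complete volume fetched from one IP."""
    ip: str
    volume_id: str
    day: int  # day number, e.g. days since the subscription began


def flag_bulk_downloaders(events, max_volumes=5, window_days=5):
    """Return IPs that fetched at least `max_volumes` distinct complete
    volumes within any span of `window_days` consecutive days."""
    by_ip = defaultdict(list)
    for event in events:
        by_ip[event.ip].append(event)

    flagged = set()
    for ip, downloads in by_ip.items():
        downloads.sort(key=lambda e: e.day)
        start = 0
        for end in range(len(downloads)):
            # Drop downloads that fall outside the window ending at `end`.
            while downloads[end].day - downloads[start].day >= window_days:
                start += 1
            distinct = {e.volume_id for e in downloads[start:end + 1]}
            if len(distinct) >= max_volumes:
                flagged.add(ip)
                break
    return flagged


# One address takes a volume a day; another takes one every other day.
events = [FullDownload("10.0.0.7", f"vol{i}", day=i) for i in range(5)]
events += [FullDownload("10.0.0.8", f"vol{i}", day=2 * i) for i in range(5)]
print(flag_bulk_downloaders(events))
```

Here only `10.0.0.7` is flagged: it took five distinct volumes in five days, while the other address never exceeds three within any five-day span. Counting distinct volumes, rather than raw downloads, keeps a user who re-fetches the same volume from tripping the check; that choice, like the thresholds, is an assumption.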
Let's take a look at a clause in Section 4.1(e), Institutional Subscription Terms and Conditions:
[...] (5) include the right for Google to restrict or terminate a user’s account, including additional restrictions on printing and copy/paste, if the user distributes the copyrighted material from a Book in a manner that is prohibited by the terms and conditions or applicable law ...
Hmmm. A "user's account" in an institutional subscription. This suggests authentication tied to a specific identity. In fact, Google also needs this capacity to police the limited class-based annotation functionality provided in the settlement, which in turn suggests the need to associate students with class registration information. How would Google implement an individual authentication and authorization model?
Do we imagine that Google will encourage the requisite widespread deployment of Internet2's Shibboleth? Er, no, probably not.
Actively supporting OAuth and OpenID for Google Book Search so that individuals can use their Yahoo! or Facebook logins? Hmm, probably not.
Supporting a diverse set of institutional IDs founded on wildly varying assumptions about affiliation and security, and often split between faculty and staff versus students in fundamental profile characteristics such as account durability? Possible, but difficult.
Requiring every individual member of an institutional subscription to establish a Google-held account? Maybe. Conceivably these could be Google Apps for Education (GA4E) or Google Apps for Your Domain (GA4YD) accounts.
In public discussions of privacy, some imagine that our worst fear is being tracked through cookies. On an institutional basis, however, including in U.S. government agency subscriptions, users may well have to establish Google accounts tied to institutional boundaries to provide the kind of auditable transaction record and compliance regime the settlement specifies. If that is the case, then not only will Google know what I've been reading and which books I've been searching, but it could correlate that, with certainty, against my news subscriptions, my Google Maps searches ... the list grows long.
From an institution's perspective, would I (at a major research university) be ready to handle the support burden of explaining that every user of GBS must create their own Google account (where there is no GA4E, one separate and distinct from their existing accounts), and that each individual user is in essence a party to a license under terms potentially very different from those governing access to other licensed resources?
Many universities have previously turned away from Google Apps for higher education due to privacy concerns, including concerns about liability and the identification of responsible parties in private, state, or federal legal action, including subpoenas. They now face the prospect of entering an unexpected set of relationships through an institutional license with Google.
Now would be a good time to receive clarification about this issue.
In the very near future, my posting will move to a new site, peterbrantley.com.
Please reset your links!
Thanks much.
Last Friday, I attended a very interesting meeting at Columbia Law School on the long-term ramifications of the Google Book Search settlement. Some of what was discussed will be drawn out in future posts, here or elsewhere. The conference was covered on Twitter under the hashtag #gbslaw.
There is a lot to ponder. This is arguably a massive rewriting of copyright for books without any legislative input; Marybeth Peters (MBP), the U.S. Register of Copyrights, observed that the settlement essentially proposes a private agreement for compulsory licensing between a large class of IP holders and the world’s largest search engine. The potential scope and policy ramifications are significant. MBP mentioned that there might be treaty implications under international conventions. Yet despite that, one of her most shocking statements was that the Copyright Office has not received a single inquiry from any of the 535 elected representatives of the people of the United States. Not. One.
Orphan works
What I want to discuss in this post is a persistent theme across the panels and discussions: concern with the status of orphan works in the settlement proposal. Only a subset of the works covered by the settlement will actually be orphans: some will have identifiable rightsholders, and many new rightsholders will come forward. Indeed, the settlement offers to change the rights status of a great number of works, which is by and large a useful clarification. However, there will be a tremendous number of works whose rights status is murky at best: they may well be in copyright but have no identified rightsholder, or they may well be out of copyright but no one can easily verify it.
An indirect indication of the size of this body of unclaimed books is Google’s set-aside of $45 million to compensate rightsholders (RHs) for already digitized works. Payments differ for books and inserts, but let’s assume all works with newly identified rightsholders are books, which receive the maximum payout ($60/title). Dividing $45 million by $60 gives a maximum of 750,000 titles expected to receive compensation. The settlement does note that $45 million might not be enough to cover claims and that Google might be required to add more funds, but nonetheless this must be a rough best guess on the part of Google, the publishers, and the authors.
There are rough estimates of around 7 million digitized volumes in GBS; subtracting 750,000 newly identified works gives us 6.25 million. Let’s take a guess that there are maybe 1.5 million public domain works (this is not entirely out of the blue, but informed by earlier orphan works studies and reports), leaving 4.75 million titles. That’s a lot of books – about 2/3 of the total. It might be more, it might be less; it is a big number.
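The estimate in the two paragraphs above is simple arithmetic; spelling it out makes the assumptions easy to vary. Every input below is the post's own rough figure (and, as in the text, volumes and titles are treated as interchangeable); none of it is settlement data.

```python
# Back-of-the-envelope orphan estimate, using the post's rough figures.
SET_ASIDE_USD = 45_000_000       # Google's set-aside for already digitized works
PAYOUT_PER_BOOK_USD = 60         # maximum per-title payment (books, not inserts)
DIGITIZED_VOLUMES = 7_000_000    # rough estimate of volumes digitized for GBS
PUBLIC_DOMAIN_GUESS = 1_500_000  # guess informed by earlier orphan works studies

# If every claim is a book at the maximum payout, this is the most titles
# the set-aside can compensate.
max_claimed_titles = SET_ASIDE_USD // PAYOUT_PER_BOOK_USD
remaining = DIGITIZED_VOLUMES - max_claimed_titles
likely_orphans = remaining - PUBLIC_DOMAIN_GUESS
share = likely_orphans / DIGITIZED_VOLUMES

print(f"At most {max_claimed_titles:,} titles compensated")
print(f"{remaining:,} volumes remaining after subtracting them")
print(f"{likely_orphans:,} likely orphans ({share:.0%} of the total)")
```

Running it reproduces the post's figures: at most 750,000 compensated titles, 6,250,000 volumes remaining, and 4,750,000 likely orphans, about 68 percent (roughly two-thirds) of the total. Changing `PUBLIC_DOMAIN_GUESS` is the quickest way to see how sensitive the conclusion is to that guess.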
This is not inexplicable. There are many ways that books might fall into orphan status; a quick consultation of Peter Hirtle’s copyright table at Cornell University shows how easily this happens. The impact of foreign rights is fiendishly complicated, and even the rules for U.S. publications are baroque; for older works, it is a crafty rightsholder indeed who can figure out whether they still hold a claim. As Peter Hirtle observed to me in an email, “The lengthening copyright terms and the gradual removal of formalities (especially the automatic renewal of works published since 1963) means that works that would have passed into the public domain in the past because the rights owners weren't concerned are still protected. The chances that the rights holders are either unidentifiable or not locatable also goes up.”
Further, many Copyright Office records have not yet been digitized and require manual examination; a very high proportion of these records are dirty, with missing metadata (including basic information such as Title or Author); obviously incorrect metadata (e.g. misspellings); transposed metadata fields; updated records with no explicit connection to superseded records; and so on. (In other words, they are a real mess.) There have been several efforts to digitize these data, with varying success and rigor. The most active rights identification efforts currently are those at Google and the University of Michigan.
A large number of these orphans are going to be truly public domain books, just like pre-1923 works. However, we may never know that they actually have public domain status due to historically incomplete record keeping, and the lack of a national rights tracking and notification infrastructure.
Additionally, unlike the proposed orphan works legislation, which almost, but did not, pass through the House and Senate last year, the rights claiming process is opt-in. This simplifies things considerably for Google and the BRR (Book Rights Registry) because, unlike under the proposed legislation, the BRR is never required to undertake a “diligent search” for the rightsholder(s) of works on an item-by-item basis. This puts the burden on RHs to come forward and make their claims. The settlement parties are correct to observe that the agreement engendered perhaps the single largest class notification program in the history of U.S. class action settlements, but however thorough it is, it is simply not going to reach everyone who might have a stake in the suit (e.g., classic lineage problems such as the daughter of the niece of the co-author who is the last surviving heir, who doesn’t even know there were transmitted rights).
An entire group of authors the notification will not reach are “non-active” authors of orphan works, who do not realize that they may have rights to titles digitized by Google under the proposed settlement. Orphan works authors and rightsholders won’t opt out of the settlement, nor will they opt in; by definition, they are not aware they have a right to file claims. This raises troubling questions about the representative completeness of the author sub-class in the settlement.
Monetization of Orphans
At Columbia Law on Friday, the most vexing issue for orphans was the distribution of the income Google earns by monetizing them, for the benefit of the BRR, Google, and the Class parties (authors and publishers of books, as identified in the proposal). The distribution differs considerably depending on whether the income derives from non-subscription sales (mostly individual purchases or licensed uses) or from institutional sales to libraries and related, approved organizations.
Roughly, unclaimed non-subscription income goes first to the BRR for operational expenses and to fund a reserve supporting future BRR programs. In the improbable event that funds are left over, they are apportioned to registered RHs until those RHs have received 70 percent of the gross revenues for each book, and only then do any remaining funds go to not-for-profits supporting reading, literacy, libraries, etc. That trickle-down is not likely to generate much dew on the thirsty gardens of the public sector.
This distribution is likely to generate an appreciable percentage of the total income for the BRR, a complex entity with many diverse goals, including policy, arbitration, distribution, and rights maintenance, in addition to its own internal administration. (Even with these funds, it seems worth questioning whether the BRR can support itself as an independent concern without additional ongoing subsidy.)
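This tiered distribution (quoted in full from Section 6.3(a)(i) at the end of this post) behaves like a payment waterfall. The sketch below is a simplified model under stated assumptions: the `BookAccount` fields and `allocate_unclaimed` function are hypothetical, Registry expenses and reserves are folded into a single number, and because the settlement says only that funds are paid "on a proportional basis," splitting them in proportion to each book's remaining room under the 70 percent ceiling is an assumption made here so that no book overshoots that ceiling.

```python
from dataclasses import dataclass


@dataclass
class BookAccount:
    """Hypothetical per-book figures, not the settlement's own data model."""
    gross_revenue: float  # Gross Revenues received by Google for the book
    already_paid: float   # amounts already paid to its Rightsholders
    registered: bool      # whether the book has a Registered Rightsholder


def allocate_unclaimed(pool, registry_expenses, books, cap=0.70):
    """Split a pool of unclaimed non-subscription funds into three tiers."""
    # Tier 1: Registry operating expenses and reserves (folded into one figure).
    to_registry = min(pool, registry_expenses)
    remaining = pool - to_registry

    # Tier 2: registered Rightsholders, up to `cap` of each book's gross revenue.
    # Unregistered books receive nothing in this tier.
    headroom = [
        max(0.0, cap * b.gross_revenue - b.already_paid) if b.registered else 0.0
        for b in books
    ]
    total_headroom = sum(headroom)
    to_rightsholders = min(remaining, total_headroom)
    per_book = [
        to_rightsholders * h / total_headroom if total_headroom else 0.0
        for h in headroom
    ]
    remaining -= to_rightsholders

    # Tier 3: whatever is left goes to 501(c)(3) not-for-profits.
    return {"registry": to_registry, "rightsholders": per_book, "charities": remaining}


books = [
    BookAccount(gross_revenue=200, already_paid=100, registered=True),
    BookAccount(gross_revenue=400, already_paid=0, registered=True),
    BookAccount(gross_revenue=300, already_paid=0, registered=False),
]
result = allocate_unclaimed(pool=1000, registry_expenses=600, books=books)
print(result)
```

With these made-up figures, 600 units cover Registry expenses, the two registered books are topped up by 40 and 280 units to reach their 70 percent ceilings, the unregistered book gets nothing, and only 80 units are left for not-for-profits.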
For subscription sales, which might well be ultimately the most significant source of income, the revenue is apportioned straight to the rightsholders by the BRR. (I’ve appended the relevant settlement language at the end of this post, in its entirety).
The essential problem is that the settlement parties have a vested interest in maintaining a monopoly over access to orphan books. Marybeth Peters speculated that the settlement participants’ resolve to support future orphan works legislation might weaken, regardless of their past zeal for such clarification. As University of Chicago law professor Randal Picker noted at the meeting, there is a built-in incentive for licensing associations to protect guaranteed income sources from external claimants: the settling parties want to maintain the property status of orphans as copyrighted works against outsiders.
This is wrong on its face. That there is no structural incentive to identify public domain works within the corpus of orphans, and that the largest share of the revenue generated from their digitization goes to RHs who have, by definition, no right to that income, is an abrogation of the public’s right of access. Randal Picker suggested that a more symmetric most-favored-nation (MFN) status for commercial exploitation of the works covered by the settlement, such as unbundling orphan works by opening them up to exploitation by non-profits, might usefully mitigate this inherent danger.
There is a further problem. In addition to the income from the revenue models the settlement proposes, Google alone will be able to generate income from uses the settlement does not cover, such as integrating the content with web, dataset, and news data to build more robust discovery services. The advertising revenue from this aggregation will be Google’s alone to reap.
As Jule Sigall (formerly of the Copyright Office, now at Microsoft) and Jane Ginsburg (Columbia Law) wryly noted at the Columbia Law meeting, it is as if Google has maneuvered itself to the verge of a court-sanctioned release from potential liability for the exploitation of orphan books, for the benefit of a single commercial actor.
If this is the best train coming down the tracks, it might be time to throw a red light.
Settlement:
6.3 (a) Unclaimed Funds
(i) Unclaimed Funds-Non-Subscription Revenue Models. Any revenues paid to the Registry and due to Rightsholders of Books under Sections 4.2 (Consumer Purchases), 4.4 (Advertising Revenue Model), 4.8(a)(ii) (Printing), and, if agreed, 4.7(a) – (c) (Print on Demand, Custom Publishing and PDF Download, respectively), but that are unclaimed by such Rightsholders within five (5) years of the last date of the reporting period in which the Books earned such revenues (“Unclaimed Funds – Non-Subscription”), will be distributed by the Registry in accordance with the Plan of Allocation as soon as practicable following the end of such five (5)-year period as follows: (1) first, to defray reasonable and necessary operational expenses of the Registry that are related to its performance, on behalf of the Rightsholders, of the functions described in Section 6.1 (Functions) and, as determined by the Board of Directors of the Registry in the exercise of its fiduciary duties, maintain reserves for such expenses on a proportional revenue basis with respect to revenue from licensees of the Registry other than Google, (2) then, any remaining Unclaimed Funds will be paid on a proportional basis to the Registered Rightsholders until all such Rightsholders of a Book have received, in the aggregate, together with all amounts paid to such Rightsholders under Section 4.5(a) (Obligation to Pay Revenue Share), seventy percent (70%) of the Gross Revenues received by Google for such Book, and (3) then, for any Unclaimed Funds remaining thereafter, to not-for-profit entities described in Section 501(c)(3) of the Internal Revenue Code chosen by the Registry after consultation with Google and, acting through the Designated Representative, the Participating Libraries and the Cooperating Libraries.
The Registry shall choose not-for-profit entities described in Section 501(c)(3) of the Internal Revenue Code that directly or indirectly benefit the Rightsholders and the reading public, and will include entities that advance literacy, freedom of expression, and/or education, and, for avoidance of doubt, will not include the Authors Guild, the Association of American Publishers or other trade organizations. “Gross Revenues” means all of the revenues received by Google from the Revenue Models identified in this Section 6.3(a) (Unclaimed Funds), and only such Revenue Models.
(ii) Unclaimed Funds-Subscription Revenue Models. Any revenues paid to the Registry and due to Rightsholders of Books under Section 4.1 (Institutional Subscriptions) and, if agreed, Section 4.7(d) (Consumer Subscription Models), but that are unclaimed by such Rightsholders within five (5) years of the last date of the reporting period in which the Books earned such revenues (“Unclaimed Funds-Subscription”), will be distributed by the Registry as soon as practicable in accordance with the Plan of Allocation following the end of such five (5)-year period.
My eye was drawn this week to a masterful essay, in the form of a letter to the editor, by Matthew Battles, a rare books librarian at Harvard, responding to Sven Birkerts’s attack on e-readers such as Amazon’s Kindle. As Battles writes in his prelude: “... Sven Birkerts, in an article on TheAtlantic.com, suggests that [the Kindle] augurs the end of the culture of letters.”
In response, Battles writes persuasively that:
Yet the culture of letters has always been subject to disruption and transformation. Indeed, since the advent of print, technologies of the book have changed dramatically, and with them the book’s place in society. The world of letters not only transcends these technological changes—it thrives because of them. Were that not the case, the cultural continuity that Birkerts holds so dear would have been lost long ago.
After documenting the precious rarity of books in the Middle Ages, hand-crafted gems scarcer than courtiers or cardinals, Battles discusses the impact of the Gutenberg era:
Then, as movable type began to take hold in the age of Copernicus, Erasmus, and Luther, some worried that the printing press would devalue the book. But in fact, it represented a disruption only to the channels of authority that had hitherto controlled the creation and distribution of word and image. Likewise today, the Kindle and other information technologies are less likely to destroy the authority of books than to disrupt the authority of those who control the place of books in our society.
Battles notes that Birkerts holds up the experience of the poet Wallace Stevens as an exemplar of the holiness of letters, but, in contrast to Birkerts, Battles points to the wealth of context around Stevens’ poetry and life that the internet has brought forth. Birkerts cites Bartlett’s quotations, but Battles begs to differ:
Contrast its thin fare with YouTube, where you can listen to the poet himself read "Thirteen Ways of Looking at a Blackbird"—where you'll also find an animated photograph of Stevens performing "No Ideas But In Things" in the poet's own voice; John Ashbery discussing Stevens' impact on his work; or any number of unknown readers reciting Stevens' works in front of their computers. Wikipedia, meanwhile, tells me that a portrait of Stevens' wife Elsie served as the basis of the profile of Mercury's head on the Liberty dime. I can view Stevens' house on Wikipedia, and follow links from his entry there to information about the lives and works of his contemporaries, critics, and poetic legatees. There is a context here far richer than anything the glue-and-boards Bartlett's can offer. But here's the rub: we have to make sense of this cornucopia of information ourselves. Wikipedia is not a one-stop shopping source for tidbits of misinformation; it is a living discourse, inviting dialogue and participation.
It is this prong, dialogue and participation (or, as I would suggest, engagement and participation), that is the catalyst for our newest age of letters: the best transformation of the means of communication that we have yet wrested from the poverty of our long efforts to better ourselves.
Last week, I had the good fortune of attending a Hewlett Foundation meeting on Open Educational Resources (OER); amongst the grantees, a wealth of other foundations were represented, with Wikimedia, Soros, Gates, Lumina, and Moore circulating amongst us. We had fascinating and compelling conversations; at every turn, a new initiative. I was enthused by the passionate discussions that pulsed around me, and motivated by the unseen fruit that might be borne from the introductions that I and others will make to those not in attendance, conceived in the hope of exposing riches to a greater public via previously unimagined fulcrums. On my part, the thrill of introducing the Wikimedia Foundation to the Smithsonian Institution: priceless. Go, for God’s sake, leap at the chance!
Yet there was a disquiet in the hallways: a sense of a slow, growing staleness among the projects. Those who had attended two or three years running noted that the availability of "content" remained central to the roundtable discussions: how do we make more of it available to more people? How shall we describe it? What content might best be chosen? These questions seem vital, and they are, but there was a growing sense that they are not all that makes our futures worthwhile.
It is as Battles noted: engagement and participation, and these were missing from our dancing. There was a single session (led by Phoenix Wang) on gaming, mobile, and alternative paths through the navigation of knowledge, and it was an outlier. Roadtrip Nation was inspiring, and there were a few others as well. But few ... few.
It is as if the effort to assist education through the production of OER content has been waylaid by its own single-mindedness. It is in the context of the culture the net provides that we craft new advantage in our attempts to re-understand and re-write the world: not merely finding information, but finding its context and working with it. Hearing Stevens recite his poetry, or listening to someone discuss the poet’s place in American thought: that is the engagement. And working with the poetry, applying it to make something new, whether a creative work or a new interpretation of history or of the struggles of the people the poetry represents: that is the participation.
Thoreau wrote in Walden,
I left the woods for as good a reason as I went there. Perhaps it seemed to me that I had several more lives to live and could not spare any more time for that one. It is remarkable how easily and insensibly we fall into a particular route, and make a beaten track for ourselves. I had not lived there a week before my feet wore a path from my door to the pond side; and though it is five or six years since I trod it, it is still quite distinct. It is true I fear that others may have fallen into it and so helped to keep it open. The surface of the earth is soft and impressible by the feet of men; and so with the paths which the mind travels.
So it is with our own organizations and initiatives. We should be riding a different road now. Certainly, as part of our mission, more content to support curricula is a fine thing; let us not delay in exposing more of it. But the wealth of our society lies in working with it, and we must press ourselves to build new tools that facilitate that.
The new presses of the next age, the new libraries and the new museums, will be measured not solely by their resources but by their understanding of the worth of engagement and participation, and by their ability to weave their content, as the Guardian so lovingly put it, “into the fabric of the web.”