Congressional Copyright


A few weeks ago, I was working with Carl Malamud to encourage the University of Michigan to rehost Congressional Hearings material that will be digitized at the Boston Public Library by the Internet Archive, a role which Michigan happily embraced. At that time, I corresponded with good friends at Readex about whether rights claims had ever, to their knowledge, been placed against Hearings or Congressional Records material. Readex has a long history of developing very high quality full-text resources of government documents with powerful search support. I figured their staff would have more informed insight than most principals.

I was thrilled to get a private email from August Imholtz, senior staff at Readex (for identification purposes only), summarizing his understanding of this concern. I reproduce it verbatim below.


 

Disclaimer:

The following discussion is solely the responsibility and opinion of August A. Imholtz, Jr. and does not in any way represent the views, opinion, or official position of his employer, Readex, a division of NewsBank, inc. Mr. Imholtz, although he has had thirty-five years experience in working with U.S. Congressional publications, is not a copyright attorney. He has corresponded with a number of people in Washington, DC about this matter but the opinions expressed below are entirely his own. Should he receive an opinion from the Government Printing Office on these matters, he will share it with Mr. Brantley.

I. Congressional Record: Texts, be they copyrighted or not, may be inserted into the Congressional Record [and this is true of its predecessors, though the further back in the 19th century the fewer instances of such insertions occur] under the protection accorded to spoken remarks on the floor of the Senate or the House. It would be worthwhile to locate specific provisions and interpretation of this privilege in Deschler’s Precedents and Hinds and Cannon on the House side and in Riddick on the Senate side, but I have not done that. Now it is permissible to copy the Record as a whole but not to excerpt from the Record, say Senator XXX's insertion of Brantley’s copyrighted "Horatian Ode on the Sather Gate." For example, LexisNexis and Hein are both producing commercial versions of the Congressional Record, which they legally may do, but one cannot simply excerpt copyrighted verse, newspaper articles, etc. inserted into the Record and republish them out of the context of the Record itself. In other words, an item’s copyright does not lapse even if it receives the great honor of being inserted into the Congressional Record. [By the way, and it is really not all that relevant or interesting to perhaps anyone but myself, in odd moments at the Senate Library I have taken to pulling volumes of the Record off the shelf and looking for verse, most of which seems to have been, at least in the last quarter of the 19th century printed in the remarks themselves rather than in the inserted or additional material.]

II. Congressional Hearings: Copyright material submitted in congressional hearings, which usually will not be read verbatim, if at all, during the course of the hearing, likewise does not lose its copyright status; but the hearing can as a whole be reproduced.

Neither I, nor any of the several people with whom I have spoken, can remember any instance in which suit was brought by a copyright holder for replication of his or her material in a hardcopy or microform copy of a Congressional hearing. This is of course not to say that such an event never occurred. A number of firms and institutions have republished and continue to publish Congressional hearings in various media. I was the managing editor for the CIS Congressional Committee Hearings collection, 1833-1969, and do not recall any "reprint" permission ever having been requested. My recollection is that of course copyrighted exhibits were published in the printed versions of Congressional hearings but it occurs in far, far from a majority of the hearings. If I were to guess, I would say for most of the 20th century maybe in 5 percent of the hearings or perhaps a little more. True, in the past several decades that percentage may well be higher for a number of reasons. It would be interesting to do a search on the word copyright [not sure most search engines can search the symbol ©] in the full set of Congressional hearings once they are fully digitized.

Important works on copyright are: H.Rpt. 94-1976 Copyright Law Revision, which covers of course the "fair use" issue, and U.S. Code 5678-5679.

January 26, 2008  | Categories: BookRights

DLF Spring 2008: projected tracks


As we start to coast through January into February, DLF Central is beginning to glance with increasing focus on the upcoming Spring Forum. Spr 08 will be held in Minneapolis, MN, from Monday Apr 28 through Wed Apr 30, at the Hyatt Regency.

In the coming weeks, we'll provide a formal call for papers. However, it's not too early to begin thinking of topics and things that you would like to see appear, whether progress reports or information on new initiatives.

The Forum will remain an open submission. However, we will be attempting to create specific groups of papers in the following areas:

1. User experience. this should be construed more broadly than UI issues, including also user navigation, interaction, and site or app functionality for user manipulation. non web apps (e.g., flash, air, silverlight) would be eagerly considered. innovative user navigation tools across large data compilations, e.g., seadragon, are of particular interest. (n.b.: faceted browsing, unless combined with newer functionality, is of less interest, as are other now routine efforts in search result navigation). experiments with new online social information systems such as Twitter would also fall into this category.

2. Data management. this would include cyberinfrastructure or e-science applications which involve assistive data management or computation; real time data curation or metadata generation from sensed inputs (e.g. astronomical, ecological, biological, etc); and data modeling. experiments or speculations on the utility of semantic based applications, such as the SIMILE family at MIT, or use of commercial applications such as Freebase or Twine, are of very high interest.

3. Large scale architectures. this includes topics relating to the development of massive data and compute stores, particularly those involving "cloud computing" and external, virtualized services such as S3, sun grid services, microsoft virtual servers, and so forth. implementations of mapreduce clones, hadoop, and similar tools are of interest. experimentation with online network based and distributed applications (i.e., software-as-a-service models) would be well received. topic also includes preservation architectures, particularly those engineered to support near-unbounded scaling.

4. GIS. broadly defined. interest in mapping based overlays of historical tabulated or non-textual data; integration of textual and non-textual resources with geographic or mapping infrastructures; utilization of GIS for computationally based research; generation of virtual earth based user or avatar navigation against rich or deep resource collections (might also be included in user experience category); and so forth.

5. User generated metadata. very interested in applications exploring the mining and re-use of user generated content, particularly metadata (broadly defined), such as that invited by the Library of Congress in their Flickr Commons experiment; UGC supporting LibraryThing, OpenLibrary, and other wide-scale bibliographic systems; re-use (not just collection of) user-based citation application data; applications that take advantage of generated user data, such as usage data, "social graphs", collaborative filtering, etc., for recommending or other enhancement of information discovery and management.

And as I said earlier, DLF anticipates that there may be many unexpected or new systems or reports that our community will have interest in hearing about.

January 19, 2008  | Categories: DLF, DigLibs, Libraries

DLF Fall 2007 Forum Survey


After the DLF Fall 2007 Forum, we invited responses to a short survey.

Deleting individual comments for space, here are the results of the survey:

 DLF Survey Results

January 3, 2008  | Categories: Bookstores

Book search will not work like web search


Tim O'Reilly has blogged a couple of times about Google transitioning to a strategy of "trading for its own account" - in other words, moving to a position where they acquire or actively license content, which is then incorporated and presented alongside content harvested on the external web. This leads to a number of potential conflicts in the ranking and presentation of content. In short, there is no guarantee that Google will not prejudice its own content over others, and in fact, every reason that it might, despite assurances they may provide to the contrary.

Tim, in his latest entry, cites a recent blog post by Anil Dash, Google and Theory of Mind, which discusses Google's effort to pull attention from Wikipedia toward its new UGC-based product, Knol. Anil expresses his concerns:

... Knol shares with Google Book Search the problem of being both indexed by Google and hosted by Google. This presents inherent conflicts in the ranking of content, as well as disincentives for content creators to control the environment in which their content is published. This necessarily disadvantages competing search engines, but more importantly eliminates the ability for content creators to innovate in the area of content presentation or enhancement. Anything that is written in Knol cannot be presented any better than the best thing in Knol.


The presentation of other people's content (belonging to publishers, authors, and the public) within Google Book Search [GBS] becomes an experience owned by Google, and engineered in a manner opaque to others, out of others' control, with complex attendant issues in how Google intermingles GBS results with externally harvested material. Most importantly, from the perspective of publishers and authors, there is very little they can do to control the discovery and presentation experience. Even ascertaining whether the results are flat across providers is beyond the realm of assured knowledge for any one content owner; Random vs. Harper vs. public domain -- no one can have definite knowledge of what is ranked, and in what manner, thus incapacitating innovation.

This has all been very much on my mind, as the director of DLF. When I was in Washington D.C. last December for the CNI Task Force meeting, I spoke with a couple of IP lawyers who are in the business of supporting non-profits and libraries. Both had interesting things to say relating to Google and its purported negotiations with publishers. (Neither has any engagement in the AAP/AG litigation with Google.) One of the most troubling comments related to an aspect of the possible settlement terms that I hadn't really thought through before.

As the New Yorker's Google's Moon Shot article relates, the most likely settlement agreement would involve a voluntary collective license in which revenue from viewing or sharing texts is shared among publishers, authors, and Google. "Well, I love deals like that," one lawyer friend opined sarcastically. "Everyone gets something for nothing, except maybe the libraries. People get money for things they don't deserve." I didn't understand, and he explained, "For orphan works, where there are undetermined rights holders, there would undoubtedly be revenue sharing between Google and the publishers, and the authors, if they are all treated collectively. So these parties will still be making money that does not really belong to them, except by virtue of any agreement that they might come to terms on." In other words, whoever owns the rights -- because no one knows for sure in the case of orphans, whether it be publisher or author or public -- the money will almost inevitably flow to commercial parties that have no claim to it; in the case where the work is rightfully public domain, but has not been exposed as such, then publishers and authors would make money at the public's expense.

Obviously, libraries would have to be the source of many of these out of print books in such an arrangement; otherwise the publishers would have possession of them, and likely provide them through the Google publisher's partner program. I wondered: would the libraries be able to direct any income from the utilization of orphan works that such an agreement might secure? Could libraries be counted upon to speak not just for themselves, but rather for the overall community, to secure the general good? No answers are forthcoming.

These are mere hypothetical musings; nothing save wild thoughts. Except that recent events keep bringing me back, like the incessant tug of a moontide, to Tim O'Reilly's writings on Google "trading on their own account."

At the very end of 2007, I was happy to announce that the University of Michigan, the first Google Book Search library partner, was willing to host U.S. congressional documents being digitized by the Internet Archive, Public.Resource.Org, and the Boston Public Library; this might eventually be the largest single mass of publicly available U.S. congressional documents on the web, and it is funded in part by a $250,000 grant from the Omidyar Network.

It struck me as interesting that no large research library, public or private, was willing or able to be the initial protagonist; nonetheless, I am heartened that the University of Michigan will act with its characteristic boldness to provide a second home and access point for the content; fwiw, Michigan has also released congressional hearings digitized through their partnership with Google, whereas Google has not.

Research libraries have been agonizingly conservative over these issues, afraid of rights entanglements, particularly. Congressional hearings sometimes include submitted material of varying rights status, but once published, all government documents are public domain by definition. Nonetheless, the possible presence of non-government generated material has kept Google from fully releasing these materials for viewing, fearful of providing another target for rights-related litigation. Spurning this concern as misplaced and ill-informed of the law, Michigan must be commended for their stance.

In consideration of these issues, a colleague at another large research university said of our community [n.b.: minor edits for comprehension]:

There was an ARL Directors meeting in Phoenix where they deliberated on a SPARC-related proposal to digitize government documents, complete with a business model. All of the brightest directors and our best friends had plenty of reasons why we should not do such a thing, and it went down in flames. It was a no-brainer, obviously, but pre-Google Book Search. 

Several of those folks have now changed their tune. More importantly, I think Brewster Kahle has gone more vigorously after things like the government documents in part because of Google, and people take him more seriously because of Google's digitization. We might actually get someplace.

Perhaps my friend is right.

And pointedly, that (oblique) optimism is in the context of suspecting that the terms of any settlement between Google, publishers, and authors, should it be consummated, will be good for no one except those entities, on the basis of its own terms, particularly considering that such an agreement could conceivably advantage the public to a far greater extent than is likely to be the case.

Fundamentally, as Tim O'Reilly and Anil Dash have suggested, one has no choice but to suspect all that lies beneath GBS, which will, inevitably, serve its creators before and ahead of its peer book repositories elsewhere. Book search will not work like web search. And I worry -- as the precedent of government documents sadly argues -- that libraries may not have the strength or sense of purpose to stand up for any other than their own individual interests, neglecting the broader communities they serve.

In the beginning, at least. But in an off-handed way, like my friend, I sing praise to Google. For if our aims are in the near term defeated, if we fail to stand for ourselves, then through Google may finally rest an important redemption.

We must learn to trade for our own account - not the account of Google, Elsevier, the AAP, or the Authors' Guild. We must acquire, and build, a shared universe of information, freely available to all, on our terms. We must stand together for all we profess, against all danger -- stand for what no other organization in this world can: the fundamental right of access to information, and the compulsion to preserve it for future generations. This is not an economic imperative; nor will it ever be the goal of an internet advertising company. It is a mission that defines libraries.

We must not fail to shy from the fleetingly seductive "opportunity of a lifetime" motivations presented by Google and others who can only, ultimately, cultivate their fields in the service of an economy that prices their shares like so many toy balloons, with no bounds to their inflation, directed in a willy-nilly fashion, and exposed to an easy collapse after joyful hands of rough play have their say. Those hands are not ours. Those hands trade for their own account.

With our hands: let us trade for our own.

Peter Brantley, Executive Director, Digital Library Federation

2008:01:01

January 2, 2008  | Categories: MassBooks, DLF, DigLibs, Universities

This is the personal blog of Peter Brantley, and the opinions expressed here are his own and are not reflective of any of his employers in the continuum of history, or the University of California, which provides support for this blog.
