A few days ago (December 27), Public.Resource.Org and the Internet Archive, in conjunction with the Boston Public Library, announced a landmark agreement to scan a treasure trove of U.S. Congressional documents, with the aim of making them available to the public.
The Internet Archive, Public.Resource.Org, and the Boston Public Library today announced a plan to scan more than 60 million pages of these government documents over the next two years, with plans to make them freely available in perpetuity.
The first part of the program, which is currently funded, will scan fifty years' worth of Congressional hearings (1936-1986) held by the Boston Public Library. The scanning will be done using special scan stations developed by the Internet Archive, and should be complete within the year.
Assuming that all goes well and more funding can be found, the project will then move on to phase two: scanning 60 million pages from sources like the Congressional Record and the Federal Register.
The New York Times provides more details:
The project is the brainchild of founders of the two organizations, Carl Malamud and Brewster Kahle, and it is initially being financed by a $250,000 grant from a foundation established by Mr. Kahle and his wife, Mary Austin, and a matching grant from the Omidyar Network, a support organization created by Pierre Omidyar, the founder of eBay.
Mr. Malamud said his goal is to digitize the entire United States government documents collection, which has been estimated to include up to 100 million pages of publications ranging from the Congressional Record to the Federal Register.
This is a critical accomplishment because these data have hitherto been available only via very nice, full-featured, but expensive licensed products from companies like CIS, Elsevier, and others.
Many of these same Congressional documents (e.g., Congressional hearings) have been scanned by the Google Book Search project through its library partners, but in the majority of cases, Google has opted not to treat these materials as public domain. Because Congressional hearings may have embedded materials, Google has taken an extraordinarily rights-sensitive approach and kept these materials from being freely and fully accessible to the public.
As director of the Digital Library Federation, I have been in conversation with a few of the libraries that are lead participants in the Google Book Search library program. One of these libraries, the University of Michigan, earlier determined that the congressional hearings digitized by Google, where Michigan has obtained a scanned copy from the Google Book Search effort, should be made available to the public.
Michigan held repeated conversations with the Government Printing Office and reached the decision that they should make these data available. Arguably, anything submitted to a Congressional hearing as testimony is, by definition, a work of the government. Many court precedents support the use of legal and regulatory information in the face of copyright claims.
Michigan, a strong ally of Google in their scanning effort, has been a notable leader in the defense of public rights of access to the greatest amount of material available.
John P. Wilkin, the Associate University Librarian, Library Information Technology, at the University of Michigan, tells me in email:
There are now more than 1m volumes online locally, and because the first cut came from the storage facility where there are many hearings, there must be many online. I did a quick search [of Mirlyn] ("michigan digitization project hearing") and found a little over 3,000 [documents]. [There are sure] to be a few false drops and some misses in that, but glancing through, things look on target.
My email thread with John P. Wilkin and Paul Courant, the University Librarian, at the University of Michigan concerning the comprehensive effort described by the IA and BPL was immediately productive. With their sanction, I can announce that Michigan has pledged to re-host this collection of Congressional materials, providing a second home and a second point of access. In turn, upon notice of the Michigan decision, Public.Resource.Org immediately affirmed that they would be openly hosting a copy of this material as it becomes available and merging it in with as many government information sources as possible.
This is what libraries do best, and why they are special: they defend public interests; they make information widely available; and they ensure its permanence. These characteristics are notably distinct from the obligations and concerns characteristic of Google's Book Search project, which has made its priorities obvious through its decision to closet many of these materials.
Michigan, in fact, has noted that they have an open invitation to groups digitizing content: if the content is germane to the University's collections and if they are willing to give Michigan a perpetual non-exclusive right, Michigan will make a no-cost long-term commitment to hosting the material.
Kudos to all parties involved for helping make public information more widely available.