
Books for a Digital Nation




When I was in New York, with the support of the DLF, I was able to bring together various organizations to discuss the creation of a digital book deposit program in support of the U.S. Library of Congress.

The motivation for this effort is simple: currently, publishers must provide copies of published works to the Copyright Office (CO) for the use of the Library of Congress. The Library thus serves not only as Congress's library but also as the keeper of a record of the published literature in the United States.  (§ 407: "The required copies or phonorecords shall be deposited in the Copyright Office for the use or disposition of the Library of Congress.")

Although books are increasingly digital, the Library of Congress privileges print as the "best edition" for preservation purposes.  The best edition determination principles described in the Copyright Office’s Circular on Best Edition allow some flexibility, but it is difficult to apply these guidelines case by case when a digital edition exists.

Contemporary copyright law unfortunately produces myriad pathological consequences, and the Library of Congress is not immune.  The Library obtains its copies of print books without hindrance on their use.  (§ 704: "In the case of published works, all copies, phonorecords, and identifying material deposited are available to the Library of Congress for its collections, or for exchange or transfer to any other library.")

But this does not hold for electronic deposits of digital books.  Should the Copyright Office receive a digital book, the CO believes that transmitting that book to the Library would itself create a faithful copy, and therefore might be considered infringing.  For the Library to collect digital books, it must buy or lease access to them.  This is a prohibitive consequence for collections development at the nation's Library; for the original cataloguing of books; and for the ability of Congress to obtain the information it needs to conduct the business for which it is chartered under our Constitution.

The format schism between digital and analog (print) portends a significant blow to the Library, with ramifications at a national level.  In order to preserve our heritage, we must architect new systems that will allow us to archive a very wide range of digital content, and books are becoming central in the digital domain.   This is ultimately a responsibility for all of us to share -- we are all citizens of this country.   More instrumentally for publishers, a digital archive would provide an assured backup source in the event of cataclysmic loss of their primary intellectual property assets.  

With the help and facilitation of Ed McCoyd, the Director of Digital Policy for the Association of American Publishers (AAP), I was able to present this issue in person to a meeting of the AAP's Digital Issues Working Group. We had an excellent discussion with publishers, who understood the problem and generally supported finding a mutually acceptable solution.  Ed afterwards conveyed:

Thank you for speaking at the AAP Digital Issues Working Group meeting on Wednesday [June 11], regarding your interest in bringing parties together to develop digital book archives for preservation.   Your points about preserving cultural patrimony, and assuring permanent access by libraries and other digital content customers as well as by the publishers themselves, certainly resonated with the group.  It was also very helpful for them to hear about the successful project at Portico involving journals, and Portico’s successful tests with submitted EPUB packages.

I look forward to talking further about your initiative in the coming months, and will keep the publishers apprised of the additional details as they develop.  Thanks once again.


That afternoon, I was able to explore these issues in more depth at a small meeting held at the offices of Ithaka in Manhattan; participants included Deanna Marcum, the Associate Librarian of Congress; Don Waters, program officer at the Andrew W. Mellon Foundation; Mike Shatzkin, of the Idea Logical Company; Todd Carpenter, the director of NISO; Mike Smith, the director of the IDPF; Eileen Fenton, Evan Owens, and Toni Tracy of Portico[1]; and Karen Forster, of the BISG.  Although multiple senior publisher representatives were invited and indicated strong interest in the issue, none attended.  Nonetheless, we were able to begin a useful discussion of the contours of the problem, and how it might be resolved.

The tentative framework we developed was to initiate a trial project with a small set of publishers who would deposit sample digital books into the Portico repository; such a pilot would inform business, legal, and policy issues. Portico has already successfully demonstrated ingest of a wide variety of ebook formats, including EPUB.   

Eileen Fenton, the Director of Portico, suggested:

Portico [would serve] as the preservation repository for e-books from trade publishers, with national libraries able to point to the Portico archive as a means to fulfill legal deposit (or copyright registration) obligations.  This arrangement could alleviate some of the significant pressure that national libraries face on the digital preservation front, freeing resources to focus on preservation of digital materials outside of Portico's scope.  This arrangement could also benefit all by enabling (potentially significant) economies of scale. This vision, of course, does raise a number of rights and business model questions, but exploratory discussions ... are highly desirable at this stage.


This is a good consideration.  Further, we are at a fortunate moment when there is a rapid convergence around a small set of standards for ebook formatting[2]: primarily variants of Adobe's PDF, most often accompanied by ONIX metadata, and the IDPF's EPUB, which recently received the AAP's blessing as a desirable digital distribution format that publishers should coalesce around.  The EPUB format also has a great advantage for rapid ingest: it presents a manifest of the package's contents.  By contrast, the propensity of journal publishers to create myriad files for discrete components of individual articles, without an associated master roster, greatly impedes processing by Portico and other archives.
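To make the manifest point concrete, here is a minimal sketch of how an archive might use it at ingest time. It assumes the standard EPUB container layout (a zip file whose META-INF/container.xml points to an OPF package document listing every item in a manifest). The function name missing_manifest_items is hypothetical, and this is in no way a description of Portico's actual ingest pipeline.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# XML namespaces used by the EPUB container file and the OPF package document.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def missing_manifest_items(epub_path):
    """Return manifest entries that the package lists but the archive lacks.

    Hypothetical helper for illustration only; `epub_path` may be a file
    path or a binary file-like object.
    """
    with zipfile.ZipFile(epub_path) as zf:
        names = set(zf.namelist())

        # META-INF/container.xml points at the package (OPF) document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")

        # Manifest hrefs are relative to the directory holding the OPF file.
        package = ET.fromstring(zf.read(opf_path))
        base = posixpath.dirname(opf_path)

        missing = []
        for item in package.findall("opf:manifest/opf:item", OPF_NS):
            href = unquote(item.get("href"))
            target = posixpath.normpath(posixpath.join(base, href))
            if target not in names:
                missing.append(target)
        return missing
```

An empty list means every file the publisher declared is present in the package; a collection of loose journal files with no master roster offers nothing comparable to check against.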

From the publishers' perspective, having a single established, well-trusted repository supported by prominent journal publishers and libraries provides assurance that the deposited material will be tended carefully and with respect.  (Ultimately, it might be possible to craft a redundant, distributed international archive across national library systems, but this is fraught with complex sovereignty, intellectual property, and other legal and policy issues that we are not yet capable of resolving.)

In the meantime, we will be considering how to most appropriately establish and frame our next conversation, one antecedent to the drafting of an actual pilot to advance our efforts.  

 



  1. Portico is a not-for-profit that received its startup funding from the Library of Congress and the Mellon Foundation; it was formed as a partnership of many large and small journal publishers and libraries, preserving a rapidly growing volume of digital journal articles with high reliability in a rights-respecting, secure archive.  Portico serves as an escrow: access to the archive by its members is triggered only under carefully defined and contractually specified circumstances, e.g., termination of a publisher's business operations or a catastrophic loss of its repository infrastructure.
  2. Evan Owens, the CTO of Portico, classifies "found in the wild" ebook formats into four categories, defined by their character and provenance:
  • E-books : IDPF EPUB, or other "packaged" formats
  • H-books : books formatted in X/HTML
  • J-books : books from a journals publishing platform
  • D-books : books digitized from print editions

Finis.

 

Jun 16, 2008 | Categories: DLF, eBooks, Libraries, Preservation, BookRights | pbrantley

This is the personal blog of Peter Brantley, and the opinions expressed here are his own and are not reflective of any of his employers in the continuum of history, or the University of California, which provides support for this blog.
