Deindustrialization and Responsibility
Recently, I had the delightful opportunity to talk about digital preservation with Cathy Marshall of Microsoft Research. It was a conversation made odd by the fact that we met in Redmond, although we live and work on opposite sides of the Bay Area. Sometimes it takes an airline.
Cathy has been writing with great insight and poignancy on strategies used for personal archiving, with some early, proffered guidance on the ramifications for service designers and developers.
Her paper, and a separate presentation, made me think for the first time about the dispersal of copies, and the lazy archiving that people assume the open web provides them. It also suggested a philosophical divide between individual and institutional archiving that has ramifications not only for how services are developed, but also the obligations owed to the larger community, extending beyond immediate users.
One of the things not extensively discussed in the library and archival communities, yet which is increasingly implicit in the larger digital environment, is the possibility that even if an original, source object is not re-locatable or re-discoverable -- "Where the hell is that paper on Xanadu?" -- often an acceptable surrogate can be found. Further, it is increasingly likely that there is no real "original", but rather a stream of mildly differing replicas. These are surrogates of type, and form.
On Digital Surrogates of Type and Form.
Surrogates can be of type, or of form. A surrogate of type: imagine that I want a picture of young people with cell phones in Japan. I know that Joi Ito has a terrific picture called "Generation Gap" on Flickr (I use it all the time). If that picture were not available to me, I could find a good-enough surrogate. Similarly, if I wanted a picture of Izumo Shrine in Japan, I could search in a variety of places and find an acceptable image. Perhaps one is more striking than another, or more poignant, but in a universe of a growing number of digital objects, it is more likely than ever before that one thing will do about as well as another -- and that is a new phenomenon. If not a, then b.
A surrogate of form: a published article is not available, but a preprint is, or another version on the author's web site. They are not identical, but they are close enough. In any given body of work -- increasingly as digital expression grows and penetrates into more of our lives -- a growing chain of similar executions, slow evolutions of thought, develops, overtaking an older style of low-frequency, more punctuated, creative output.
It is possible that the historical sequence of a scholar's production of a single book every five years is now instead those books, plus blogging, plus presentations at conferences, plus additional editorial work -- all of which is far more inherently discoverable than ever before. (This is true, I suspect, for both humanists and scientists, in varying ways.) The existence of a larger body of work permits the use of surrogates of form. The boundary of an individual's or organization's body of work expands as digital artifacts become more prone to a leaving, than to a deletion. If not a-a', then a-b' may be fine.
I suspect further that our self-awareness of this trend actually suggests a positive-feedback loop. Knowing that more of my work is publicly accessible, I am less likely to tend it into a tidy English garden until it is "just so" -- rather I am more susceptible to attempts at seizing the passing fame of iterative publishing, engaging in conversation rather than a more periodic conceptualization of content creation. A river of blogs, papers, presentations. Where does one thought conclude? Where does one begin?
I think these conceptions of surrogates provide perspective on traditional expectations for digital-age preservation. For the society, a successful digital preservation strategy should be about ensuring the availability of a useful set of surrogates.
De-industrializing preservation.
How preservation "happens" is something worth the expenditure of additional, open-minded contemplation. We should consider what we expect from strategies pursuing preservation, and what we expect from those who are providing the valuable content-based services we use on a daily basis. Challenge the gods we have placed in our own temples.
There are many who feel that a high priority for libraries should be placed in preserving or archiving as much material, and distributing the resulting cache in as many locations, as possible. But I wonder if that is where our effort should be placed. Certainly some forms of material must be retained by libraries and archives -- unique artifacts, datasets with potentially enduring value, and many other created or sensed things -- but when one considers all the matter that is not being proactively preserved, how much of it suggests a forced march into preservation archives by libraries that can barely afford their own very modest digital efforts?
Arguably, supporting the most effective broad archive currently available -- the Internet Archive -- and fostering its on-going health and maintenance, should be the primary goal of library-based general internet preservation efforts. Not perfect, but it works, well enough. Special niches deserve more of libraries' focused efforts, but it is beyond their means to distribute copies of Flickr across existing or anticipated archival systems, and the imbalance between data creation and primary storage grows with every passing day.
For the general-purpose web (here I partition out significant portions of the academic research and government web), might it not be at least as worthwhile, and likely more sustainable, to simply and explicitly encourage good men and true to do the right thing?
As a society, at a public level, we could mandate that Microsoft/Yahoo/Flickr generate, publicize, and support the audit of their own archival systems -- systems that trigger enforced public access under conditions of violation or negligence. We are not forbidden from entertaining the premise that this is partially a corporate responsibility, rather than accepting it as an onus upon academia and the government. It is a payment for the privilege of corporate citizenship, which is itself not a natural right, instead of solely an obligation to be enforced on academic systems that have long since been outpaced in their capacities.
When I studied the de-industrialization of steel in the Northeast and Midwest, it was obvious that whole communities were devastated as a result of the ramifications of corporate departures on social and human services -- disruptions that were never incorporated into their balance sheets. Closing down plants is more attractive when you do not have to pay property taxes, provide unemployment, or pay Police and Fire, or maintain utilities serving abandoned neighborhoods, or support local stores and shops, or provide mental health counseling, or educate the children of your newly unemployed. For corporations, the communities that give their life-blood become mere externalities.
In our digital age, we make a ridiculous assumption that public responsibility cannot be layered onto private entities. But digital images and videos are the assets of the day, as much as the converters, blast furnaces, and rolling mills were of the 1950s and 1960s.
Preservation is not a corporate externality, and forcing preservation to be, by default, the responsibility of the individual or the public sector is an act of profit-maximizing at the expense of the larger society. We should propose legislation that requires openly-accessible content-holding firms of a certain size -- say, with more than half a million registered users -- to demarcate a reserve fund, and designate a suitable non-profit content beneficiary in case of insolvency or the cessation of business. It is not too much for us to ask. To require.
We can hear the stories of Buffalo, Lackawanna, Bethlehem, Youngstown. We can work with higher empathy, and strive for greater action, in our new age of digital steel.
I've been thinking a lot recently about the availability of books in online searchable repositories, and the likely outcomes for publishers, libraries, and the public. Most particularly, I have been considering the impact of a possible settlement between publishers, authors, and Google involving the books that are currently under litigation in the Google Book Search product.
A significant portion of the implicated works are likely to be out-of-print, of uncertain copyright status, and no longer present in any publisher's archive -- available only in the less-visited shelves of the largest research libraries. This substantial category, numbering in the millions of books, incorporates a large number of what are called "orphan works", where any copyright owner in the work, or its constituent parts, is unidentified, and where ownership resists easy resolution as a result of poorly recorded mergers and acquisitions, lost archival contracts, publisher insolvency, and myriad other reasons. In turn, some of this orphan material is almost certainly public domain, the original copyright never renewed and long since expired.
What might break the logjam of access to these works, and frustrate the otherwise inevitable near-monopoly of access that Google might obtain through a court-proposed settlement? A digitization agreement involving universities and a suitable hosting service that would make this lost material broadly available on reasonable terms, with clear benefits that facilitate research and education, would make a strong counterpoint.
The content could be made available through various monetization arrangements, including subscription-based individual access that supported features such as print on demand or digital lending, and licensed access with payment tiers for universities, high school libraries, and similar institutions, which might also be willing to pay a premium for local-hosting options. (If this material were provided through a charitable non-profit organization, hosting fees could be quite low.) Alternative arrangements, such as those pursued by the high-energy physics community's SCOAP3 journals project, might also be feasible, depending on the topology of interested parties.
A portion of fees could be escrowed in a common fund for allocation to rights holders should any come forth with the necessary proof of copyright retention. A basic access level to orphans and proven public domain books, sans any advanced features, could be extended to currently registered card holders of public libraries as a free public service (this would have the secondary benefit of driving use of a trusted OpenID through library participation at a community level).
Books that have newly apparent IP holders could be taken down through a simple, authenticated request mechanism, or alternatively retained in the delivery system with a different share of income returned to the identified author and/or corporate parties. The escrow fund would provide a modest, yet reasonable compensation for the works' past use, partially offset by the virtue of the hosting service's implicit discovery fee. Easily accessed lists of available works, e.g., through publication of OpenSearch RSS feeds, would assist possible copyright owners in finding bereft works; transparency would increase trust for all parties.
What might be most challenging would be identifying an appropriate body of content that would be both coherent and compelling; that would include a significant enough number of out-of-print orphans to be useful; and where large libraries might hold sufficient numbers of these books to be able to mobilize for their digitization. Perhaps a subject with an accumulation of desirable material might best meet these parameters: e.g., works of U.S. history, or autobiographies, or American literature. Alternatively, a discipline with a long history, such as anthropology or economics, might embolden a tribe of scholars and interested amateurs to make organization for online access compelling.
Cry out then for participation in such an effort! Nothing prevents us from crafting something that works for many parties, not just one, and yet clearly benefits the richest goals that any public must have for itself -- learning and inquiry into the most fundamental matters of our time, and ourselves.