Several ideas came together for me today in an undoubtedly useless thread, but I'll spin it out anyway. What's a blog for?
The various components are: Dave Winer's musing that there should be a generalized service for people to preserve their own content; thinking back to my work at the California Digital Library, where I hooked up CDL's Preservation Program with folks at Amazon's S3; a blog post on Amazon's Elastic Compute Cloud (EC2) and its impact on something or other that I've completely forgotten; a conversation today with someone from UIUC who was talking about a model for repository (digital object archive) interoperation that I found myself not agreeing with philosophically, based on admittedly inadequate information; an email list proffering that Google may offer file-based storage for Google account users; an excellent review of what made YouTube successful as a service; and a touch of rye whiskey.
So at the end of this windy introduction, on to my evolving set of basic thoughts about preservation and repositories.
- I do not believe that libraries lend sufficient additional value to interpose themselves in the process of digital curation, compared to the individual/scholar who has either created or gathered the content, and is in the best position to evaluate its saving. Although libraries may well initiate very high value preservation acts, they should act in this fashion as privileged individuals, and not monopolize the act of preservation.
- Preservation should be initiated and controlled by the content owner or rights-validated gatherer, and content objects should not be gated or judged for exclusion except on the grossest factors (such as legality, rights issues, and so forth). Tagging, however, should be open to anyone within the eligible community. (Why heck, any god-fearing provider is able to take advantage of safe-harbor under DMCA.)
- I think end-user facing services are far superior to institutionally-arbitrated ones. Further, consumer-facing services that live at the level of the network, not within the bounds of an institution, are likely to have more traction and more visibility. In other words, I think net-based apps can gain sufficient scale to make consumer-facing or community-facing applications superior to institutionally held ones across a range of critical factors such as performance and utility.
- I think that academia's needs regarding storage differ enough from the general population's, in terms of content description, sharing, rights, intent of purpose, and potential or latent value, to warrant its own community-based (higher-education) solution, which may not necessarily be distinct in form from a broader-based application.
- I think it is quite sub-optimal (a nicer way of saying that something is stupid) for universities to continue creating their own individual, branded repositories when they don't talk to each other very well, and scaling is generally limited to the capabilities of the institution. Among other things, a continuing bias towards institutional solutions creates gross inequities in support capacity across the diversity of universities, in their size, focus, and aspirations.
- I think P2P-based preservation strategies tend to carry significant technical and administrative overhead to maintain peer-level coordination and cache consistency, compared to centrally-coordinated distributed solutions, and are probably at best an interim solution.
- The availability and costs of network infrastructure, and our understanding of how to enact services across it, have advanced to the point that applications can scale easily without requiring the federation of institutional deployments.
So what I would propose is that a consortium of universities, perhaps led by their libraries, initiate adequate development to enable the deployment of a clustered instance of MIT's DSpace open-source repository system instantiated on Amazon's EC2 infrastructure. This network-based, community-oriented preservation repository would be directly available to all HE end-users with minimal gating and minimal content review. Obviously an adequate higher educational governance model would have to be created, along with a means for adequate remuneration of the effort, which might range from institutional membership fees to tiered service offerings that could be rescinded in case of payment lapse, discouraging free-riders.
Sure, this is probably silly, and it might not work. But it is no more silly than individual instances of DSpace or any other repository software; it solves the problem of repository inter-operation; it takes advantage of network-based positive feedback; it binds universities together in common purpose in a big way; it benefits harvestability and therefore content discovery through its aggregation; and it increases content visibility.
Google could do this themselves by gussying up Google Base and then bundling it as part of Google Apps for Higher Ed.
Maybe we should try it first?