Comments on comments, as that part of the infrastructure is not well behaved, so here I am.
Dorothea,
There are some huge administrative issues. Many of them exist as well at the institutional level, but solutions would be necessarily different in construction at the consortial HE level that I proposed. I don't have a full suite of answers here, but clearly there have to be operational as well as governance and coordinating roles assigned to responsible parties.
Re: Amazon. I think the larger question is again, would I trust any one provider without contractual guarantees? A supercomputer center? Not likely. A large university? Not necessarily. So for any of these solutions, or their combination, one would have to seek SLAs and strategies for ensuring content viability, etc.
Re: DSpace v. else. It could be Fedora. It could be CDL's Common Framework repository. But it has to be something, and not many things, or you lose a lot of the goodness of scale. HE tries too hard to accommodate distinctive advantages, at the loss of overall benefit. I chose DSpace in my example, but whatever it is, we need to ride one horse and not a stable. And if we did something like this, let's not spend a lot of time doing studies.
I agree with you on ORE. And I think realistically there will always be any number of institutional or even department-scaled repositories. That's probably an advantageous thing, and ORE and whatever else DLF's Aquifer can test and verify should also be supported. I don't see it as black and white, one ring to rule them all, but otoh, I think we have to have that one ring as a strong and blessed option.
Jerry,
I used my community/individual terms somewhat sloppily, but I was thinking that the community served was some bounded subset of HE, like (I dunno) R1 institutions, and the service would be open to eligible users within them. Of course we get into this huge thing about what is an eligible user, but let's say that this is institutionally determined for now.
I think ideally this system would benefit from being flexible enough that sub-communities within this larger community - which you are right, would most likely be a consortium in organizational form - would create their own local admin superstructures. Maybe the plant geneticists want to do something special, for example.
There are clearly issues with types of data - again, not necessarily a unique problem to this model. In fact, this was pointed out to me once when I presented an earlier version of this at SciFoo at Google. The biggest problem, which would absolutely require gating, is the submission of out-of-scale files. Most repositories are not going to handle petabyte files, nor would our networks, yet; additionally, a trillion 4k files are probably not happening with most storage systems, either. Not to be flip, but these are technical issues that have to be administratively handled.
The incentive problem is a tricky one, and in part I hope that scale, and clever design, would offer some positive feedback on use. The ability to have a central store, with tagging, alerts, persistent linking, and other good things would, I hope, create an economy of services that would drive additional business. This is one area where I think aggregation trumps many poorly coupled repositories.
Raymond,
DLF would be happy to participate, but we need someone interested enough to handle the engineering, etc.