
Building an OpenStack

Over the last several years, there has been growing discomfort among those of us pursuing greater interoperability of library-domain systems, including foundational elements such as content exchange and interlinking between repositories, storage redundancy, preservation architectures, and distributed content systems. Much of this work has been oriented toward relatively complex new standards for content description and exchange, in the hope that sufficiently well-described bundles of “assets” will carry enough information along with them, like hand-carried metadata luggage, to enable the relevant actions to be performed on the receiving side of the transaction.

At its peak, this tendency has spawned conceptions of deeply intertwined systems whose interoperation is preemptively well defined through service-oriented architectures (SOA), rather than obtained on an ad hoc basis through existing lightweight open protocols, which inevitably leave much to chance.

SOA approaches are exceedingly expensive on many levels: technical, because they require close attention to the definition and design of program functions; managerial, because imposing uniformity at the expense of local optimization inevitably fails to please diverse customer bases; and temporal, because SOAs are often baroque and require lengthy periods to draft and reach consensus, and consensus is necessary by definition when one is attempting to impose unanimity across whole continents of scholarly domain, such as the Humanities. SOAs might succeed within the walled gardens of corporate infrastructures with tight command and control, but across heterogeneous and distributed environments such as the open net, they are almost certain to meet Death on someone else’s terms.

About six months ago, in a hallway conversation at the last DLF Forum with principals from the DSpace Foundation (Michelle Kimpton) and Fedora Commons (Sandy Payette, at Cornell Univ.), we strove to articulate a new expectation for interoperability.  Sandy and Michelle made the critical observation that it should be possible, normatively, for the majority of the applications in the information access community (across libraries, museums, research institutes, and information systems departments) to interoperate easily, without explicit pre-definition.  This would permit a user, whether human or machine, to reach from the desktop to the highest cloud storage systems with ease, without a reliance on complex APIs crafted to support bilateral agreements.  Zotero should work with Fedora; Fedora with DuraSpace or with Amazon S3; Hathi should accept content ingest requests from the Participatory Culture Foundation; ARTStor users should be able to push into their Flickr accounts.  And so forth.

The most appealing thing about this vision is that there is no barrier to its construction except our own willingness to make it happen. As I played with this insight, what began to make the most sense was a resolution modeled on the Cape Town Open Education Declaration, in which individuals and organizations commit to spurring the growth of open education initiatives and the availability of open content to benefit education across the globe. The Declaration defines itself as "a statement of strategy and a statement of commitment. It is meant to spark dialogue, to inspire action and to help the open education movement grow." And that is exactly what I think we need.

We have, across our broad communities seeking enhanced access to information, a similar opportunity and challenge, which I later christened OpenStack in a stark absence of original eloquence. (N.B.: There are several other extant "OpenStacks", all generally in the same spirit; e.g., the DataPortability Project has made use of the term, and there is an active blog bearing the name as well.) I have not had the necessary epiphany to draft a concrete OpenStack declaration; I appeal to parties with more wit and a better grace of pen to lend themselves to this effort.

The OpenStack concept is intended to be bracingly modest, achievable, and empowering. In essence, it is the acceptance by application developers of the responsibility to facilitate interoperation through simple, lightweight protocols and API specifications that enable compatibility with the information services and components found across their greater community.

One does not necessarily need to know what these systems are; in fact, we must assume that we will not have that knowledge. Common sense and vision play a significant guiding role in OpenStack, combating the dogma of over-specification. If an application works with discrete content items, then it is reasonable to assume that an API providing access to those items under the most liberal license terms possible would enable other content-use systems to build unimagined applications; I take as an example the Brooklyn Museum’s recently defined API and its endorsement of Creative Commons licenses. If one is instead drafting a new repository architecture, it is reasonable to assume that permitting content to be pushed into an ingest stream through AtomPub is fundamental. And so forth.
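A minimal sketch of what pushing an item into an AtomPub (RFC 5023) ingest stream can look like, using only the Python standard library. The collection URI `https://repository.example.org/atompub/ingest`, the sample record, and the helper names are hypothetical; a real repository advertises its collection URIs in an AtomPub service document and will usually require authentication, which is omitted here.

```python
import datetime
import urllib.request
import uuid
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def build_atom_entry(title: str, author: str, summary: str) -> bytes:
    """Serialize a minimal Atom entry (RFC 4287) suitable for an AtomPub POST."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = f"urn:uuid:{uuid.uuid4()}"
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    now = datetime.datetime.now(datetime.timezone.utc).replace(microsecond=0)
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = now.isoformat()
    author_el = ET.SubElement(entry, f"{{{ATOM_NS}}}author")
    ET.SubElement(author_el, f"{{{ATOM_NS}}}name").text = author
    ET.SubElement(entry, f"{{{ATOM_NS}}}summary").text = summary
    return ET.tostring(entry, encoding="utf-8", xml_declaration=True)


def build_ingest_request(collection_uri: str, entry: bytes) -> urllib.request.Request:
    """Prepare (but do not send) the POST that adds the entry to an AtomPub collection."""
    return urllib.request.Request(
        collection_uri,
        data=entry,
        method="POST",
        headers={"Content-Type": "application/atom+xml;type=entry"},
    )


# Hypothetical record and collection URI, for illustration only.
entry = build_atom_entry("Letters, 1890-1895", "Example Archive", "Digitized correspondence.")
req = build_ingest_request("https://repository.example.org/atompub/ingest", entry)
```

Sending it is then a single `urllib.request.urlopen(req)`; an AtomPub server that accepts the entry replies `201 Created` with a `Location` header naming the new member resource. Nothing here is specific to one repository, which is the point: any client that can emit an Atom entry can feed any server that accepts one.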

This is OpenStack. Its patrimony is enhanced access to information: the generation of new ways of exploring and participating in our world by encouraging the combination of the best of OpenCore and open source solutions. OpenStack is an antidote to the excesses of SOA designs; it embraces what the web does best, and it is parsimonious and economical. OpenStack rewards flexibility, speed, and good-enough solutions that are comprehensible and permit rapid evolution.

However we outline this declaration, let us pledge to use open, public standards; to publish open and straightforward APIs with liberal license terms; and to relax our requirements for the preservation of provenance and exactitude. By pledging to loosely assemble our many rough applications, we enable everyone, from application craftsworkers to industrial software titans, to fashion an infinite number of coats.

Mar 08, 2009 | Categories: DigLibs, Libraries, Universities, Publishers | pbrantley

4 comments

Comment from: Ryan Shaw [Visitor] · http://aeshin.org/
I think you can go even more radically lightweight than this. No APIs: just put your data in publicly accessible webspace. Push for some loose consensus around naming (à la your /public proposal) and formats: SQL dumps as the lowest common denominator, then Atom XML, then RDF Linked Data. Bulk download and local caching: look at the success Wikipedia, Geonames, etc. have had with this model. No new APIs to learn, no arguing over how to model REST resources, just download and go... you can always add on APIs later.
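The bulk-download-and-cache model needs nothing beyond plain HTTP. A minimal sketch, assuming the publisher's web server sends `ETag` and/or `Last-Modified` validators (as common static file servers typically do): the client stores them after each download and sends them back as `If-None-Match` / `If-Modified-Since`, so an unchanged dump costs only a `304 Not Modified`. The function names and the cache layout (one file per URL, named after its last path segment) are hypothetical.

```python
import json
import pathlib
import shutil
import urllib.error
import urllib.request


def conditional_headers(meta: dict) -> dict:
    """Turn cached ETag / Last-Modified values into conditional-GET request headers."""
    headers = {}
    if meta.get("etag"):
        headers["If-None-Match"] = meta["etag"]
    if meta.get("last_modified"):
        headers["If-Modified-Since"] = meta["last_modified"]
    return headers


def fetch_dump(url: str, cache_dir: pathlib.Path) -> pathlib.Path:
    """Bulk-download a public dump once, then re-fetch only when the server says it changed."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    name = url.rstrip("/").rsplit("/", 1)[-1] or "dump"
    data_path = cache_dir / name
    part_path = cache_dir / (name + ".part")
    meta_path = cache_dir / (name + ".meta.json")
    meta = {}
    if data_path.exists() and meta_path.exists():
        meta = json.loads(meta_path.read_text())
    req = urllib.request.Request(url, headers=conditional_headers(meta))
    try:
        with urllib.request.urlopen(req) as resp:
            with open(part_path, "wb") as out:
                shutil.copyfileobj(resp, out)  # stream to disk; dumps can be large
            new_meta = {
                "etag": resp.headers.get("ETag"),
                "last_modified": resp.headers.get("Last-Modified"),
            }
        part_path.replace(data_path)  # swap in the new copy only after a complete download
        meta_path.write_text(json.dumps(new_meta))
    except urllib.error.HTTPError as err:
        if err.code != 304:  # 304 Not Modified: the cached copy is still current
            raise
    return data_path
```

`urlopen` raises `HTTPError` for a 304, which is why the "nothing changed" case is handled in the `except` branch rather than by inspecting a response.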
03/08/09 @ 20:16
Comment from: Peter Keane [Visitor] · http://blogs.law.harvard.edu/pkeane
Peter-

Great to see this posting -- I think this is exactly the direction we should be moving in.

I would make two points, though. One, I think that the library world must come to grips with the fact that its problem space and use cases are not unique in the world of distributed information systems. It follows that the best resources/approaches for distributed systems will likewise come from outside the library community.

My second point follows on the first, and that is that for systems deployed on the web using HTTP (I assume that's what we are talking about), the architectural principles articulated in RESTful design are essential. I fear that REST is considered "one approach among many options," when in fact it is simply an articulation of the forces at play when building distributed systems on the web. RESTfully designed systems will be more scalable, flexible and (most importantly here) interoperable than systems not adhering to the principles.

REST is meant to be "simple" but by no means "easy." It's a hard-won simplicity (especially since so many systems now in use are inherently not RESTful) that requires very thorough and careful planning and design. Folks in the REST world are becoming wary of the term "API" since it suggests (and often signals) a design based on unique formats, documented URL construction, etc. Good interoperable REST-based services will be characterized by a reliance on standard, well-documented media types (MIME types) and on well-understood link relations. Services that go beyond that risk being less RESTful (and thus less scalable, flexible, and interoperable).
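To illustrate the hypertext-driven style, here is a minimal sketch of a client that never constructs URLs itself: it pages through an Atom feed purely by following `rel="next"` links (the paging relation from RFC 5005), treating an `atom:link` with no `rel` as `alternate`, as RFC 4287 requires. The `fetch` callable stands in for an HTTP GET so the sketch can run against in-memory documents; the feed URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"


def find_link(doc: str, rel: str, base: str) -> Optional[str]:
    """Return the absolute href of the first top-level atom:link with the given rel."""
    root = ET.fromstring(doc)
    for link in root.findall(f"{ATOM}link"):
        # RFC 4287: a link without a rel attribute means rel="alternate".
        if link.get("rel", "alternate") == rel and link.get("href"):
            return urljoin(base, link.get("href"))
    return None


def walk_pages(fetch: Callable[[str], str], start: str) -> Iterator[str]:
    """Yield each page of a paged feed by following rel="next" links only."""
    url: Optional[str] = start
    seen = set()
    while url and url not in seen:  # the seen set guards against link cycles
        seen.add(url)
        doc = fetch(url)
        yield doc
        url = find_link(doc, "next", base=url)
```

The only things the client knows in advance are the Atom media type and the meaning of `next`; if the server reorganizes its URL space, the client keeps working.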

Two v. useful blog postings:
http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven
http://www.infoq.com/articles/subbu-allamaraju-rest



03/08/09 @ 21:15
Comment from: Ed Summers [Visitor] · http://inkdroid.org/
While I agree in principle with this notion of an OpenStack, I think that the Brooklyn Museum API is really just an example of yet-another web2.0 API.

Don't get me wrong: web2.0 APIs are great, especially for making data assets available to interested parties. But the reality is that there are similar APIs, all subtly different, all over the web. How is a tool like Zotero, or Fedora, DSpace, etc. going to interoperate with them all? The answer is they can't unless they deal with each on a case-by-case basis.

Is OpenStack an attempt to name a particular pattern of API use on the web? For example, why didn't Brooklyn Museum implement OpenSearch? It would've been a whole lot easier than devising their own API. If they had, users could've added a search box to their Firefox or IE browser without having to know anything about the API.
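A sketch of the client side of OpenSearch 1.1: the service publishes a description document containing a URL template, and any generic client fills in `{searchTerms}` and any optional `{name?}` parameters without knowing anything else about the service. The museum URL and description document below are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, Optional
from urllib.parse import quote

OSD_NS = "{http://a9.com/-/spec/opensearch/1.1/}"

# A hypothetical OpenSearch description document, as a museum might publish it.
DESCRIPTION = """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Collection search</ShortName>
  <Description>Search the museum collection (hypothetical)</Description>
  <Url type="application/atom+xml"
       template="https://museum.example.org/search?q={searchTerms}&amp;page={startPage?}"/>
</OpenSearchDescription>"""


def expand_template(template: str, params: Dict[str, object]) -> str:
    """Fill an OpenSearch URL template; unsupplied optional {name?} parameters become empty."""
    def substitute(match):
        name, optional = match.group(1), match.group(2) == "?"
        if name in params:
            return quote(str(params[name]), safe="")  # values must be URL-encoded
        if optional:
            return ""
        raise KeyError(f"missing required OpenSearch parameter: {name}")

    return re.sub(r"\{([^}?]+)(\??)\}", substitute, template)


def search_url(description_xml: str, terms: str, page: Optional[int] = None) -> str:
    """Read the Atom result template from a description document and build a query URL."""
    root = ET.fromstring(description_xml)
    for url in root.findall(f"{OSD_NS}Url"):
        if url.get("type") == "application/atom+xml":
            params: Dict[str, object] = {"searchTerms": terms}
            if page is not None:
                params["startPage"] = page
            return expand_template(url.get("template"), params)
    raise ValueError("no Atom result template in description document")
```

Because the template travels with the service, the client code above would work unchanged against any other collection that publishes a description document.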

Personally I would have preferred to see Brooklyn Museum do some simple things, like using the rel="tag" microformat on their item tags, exposing the JSON/XML records for items directly in the item displays using link elements, using OAuth for authorization, and implementing OpenSearch.
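Markup like this makes an item page self-describing to software. A minimal sketch of the consuming side, using the standard library's `html.parser`: it collects `<link rel="alternate">` records by media type, an OpenSearch autodiscovery link, and `rel="tag"` anchors (where, per the rel-tag microformat, the tag is the last path segment of the `href`, not the link text). The page markup and URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urljoin, urlparse

OPENSEARCH_TYPE = "application/opensearchdescription+xml"


class ItemPageLinks(HTMLParser):
    """Collect machine-readable links from an item's HTML page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.alternates = {}    # media type -> absolute URL of that representation
        self.opensearch = None  # absolute URL of an OpenSearch description, if linked
        self.tags = []          # tag names from rel="tag" anchors

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()  # rel may hold several values
        href = a.get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)
        if tag == "link" and "alternate" in rels and a.get("type"):
            self.alternates[a["type"]] = url
        elif tag == "link" and "search" in rels and a.get("type") == OPENSEARCH_TYPE:
            self.opensearch = url
        elif tag == "a" and "tag" in rels:
            # rel-tag: the tag is the last path segment of the href, not the link text.
            segment = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
            self.tags.append(unquote(segment))


# Hypothetical item page, for illustration only.
PAGE = """<html><head>
<link rel="alternate" type="application/json" href="/objects/123.json">
<link rel="alternate" type="application/xml" href="/objects/123.xml">
<link rel="search" type="application/opensearchdescription+xml" href="/opensearch.xml">
</head><body>
<a rel="tag" href="/tags/tapestry">tapestry</a>
<a rel="tag" href="/tags/french%20textiles/">French textiles</a>
</body></html>"""

links = ItemPageLinks("https://museum.example.org/objects/123")
links.feed(PAGE)
```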

On the other hand, their use of Creative Commons is a good example of something that is important, I think: following an existing pattern of usage on the World Wide Web.

I think it is really important that the Library/Archives/Museum community identify these patterns of use on the larger web, instead of recreating them and ending up in a niche/ghetto. I guess that's what you are saying we need? I don't think that gesturing at something and calling it OpenStack is enough though.
03/09/09 @ 06:46
Comment from: Peter Brantley [Visitor]
Thanks, everyone, for these comments. They are all somewhat similar, and I take heed of their message. I was using "API" rather loosely -- too loosely -- and should have been more precise. I did not intend API to imply a custom or baroque interface; an API can be implemented simply, in an easily generalizable and well-described fashion. Of course, the prevailing theme of simply pushing, or exposing, data publicly in a well-understood format is the best means of inducing the kind of interoperability that I am seeking. For example, the Guardian UK's new Open Platform API enables text search and data presentation through XML, JSON, and, specifically, the Atom format.
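Offering the same data in several well-understood formats is often done with HTTP content negotiation. A simplified sketch, assuming a hypothetical endpoint that can emit JSON, Atom, and generic XML: it honors `q` values and explicit `q=0` exclusions but, unlike the full HTTP rules, does not rank more specific media ranges above wildcards. A plain format query parameter is an equally lightweight alternative.

```python
from typing import List, Optional, Sequence, Tuple

# Representations this (hypothetical) endpoint can produce, in server preference order.
AVAILABLE = ("application/json", "application/atom+xml", "application/xml")


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    # sorted() is stable, so ranges with equal q keep the client's order.
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def matches(media_range: str, media_type: str) -> bool:
    """True if a media range such as */*, application/*, or a full type covers media_type."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.split("/")[0] == media_range[:-2]
    return media_range == media_type


def negotiate(accept: str, available: Sequence[str] = AVAILABLE) -> Optional[str]:
    """Pick a representation; None means the server should answer 406 Not Acceptable."""
    ranges = parse_accept(accept or "*/*")
    excluded = {m for m, q in ranges if q <= 0}  # types the client explicitly refuses
    for media_range, q in ranges:
        if q <= 0:
            continue
        for media_type in available:
            if media_type not in excluded and matches(media_range, media_type):
                return media_type
    return None
```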

This kind of data-first principle works well when institutions feel comfortable exposing either direct content or descriptive metadata sufficient to enable new service definitions. Of course, not all content is fully open, and we will sometimes need to embed logic relating to permissible actions within metadata, or provide an explicit guiding API with a published specification.
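One lightweight way to embed rights information in metadata is the Atom License Extension (RFC 4946), which attaches a `<link rel="license" href="LICENSE-URI"/>` element to an entry. A sketch of a consumer that checks the declared licenses against a local policy before republishing; the allowlist of Creative Commons URIs is a hypothetical policy choice, not something the RFC prescribes.

```python
import xml.etree.ElementTree as ET
from typing import List

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical local policy: licenses under which this service will republish content.
REUSE_OK = {
    "http://creativecommons.org/licenses/by/3.0/",
    "http://creativecommons.org/licenses/by-sa/3.0/",
    "http://creativecommons.org/publicdomain/zero/1.0/",
}


def licenses(entry_xml: str) -> List[str]:
    """Return the hrefs of all rel="license" links on an Atom entry."""
    root = ET.fromstring(entry_xml)
    return [
        link.get("href")
        for link in root.findall(f"{ATOM}link")
        if link.get("rel") == "license" and link.get("href")
    ]


def may_republish(entry_xml: str) -> bool:
    """Allow republishing only if at least one declared license is on the local allowlist."""
    return any(href in REUSE_OK for href in licenses(entry_xml))
```

An entry with no license link is treated as not reusable, which keeps the default conservative while letting openly licensed content flow without any bespoke API.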

I agree that gesturing at things is not enough, but sometimes it is a necessary start. (As in, "Yo, over *There*!")

03/09/09 @ 18:22

This is the personal blog of Peter Brantley, and the opinions expressed here are his own and are not reflective of any of his employers in the continuum of history, or the University of California, which provides support for this blog.
