What is OAI and why should you care?

The Open Archives Initiative, commonly abbreviated as OAI, is a “body that promotes standards in archiving which allow systems to operate successfully together and exchange information.” (Dictionary of Information and Library Management) Founded by Michael Nelson, Carl Lagoze, and Herbert Van de Sompel, the OAI first launched in October of 1999. The Initiative’s funding comes from the Andrew W. Mellon Foundation, the Coalition for Networked Information, Digital Library Federation, Microsoft Corporation, and the National Science Foundation. (Reitz, Open Archives Initiative) Since its inception, the Initiative has been linked conceptually to the idea of open access in scholarly publishing and the creation of institutional repositories. For this reason it has garnered considerable support from information and library science professionals. However, increasingly effective, mainstream alternatives to the OAI’s framework have led many published authors not to participate in the Initiative. Vendors may also be dissuaded from participating in OAI repositories due to digital copyright concerns.

Although the above definition for OAI is good for a general understanding of the Initiative, it does not provide a comprehensive understanding of the topic. At its core, the OAI promotes interoperability between different systems by supplying a rigorous set of standards that facilitate the sharing of digital information. While the name OAI suggests the idea of a permanent repository for digital or digitized media, the word “archive” is in fact used as a synonym for “e-prints”. (Reitz) The two definitions of archive should not be confused, because while the former is focused on artifacts and artifactual value, the latter’s goal is to increase scholarly communication by creating easier digital access to research. Although the two are not mutually exclusive, this distinction is vital to properly understanding the OAI.

The OAI’s main vehicle for increasing scholarly communication is OAI-PMH, or the Open Archives Initiative Protocol for Metadata Harvesting. (Metadata is, at its most basic, information about information.) Metadata harvesters locate and aggregate metadata from different data sets. While this at first may seem confusing, there are many easy-to-understand examples. If I wanted to integrate MBooks, Google Books, and CMU’s Million Book Project into my own catalogue, all I would have to do is “point” my metadata harvester at their repositories to gather the information. With the OAI, Dublin Core is usually used as the scheme for metadata. Afterward, when searching my OPAC, users would be able to locate the metadata – in record form – within my catalogue and be linked to the content at the original repository’s site. This is impressive not only because it can widely diversify an institution’s collection, but also because none of the files (besides metadata) have to be stored locally. The largest OAI catalogue is OAIster, which provides access to approximately 15 ½ million records. (University of Michigan) The search engine indexes material from over 900 contributors. (University of Michigan)
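To make the idea of "pointing" a harvester at a repository more concrete, here is a minimal Python sketch of the two steps involved: building an OAI-PMH ListRecords request for Dublin Core (the oai_dc metadata prefix) and pulling a few fields out of the response. The base URL, the sample record, and the function names are all hypothetical and exist only for illustration. The sample response is abbreviated and parsed locally rather than fetched, and a real harvester would also need to follow resumption tokens for large result sets and handle protocol errors.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL a harvester would request to list a repository's records."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def first_text(element, namespace, tag):
    """Return the text of the first descendant with the given tag, or None."""
    for child in element.iter(f"{{{namespace}}}{tag}"):
        return child.text
    return None


def parse_records(xml_text):
    """Extract a few Dublin Core fields from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        records.append({
            "oai_id": first_text(record, OAI_NS, "identifier"),
            "title": first_text(record, DC_NS, "title"),
            "creator": first_text(record, DC_NS, "creator"),
            # dc:identifier often holds the link back to the original repository.
            "link": first_text(record, DC_NS, "identifier"),
        })
    return records


# Abbreviated, made-up response used so the sketch runs without a network call.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1234</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example E-print</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:identifier>https://repository.example.org/eprints/1234</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = build_list_records_url("https://repository.example.org/oai")
records = parse_records(SAMPLE_RESPONSE)
print(url)
print(records[0]["title"], "->", records[0]["link"])
```

Only the metadata record is stored locally; the link field is what sends a catalogue user back to the full text at the original repository.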

I am skeptical, however, of OAI-PMH. First of all, acquiring metadata is not as simple as pointing your harvester at an information source. Permission must be asked and granted, and there is no guarantee that more contemporary and valuable information will come without a fee. The majority of the works in the digital repositories listed above are either in the public domain or have been digitized as “orphan works”. They may lack currency, not reflect advances in science and society, and not represent a holistic collection of knowledge. Scholarly material, especially in the hard sciences, and articles being prepared for publication may also be less likely to be freely available.

A second concern hinges on digital preservation and continual access. If institutions are simply collecting metadata about e-prints and other information resources, this means that there may only be one institution with an actual electronic copy. If that copy and its backups were lost, access to the full text – one of the main concerns of the OAI – would be undermined. In Everything is Miscellaneous, author David Weinberger advocates for an item being in more than one place at the same time through the use of metadata; however, it may make more sense for an item to be in more than one place literally, through duplicate electronic copies. A related concern is that the status of currently free full-text materials owned by private and/or corporate institutions may change. As a result, access to such materials could disappear overnight. This is significant because most high-profile mass-digitization projects are being spearheaded by companies such as Google, rather than consortia of libraries, archives, and academic institutions. A possible solution to these problems is the LOCKSS (“Lots of Copies Keeps Stuff Safe”) project. As the name implies, redundant (and decentralized) storing of information could allow for more permanent and reliable access to this information, something that the Open Archives Initiative inherently neglects by its metadata-centered design.

So far I have addressed predominantly digitized book collections in relation to OAI metadata harvesting. However, the original and continued focus of the OAI is on e-prints, or research and journal articles. As Peter Suber stated, “OAI-compliant archives are already here and already useful”. (Suber, May 2) So why then have they not challenged commercially offered databases? The answer is twofold: scholars must be convinced that the OAI is valuable enough to include their works, and OAI repositories must be filled with quality materials that exploit the benefits of the OAI standards.

Depositing research in an OAI compliant repository is arguably not a priority or norm in the scholarly world. In OAIster, I searched for several of my instructors from library school. Of the five, only three had articles in the catalogue. For many authors, putting a paper on their personal website can be seen as the equivalent of depositing in a repository. Not only have they achieved open access, but using Google, or some future search engine with similar functionality, will allow users worldwide to find their work. The argument against this is that OAI’s use of fielded metadata allows for precision searching. (Suber, March 2) However, relevancy algorithms and properly executed XML effectively perform the same function as fielded searches. The question of whether search engines or OAI catalogues are superior is a highly contentious one, and there is little besides opinion in the scholarly literature to substantiate one side over the other. Peter Suber points to the following statements as examples of library and information science professionals’ unfounded preference for OAI:

“But OA-OAI archiving enhances visibility more than Google indexing does.” […] “Scholars doing serious scholarly research look in specialized scholarly tools and resources before they look in Google.” […] “Archiving will give an eprint a permanent or persistent URL.” […] “OAI-compliant searching tools refresh their indices faster than Google.” (Suber, March 2)

These assertions are flawed for many reasons. First, they ignore the use of e-prints by non-scholars. Second, they make broad, sweeping generalizations about Google, scholarly research methods, and OAI searching tools; these statements do not hold true in all cases. For every assertion of the OAI’s superiority, Suber – a proponent of the OAI – can find many convincing exceptions.

When a persuasive argument cannot be made for one tool over the other, why should scholars participate in an OAI compliant system? The answer is that a service like Google does not detract from the value of OAI, that the systems need not be mutually exclusive, and that OAI has the potential to offer a superior search experience should mass-collaboration be achieved. (Suber, March 2) Were mass-collaboration achieved, users would have access to substantial, cross-disciplinary databases of scholarly literature. Inappropriate materials would be excluded, decreasing the information overload so often experienced by users of search engines. This would also reduce the likelihood that users not versed in information literacy would utilize incorrect or flawed information resources. While David Weinberger suggests that information professionals “give up control” and “filter on the way out, not on the way in”, the OAI intends to do the exact opposite. (105, 102) In doing so, they hope to create bastions of knowledge known not only for their intellectual integrity, but supreme usability.

To enhance the value of OAI compliant repositories, Suber identified ten goals which information professionals should work toward. They include: working against the Ingelfinger Rule (a stipulation in contracts that prevents publishing of already available research), persuading all open access journals to participate, persuading publishers to supply metadata for their materials regardless of whether the documents are copyrighted or not in electronic formats, archiving “raw and semi-raw data, not just articles that interpret or analyze data”, and finally creating an open access citation index. (Suber, March 2) Other goals Suber listed have already been achieved in part, including making “postprint archiving […] a condition of research funding” (now true of government-funded research) and the depositing of theses and dissertations (true of many colleges/universities). (Suber, March 2) While these measures would predominantly ensure a steady flow of current material, I think that the most exciting goals are those involving publishers and the creation of a citation index. These two factors will make OAI compliant catalogues discovery tools par excellence by creating a massive, deep (predominantly full text) index of serials and monographs, effectively removing the need for sloppy federated search engines, which are inhibited by the proprietary structure of commercial databases.

The Open Archives Initiative relies on standards, which Roy Tennant has gone so far as to describe as the “engine of interoperability”. By utilizing such standards to organize information, the OAI is working to increase access to scholarly materials. Although the system has flaws and has not yet gained mainstream acceptance, it is clear that the potential exists for OAI-compliant technologies to radically transform the publication of academic research and, as a result, scholarly communication. Hopefully this transformation will mirror the value which has guided the Initiative over time: open access to information.