Open Access Week: Principles for Open Bibliographic Data

It's Open Access Week this week and as part of the celebrations I thought I'd highlight a recent declaration by the OKFN Working Group on Open Bibliographic Data: the Principles for Open Bibliographic Data. It's an incredible idea, one that I support completely. The aim is to make bibliographic data open, reusable and remixable. Creating a bibliographic data commons would open up many opportunities to build search and discovery tools that would be of great benefit to scholarship, education, research and development.

I won't try and explain the details of the declaration since it's released under a CC-BY license and I can therefore just reproduce it right here for all to see. I'll do that below.

Before I do that, I'll respond to the group's call for feedback. One thing that struck me right away is that it would be great if a project like that could make a mention of the utility of authors making subject bibliographies open as well. I'm thinking here of the kinds of things that you would normally see at the backs of books or review articles.

David Weinberger is a good example. He's looking for a bibliography commons to hold the reading list for his current book project, Too Big to Know. As he puts it:

Ideally, I'd like a site that is an open commons, maintained by an institution that has some legs. It should present my biblio in standard readable and re-citable forms, but should also treat it as data in a database so that it can be refactored. I'd love for it to have LibraryThing's social functionality. And in a perfect world, it'd let me enter just some key data, look it up, and fill in the rest in perfectly formatted form. (Again, LibraryThing does cool stuff in this area, for books.)

A recent example of a book I read with a very interesting and useful bibliography is the novel Swastika by Michael Slade. It had a list of some great resources on Nazi secret weapon programs, a topic I'm interested in professionally because of my role as history of science collections development librarian for my library. York also has a course on Science, Technology and Modern Warfare.

Virtually any history book is going to be a great example of this and, of course, many other kinds of books as well, both academic and non-. Whenever I read a book I pay close attention to the bibliography and often use it as a collection development tool. A couple of relatively recent examples of this are the history of spaceflight Countdown and the Isaacson Einstein biography. In both cases, I used the bibliographies to improve our collections in these areas. The Swastika one I'll be looking at pretty soon as well.

As such, I think there would be a lot of value in more explicitly encouraging authors (and others with such subject-based bibliographies) to make their bibliographies for such books openly available; lots of time and effort goes into their creation, and their value is distinct in many ways from the value of the book itself. I can imagine people using sites such as LibraryThing, GoodReads and especially Zotero or Mendeley as homes for such things. It would be nice if a corner of a project like this could be set aside for them.


Principles for Open Bibliographic Data

For some time now the OKFN Working Group on Open Bibliographic Data has been working on Principles on Open Bibliographic Data. While first attempts were mainly directed towards libraries and other public institutions, we decided to broaden the principles' scope by amalgamating them with Peter Murray-Rust's draft publisher guidelines. The results can be seen below. We ask anyone to review these principles, discuss the text and suggest improvements.

Principles on Open Bibliographic Data

Producers of bibliographic data such as libraries, publishers, or social reference management communities have an important role in supporting the advance of humanity's knowledge. For society to reap the full benefits from bibliographic endeavours, it is imperative that bibliographic data be made openly available for free use and re-use by anyone for any purpose.

Bibliographic Data

In its narrowest sense the term 'bibliographic data' refers to data describing bibliographic resources (articles, monographs, electronic texts etc.) to fulfill two goals:

  • Identifying the described resource, i.e. pointing to a unique resource in the set of all bibliographic resources.
  • Addressing the described resource, i.e. indicating how/where to find the described resource.

Traditionally one description served both purposes at once by delivering information about:

  • author(s) (possibly including addresses and other contact details) and editor(s),
  • title,
  • publisher,
  • publication year, month and place,
  • title and identification of enclosing work (e.g. a journal),
  • page information,
  • format of work.

In the web environment the address can be a URL and the identification a URI (URN, DOI etc.). Identifiers thus fall under this narrow concept of 'bibliographic data'.
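As a rough illustration of this narrow sense, a single description could be modelled as a small record type. This is only a sketch: the class name `BibRecord`, its field names, and the sample DOI and URL are hypothetical and are not part of the declaration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BibRecord:
    """Hypothetical record for the core descriptive fields listed above."""
    authors: List[str]
    title: str
    publisher: Optional[str] = None
    year: Optional[int] = None
    place: Optional[str] = None
    container_title: Optional[str] = None  # enclosing work, e.g. a journal
    pages: Optional[str] = None
    work_format: Optional[str] = None
    identifier: Optional[str] = None  # URI such as a DOI or URN
    url: Optional[str] = None  # where the resource can be found

    def identifies(self) -> bool:
        # True when the record carries an explicit identifier.
        return self.identifier is not None

    def addresses(self) -> bool:
        # True when the record says where to find the resource.
        return self.url is not None


# Made-up sample values, for illustration only.
record = BibRecord(
    authors=["A. Author"],
    title="An Example Article",
    container_title="Journal of Examples",
    year=2010,
    pages="1-10",
    identifier="doi:10.1234/example",
    url="https://example.org/articles/1",
)
```

Keeping the identifier and the address as separate fields mirrors the two goals above: a record can identify a resource without saying where to get it, and vice versa.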

Furthermore, there is other information about a bibliographic resource which, in this document, falls under the concept of bibliographic data. This data might be produced by libraries as well as by publishers, online communities of book lovers, or social reference management systems:

  • Identifiers (ISBN, LCCN, OCLC number etc.)
  • rights associated with work
  • sponsorship (e.g. funding)
  • tags,
  • exemplar data (number of holdings, call number)
  • metametadata (administrative metadata (last modified etc.) probably often created automatically).
  • relevant links to wikipedia, google books, amazon etc.
  • cover images (self-scanned or from amazon)
  • table of contents
  • links to digitizations of tables of contents, registers, bibliographies etc.

Libraries also produce authority files such as:

  • name authority files,
  • subject authority files,
  • classifications.

We assert that the information associated with an individual work is in the public domain. It follows that an individual bibliographic entry derived from the work itself is free of restrictive rights. This holds true as well for individual authority records. There might only be rights on aggregations of bibliographic and authority data.

Formally, we recommend adopting and acting on the following principles:

  1. Where bibliographic data or collections of bibliographic data are published it is critical that they be published with a clear and explicit statement of the wishes and expectations of the publishers with respect to re-use and re-purposing of individual bibliographic entries/elements, the whole data collection, and subsets of the collection. This statement should be precise, irrevocable, and based on an appropriate and recognized legal statement in the form of a waiver or license. When publishing data make an explicit and robust license statement.

  2. Many widely recognized licenses are not intended for, and are not appropriate for, metadata or collections of metadata. A variety of waivers and licenses that are designed for and appropriate for the treatment of data are described here. Creative Commons licenses (apart from CC0), GFDL, GPL, BSD, etc. are NOT appropriate for data and their use is STRONGLY discouraged. Use a recognized waiver or license that is appropriate for metadata.
  3. The use of licenses which limit commercial re-use or limit the production of derivative works by excluding use for particular purposes or by specific persons or organizations is STRONGLY discouraged. These licenses make it impossible to effectively integrate and re-purpose datasets and prevent commercial activities that could be used to support data preservation. If you want your data to be effectively used and added to by others it should be open as defined by the Open Knowledge/Data Definition - in particular non-commercial and other restrictive clauses should not be used.
  4. Furthermore, it is STRONGLY recommended that bibliographic data, especially where publicly funded, be explicitly placed in the public domain via the use of the Public Domain Dedication and Licence or Creative Commons Zero Waiver. This is in keeping with the public funding of most library institutions and the general ethos of sharing and re-use within the library community. We strongly recommend explicitly placing bibliographic data in the Public Domain via PDDL or CC0.

  5. While we appreciate that certain types of bibliographic metadata do require some extra work in their creation, we strongly assert that making these open has major benefits not only to the community as a whole but also to the creator (author, publisher, library, etc.). Benefits include enhanced discoverability, wider potential usage of a work, and "save-the-time-of-the-reader". These types include:
    • abstracts (whether generated by author, publisher, library or machine)
    • keywords, subject headings and classification notations (whether generated by author, publisher, library or machine)
    • reviews (either human or machine-generated)

    As a fifth principle we strongly urge that creators of bibliographic metadata explicitly either dedicate this to the public domain or use an open license.
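One way to picture the "explicit statement" these principles ask for is a machine-readable license field attached to a published collection. The sketch below is hypothetical: the `license` key, the function name `license_status`, and the choice of SPDX-style identifiers (`CC0-1.0`, `PDDL-1.0`) are assumptions for illustration, not something the declaration prescribes.

```python
# Identifiers for the two public domain instruments recommended above
# (SPDX-style short names, used here as an assumption).
RECOMMENDED = {"CC0-1.0", "PDDL-1.0"}


def license_status(collection: dict) -> str:
    """Classify a collection's license statement in a deliberately simple way."""
    license_id = collection.get("license")
    if not license_id:
        return "missing"  # no explicit statement at all
    if license_id in RECOMMENDED:
        return "recommended"  # public domain dedication via CC0 or PDDL
    return "other"  # some other statement; needs a closer look


# Made-up collections, for illustration only.
catalogue = {"name": "Example library catalogue", "license": "CC0-1.0"}
unlabelled = {"name": "Example reading list"}
restricted = {"name": "Example index", "license": "CC-BY-NC-4.0"}

statuses = [license_status(c) for c in (catalogue, unlabelled, restricted)]
```

A check like this only looks for the presence of a statement; whether a given license is actually appropriate for data is the judgment the principles themselves address.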

  • Mr. Gunn says:

    It's worth noting that Mendeley's research catalog is licensed CC-BY and we'd love it if authors, as opposed to just researchers, established a profile and uploaded their bibliographies.

  • John Dupuis says:

    Thanks for the note, Mr. Gunn, I'm glad to know that products like Mendeley make it easier to be open rather than more difficult.

