Sunday, February 1st, 2009

The evil 3.26%

The question has arisen of why I advocate against OCLC’s attempt to monopolize library data. Roy Tennant of OCLC, an intelligent, likeable man who, although we disagree on some issues, has done more for libraries than most, accused me of writing and talking about the issue because:

“… your entire business model is built on the fact that you can use catalog records for free that others created and not contribute anything back unless they pay (yes, there is a limited set of data available via an API, but then they need the chops to do something with it).”

Fair enough. Let’s look at the numbers, and the argument.

I did a comprehensive analysis, available here as a text file, with both output and PHP code. If anyone doubts it, send me an email and I’ll let you run the SQL queries yourself.

The numbers. As of 6:17pm Sunday, some 3.5 years after LibraryThing began, our members have added 35,831,904 books from 690 sources:

  • 85.48% came from bookstore data (almost exclusively Amazon)
  • 4.88% were entered manually by members
  • 9.63% were drawn from library sources

Now, where did that 9.63% come from?

These sources were, in every case, free and open Z39.50 connections that our members accessed through us. Very often members were pulling records from their own academic institutions, but in every case they were accessing these records alongside everyone else: libraries, museums, public agencies of one sort or another, and all the students and scholars who use RefWorks, EndNote and other such services. LibraryThing has never been asked to stop accessing a source. On the contrary, libraries frequently ask to be added to our list of sources.

Of the 9.63%, by far the largest source is the US Library of Congress, the source of 2,203,182 books, or 6.15% of the total. The Library of Congress is a federal organization, created for the benefit of the country and falling under the government-wide rule that public work is for the benefit of the public and cannot be copyrighted or otherwise “owned.” For as long as the technology has existed, the Library of Congress has allowed access to its cataloging data; the OCLC policy change will not affect that.* We are grateful the Library of Congress does this. But insofar as we are taxpayers and support the American notion of public ownership of public resources, I will not apologize for it. (On the contrary, I feel that OCLC should apologize for attempting to restrict and profit from public work.)

3.26%. That leaves 3.48% (more appropriately 3.26%**), the evil sliver upon which our “entire business model is built.” Here are the top fifteen sources:

  • Koninklijke Bibliotheek: 130,406 books (0.36%)
  • National Library of Scotland: 80,826 books (0.23%)
  • British Library (powered by Talis): 80,205 books (0.22%)
  • Gemeinsamer Bibliotheksverbund (GBV): 77,190 books (0.21%)
  • National Library of Australia: 72,896 books (0.20%)
  • Helsinki Metropolitan Libraries: 70,551 books (0.20%)
  • The Royal Library of Sweden (LIBRIS): 63,430 books (0.18%)
  • Italian National Library Service: 60,643 books (0.17%)
  • Vlaamse Centrale Catalogus: 58,936 books (0.16%)
  • LIBRIS, svenska forskningsbibliotek: 54,339 books (0.15%)
  • ILCSO (Illinois Libraries): 28,517 books (0.08%)
  • Yale University: 26,885 books (0.08%)
  • Det kongelige Bibliotek: 24,564 books (0.07%)
  • University of California: 20,098 books (0.06%)
  • : 19,628 books (0.05%)
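The arithmetic behind the headline figure can be rechecked from the counts quoted above. What follows is a minimal Python sketch, not the PHP analysis mentioned earlier; the names are invented for illustration, and the 9.63% share for all library sources is taken as reported, since only the percentage (not a count) is given for that group.

```python
# Figures quoted in the post: a snapshot of LibraryThing's catalog
# as of 6:17pm on Sunday, February 1st, 2009.
TOTAL_BOOKS = 35_831_904
LIBRARY_OF_CONGRESS_BOOKS = 2_203_182
BRITISH_LIBRARY_VIA_TALIS_BOOKS = 80_205

# Reported directly as a percentage; no underlying count is given.
ALL_LIBRARY_SOURCES_PCT = 9.63


def pct(count: int, total: int = TOTAL_BOOKS) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Library of Congress share of all books added.
loc_pct = pct(LIBRARY_OF_CONGRESS_BOOKS)  # 6.15

# Library-sourced records that did not come from the Library of Congress.
non_loc_pct = round(ALL_LIBRARY_SOURCES_PCT - loc_pct, 2)  # 3.48

# Set aside British Library records, which arrive under Talis's contract.
evil_sliver_pct = round(non_loc_pct - pct(BRITISH_LIBRARY_VIA_TALIS_BOOKS), 2)  # 3.26

print(loc_pct, non_loc_pct, evil_sliver_pct)  # prints: 6.15 3.48 3.26
```

Subtracting the British Library’s Talis-supplied records is the step that turns 3.48% into 3.26%.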

With 690 possible sources, it’s a long, long tail. We take 2,087 records from the Russian State Library, 1,067 from the Magyar Országos Közös Katalógus, 286 from Princeton, 106 from Koç (in Istanbul), 63 from Hong Kong Baptist, 4 from the Universidad Pública de Navarra, and so on.

It should be apparent to anyone looking at the above that the 3.26% is largely about satisfying the needs of foreign LibraryThing members, a small percentage of our membership and hardly central to our “business model.” Equally clear is the government orientation of the list: only one source, Yale, is a private institution. The rest are all government agencies. Of course, no records actually came from OCLC itself!

All in all, library data from non-federal sources is a negligible component of LibraryThing’s content. LibraryThing is not some big plot to capture library records. The figures simply don’t support that idea.

Do we give back? What of the second half of the accusation, that we do “not contribute anything back unless they pay,” and the dig at APIs?

First, assuming Roy means LibraryThing data generally, it’s absurd to suggest that because LibraryThing draws 3.26% of its data from free, unlicensed sources, our members’ data and services are owned by OCLC or its members. OCLC no more owns members’ tags and reviews on bibliographic metadata than Saudi Aramco owns the furniture I bring home in my car. Who in their right mind would ever accept a list of titles and authors from a library if that meant ceding ownership over what you think about the book?

LibraryThing and OCLC both have terms. But LibraryThing’s license terms are unlike OCLC’s in a number of ways. LibraryThing members know what they’re getting, unlike OCLC members, who thought they were sharing with other libraries but now find themselves the linchpin of a monopoly. From our inception LibraryThing has reserved a right to sell aggregate or anonymized data. We also sell some reviews, and members can opt out of having theirs included. All our member data is non-exclusively licensed, so members can do anything they want with it outside of LibraryThing, and members can leave at any time. Neither is true of OCLC members’ data under the Policy.

Cataloging data. That leaves LibraryThing cataloging data, of which we have three types. We don’t have any legal responsibility to make it free, but we do so anyway.

First, we would be happy to offer downloads of original or modified MARC records! We haven’t done so in order to avoid attracting a suit from OCLC. But perhaps we were mistaken. If OCLC would like us to start releasing our MARC records to others, someone should let us know. We will release them under the same terms they were given to us—freely.

Second, our Common Knowledge cataloging (series, awards, characters, etc.) is free and available to all. We can’t think of a better way to provide it than through an API, but we’re all ears if Roy has a better idea. And if OCLC would like to admit it to WorldCat, without subverting its always-free license, they don’t even need our permission. Go on, OCLC, make my day!

Third, there’s ThingISBN, which was directly patterned on OCLC’s xISBN service. Despite Roy’s criticism, the two are identical in format and delivery, so if there’s something wrong with ThingISBN’s XML API, OCLC has only itself to blame. Indeed, the only difference is cost: ThingISBN is completely free, both as an API and as a feed; xISBN, which is built from member data, is sold back to members.
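To make the point about shared format concrete: a ThingISBN-style service takes one ISBN and answers with a short XML list of ISBNs for other editions of the same work, and a client needs only a few lines to read it. The Python sketch below parses such a response offline. The `<idlist>`/`<isbn>` element names are an assumption about the response shape, the ISBNs are sample values, and `related_isbns` is a hypothetical helper, not part of any LibraryThing or OCLC library.

```python
import xml.etree.ElementTree as ET

# Hypothetical ThingISBN-style response. The <idlist>/<isbn> shape is
# assumed for illustration; the ISBN values are sample data only.
SAMPLE_RESPONSE = """<?xml version="1.0" encoding="utf-8"?>
<idlist>
  <isbn>0441172717</isbn>
  <isbn>0441013597</isbn>
</idlist>"""


def related_isbns(xml_text: str) -> list[str]:
    """Return the ISBNs listed directly under <idlist>, in document order."""
    root = ET.fromstring(xml_text)
    # Skip empty <isbn/> elements rather than returning None entries.
    return [el.text.strip() for el in root.findall("isbn") if el.text]


print(related_isbns(SAMPLE_RESPONSE))
```

Because the parsing depends only on the element names, any service returning the same shape could be read with the same code, which is the practical upshot of two APIs being identical in format.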

Stop killing the messenger. It’s time for OCLC to recognize that they made this mess, not others. They have perpetrated some astounding missteps, from attempting to sneak through a major rewrite of the core member policy in a few days without consultation, to a comic series of rewrites and policy reversals, culminating in withdrawing the policy entirely for discussion. (It now seems clear they did so on the heels of a member revolt, whether general or just of some key libraries.)

It’s also important to see that, before OCLC started threatening companies and non-profits doing interesting but non-competing things with book data—notably LibLime, Open Library and LibraryThing—they had none of the problems they have now. Now, by attempting to control all book data, they’ve spurred the creation of LibLime’s ‡Biblios system, a free, free-data alternative to OCLC and, well, sent me, Aaron Swartz of Open Library and dozens of prominent library bloggers into orbit.

Being caught so flat-footed can’t feel nice. It must be hard feeling like royalty and discovering your subjects think themselves a confederacy. But this is no time for OCLC to start attacking the credibility of its opponents. Surely LibraryThing is an unusual case: a company with an opinionated, crusading (okay, loud) president. But the thousands of librarians and other individuals who supported our calls or raised other objections to the OCLC policy are no less well-motivated than OCLC and its employees. They do not love libraries less. They are, rather, concerned that OCLC’s urge to control library metadata threatens longstanding library traditions of sharing, and sets libraries on a path of narrowness and restriction that will surely prove of no benefit in this increasingly open, connected world.

*I need to write a blog post on this, but I was recently informed that whatever changes OCLC makes cannot touch federal libraries without explicit authorization. That is, federal law does not recognize clauses like “if you continue to use” or “we can change this at any time.”
** It should more accurately be 3.26% rather than 3.48%, because we are getting our British Library records through Talis, who have a contract with the British Library.

