Archive for August, 2006

Friday, August 25th, 2006

LibraryThing on whether Pluto is a planet

UPDATE: The backlash begins!

As many of you know, the International Astronomical Union recently voted to demote Pluto from its former planetary status. Librarians, or at least the Dewey Blog, have been following the debate for some time now. For Dewey the stakes are high. Books about Pluto are classed at 523.482, within “Trans-Uranian Planets,”* which is in the “Planets of the Solar System” schedule. Is it time to reshelve?

Beyond Dewey, the Pluto vote raises many of the same issues as library classification in general. Both involve authority and are understood as binary. In general, the librarians have better intellectual grounds. A book must really reside on one shelf**, and who better to decide this than the people who use it every day?

Meanwhile, the Pluto vote won’t affect any astronomers’ actual work, nor, say, the “findability” of Pluto for the rest of us. The vote is a classic “pseudo-event.” I for one don’t see why the IAU’s opinion on the matter (or rather, the opinion of the 400-odd, out of 2,700, conference attendees who still remained on the last day) should be definitive.*** What do the astrologers, historians of science, linguists and poets have to say?

Or, for that matter, how about LibraryThing members? Funny you should ask!

Related tags: planets Related tags: Pluto

So you see, Pluto is “kind of” a planet. It’s not planetary enough to be included on the related tags for planet. But the related tags for Pluto include “planet.”

So, it’s “sort of” a planet. Or maybe it’s a planet, but not a very good example of one. That’s a perfect LibraryThing answer. Non-binary, non-authoritative. Pretty good answer though.

*Another item from the classification schedule revealed!
**And, under a physical card catalog, it must have a discrete number of subject cards.
***The Dewey blog takes it for granted that OCLC’s classification should be affected by the vote. The NYT reports that school publishers were holding up textbooks. Having been directly involved in the production of school textbooks, I say bullshit.

Labels: Uncategorized

Wednesday, August 16th, 2006


Introducing thingLang, a simple, pragmatic API for determining the language of a book. thingLang uses LibraryThing’s MARC records, when it can. When it can’t it uses the Group Identifiers embedded at the start of the ISBN format. I’m releasing it for free, for both commercial and noncommercial use.* Aw, what the heck!

Examples: (Harry Potter in French) (The Hobbit in Lithuanian)

Rather than returning XML, which–ssh!–I don’t really like, LibraryThing returns a naked three-letter string, following the MARC standard. The exceptions are (1) if the ISBN is invalid, it returns “invalid,” (2) if it really can’t guess (see below) it returns “unknown.” It works equally well with ISBN10 and ISBN13. It’s not perfect, but it’s probably good enough.
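For readers who want to pre-check an ISBN before asking about it, here is a minimal Python sketch of the standard ISBN-10 and ISBN-13 check-digit tests. It is an assumption that this is roughly what thingLang means by “invalid”; the function names are made up for illustration and are not part of the API.

```python
DIGITS = "0123456789"


def _clean(isbn: str) -> str:
    """Drop hyphens and spaces, and upper-case a trailing 'x'."""
    return isbn.replace("-", "").replace(" ", "").upper()


def is_valid_isbn10(isbn: str) -> bool:
    """ISBN-10: weights 10..1, total must be divisible by 11 ('X' counts as 10)."""
    s = _clean(isbn)
    if len(s) != 10:
        return False
    if any(ch not in DIGITS for ch in s[:9]) or s[9] not in DIGITS + "X":
        return False
    total = sum((10 - i) * int(ch) for i, ch in enumerate(s[:9]))
    total += 10 if s[9] == "X" else int(s[9])
    return total % 11 == 0


def is_valid_isbn13(isbn: str) -> bool:
    """ISBN-13: alternating weights 1 and 3, total must be divisible by 10."""
    s = _clean(isbn)
    if len(s) != 13 or any(ch not in DIGITS for ch in s):
        return False
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(s))
    return total % 10 == 0


def is_valid_isbn(isbn: str) -> bool:
    """Accept either form, as the post says thingLang does."""
    return is_valid_isbn10(isbn) or is_valid_isbn13(isbn)
```

A caller might run `is_valid_isbn("0-306-40615-2")` first and skip the request entirely when it returns False.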

UPDATE: If you add “&display=name” you’ll get the language’s name, instead of the code, e.g. The Greek New Testament.
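Handling the plain reply (the code form, without “&display=name”) takes only a few lines. The Python sketch below sorts a reply into the three cases described above. The function name, the returned dictionary and the tiny name table are illustrative assumptions; the table is a hand-picked handful of MARC language codes, not the list the service uses.

```python
# A few MARC language codes, for illustration only.
MARC_NAMES = {
    "eng": "English",
    "fre": "French",
    "lit": "Lithuanian",
    "grc": "Greek, Ancient (to 1453)",
}


def interpret_reply(body: str) -> dict:
    """Classify a thingLang-style plain-text reply.

    Assumes the reply is exactly one of: a three-letter code,
    the word "invalid", or the word "unknown".
    """
    text = body.strip().lower()
    if text in ("invalid", "unknown"):
        return {"status": text, "code": None, "name": None}
    if len(text) == 3 and text.isascii() and text.isalpha():
        # A code we have no name for still counts as a good reply.
        return {"status": "ok", "code": text, "name": MARC_NAMES.get(text)}
    # Anything else (HTML error page, empty body) is flagged, not guessed at.
    return {"status": "unexpected", "code": None, "name": None}
```

Treating anything unrecognized as “unexpected” keeps a bad network reply from being mistaken for a language code.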

The weeds. Using ISBNs to determine language is a tricky problem. Strictly speaking, ISBNs don’t encode the language, but a queer mixture of language and region. The code 5, for example, is the “Evil Empire” code. Okay, they don’t call it that, but that’s what it is–Azerbaijan, Tajikistan, Armenia, Estonia, Georgia, etc. You know the language has seen a translation of Lenin’s works, but that’s about it. The same problem affects India (dozens of languages, with most of the LibraryThing books being English), Sri Lanka, and others.

Or take Egypt. Although most books published in Egypt are in Arabic, most of the books with Egyptian ISBNs logged on LibraryThing are merely from Egyptian publishers. In fact, they’re mostly tourist guidebooks.

Or take Ethiopia. Of about 50 books with Ethiopian ISBNs, none are in Amharic or about Ethiopia. When Avon published its Science Fiction Hall of Fame (1970), was it just poaching numbers? (ISBNs, after all, cost money.)

So, Abby and I went through the numbers, running them against LibraryThing’s holdings. If most of the books for a given code were in a single language, we use it. If not, we ignore it.
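The rule above can be sketched in Python as a simple majority vote per group code. This is an illustration, not the code we ran: the function name is made up, “most” is read here as “more than half,” and the sample holdings are invented.

```python
from collections import Counter, defaultdict


def build_group_table(holdings, threshold=0.5):
    """Map each ISBN group code to a language, if one language dominates.

    holdings  -- iterable of (group_code, language_code) pairs, one per book
    threshold -- share a language must exceed to be trusted (an assumption)
    """
    by_group = defaultdict(Counter)
    for group, language in holdings:
        by_group[group][language] += 1

    table = {}
    for group, counts in by_group.items():
        language, n = counts.most_common(1)[0]
        if n / sum(counts.values()) > threshold:
            table[group] = language  # trusted: one language dominates
        # otherwise the group is left out, i.e. ignored
    return table
```

With made-up holdings in which group 2 is nine-tenths French and group 5 is split evenly three ways, only group 2 ends up in the table, so a lookup for group 5 falls through to “unknown.”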

The result is an API that works pretty well for LibraryThing and, we suspect, for many other sites. (We made this, in part, because BookMooch was looking for a solution, and we felt generous.)

*Terms. Don’t hit it more than once/second. If you use it more than experimentally, you must put a notice somewhere on your website reasonably near what it’s contributing, linking to LibraryThing.

Labels: Uncategorized

Sunday, August 13th, 2006

Angry about classification

This mural is said to depict Dewey and the railroad service he gave to Lake Placid, FL. Is it just me, or does it look like the train is about to hit him?

So, I’m working on an extension to LibraryThing that requires getting my hands on one or more full classification schemes, such as Dewey or LC Classification. In theory, I could use LCSH too and I have an underdog fondness for Cutter.*

I want to do something new, interesting and experimental, moving traditional classification in a new direction.

Ha! Shouldn’t have even tried. I can’t get any of them in anything more than a “survey” or “outline.” Full printed versions cost huge amounts of money. Digital versions are even more expensive. Use of either involves restrictive terms. It’s infuriating.

Preventing open access to Dewey is, of course, in the interest of its owner, OCLC. (We’ll leave aside the issue of OCLC’s non-profit status.) But why do I need to pay for access to the LC’s data? Libraries exist to give information away, and the federal government exists because I consent to and pay for it. So, how does this lead to me paying $575 for a 1-4 user site license of LC’s primitive Classification Web? I can’t see any way to get that to work with LibraryThing, and my proposed use would violate their terms of service anyway. These require all users to share the same physical location. Fortunately they no longer need to be related or share the same barber.***

So much for creative use of library data. What’s the use of talking about APIs and mashups when the lowest level of all library data is unfree?

The solution. Here’s my thinking. I can use Cutter, or resort to a version of Dewey published long enough ago to be out of copyright. (Dewey’s original 1876 publication is available online at PG.) But there’s a wrinkle. Although OCLC can’t claim copyright over versions of Dewey before 1923, they have perpetual trademark rights.**** So I’ll have to call them Melvils, after Dewey’s first name. Let’s hope nobody needs to catalog anything about computers or, say, the phonograph. “Saddlery and shoe-making”? No problem.

We all understand why authors need legal protections for their books. Can someone explain to me why cataloging systems need them?

I’ve previously dismissed the question of which is better, tags or traditional classification? Both have uses, it’s true. They do different things. Neither is going to go away. But one is free and can, in this crazy, tubed-up age, be offered to people all over the world.

Pick the winner, kids.

*I’d love to use Cutter, mostly in support of my beloved Boston Athenaeum, which still uses it (along with four other libraries). I like underdogs. But Cutter’s inclusion of book size within the call number is singularly unsuited to the digital shelf. “There is no shelf, and there sure as hell is no oversized shelf.” The core system is, however, perfectly good. And since only five libraries use it, no one has a profit motive in it.
Cutter himself seems to have been a pioneer of openness; this is from the Forbes Library biography of him:

“Cutter’s vision for the Forbes, in his own words, was for “a new type of public library which, speaking broadly, will lend everything to anybody in any desired quantity for any desired time.” There were to be no bothersome rules and children would be welcome. [I]n another of Cutter’s major departures from the standard practice in most libraries of the time, the Forbes’ patrons were free to browse the open stacks rather than having to request books at the front desk, which a staff member would then fetch.”

(Dewey, meanwhile, was a racist and antisemite.) Actually, Cutter is starting to look good to me. Does anyone know the best, most recent unrolling of the system—something that tells you where to put books about, say, wifi communication?
**I’d love to do my own library in the Blegen system, used apparently in only one library, the American School of Classical Studies at Athens. This would work wonders for my Teubners, I’m sure. But my O’Reilly’s would be a problem.
***In the course of cataloging, LibraryThing has amassed a rather large set of LCSHs. But it’s nowhere near the full list, which similarly must be paid for.
****No doubt some of you are aware of the infamous Library Hotel case, when they sued a hotel for organizing its floors by the DDCS (room 800.001, “Erotic Literature” is particularly coveted).

Labels: Uncategorized

Tuesday, August 8th, 2006

Introducing the thingTitle API

As noted below, the Mashing up the Library competition is drawing to a close. LibraryThing tried to stoke things by releasing its thingISBN API, which takes an ISBN and returns ISBNs for all the other editions. Here’s one more API. Go do something interesting with it, eh?

thingTitle. Announcing another simple LibraryThing API. Feed it a title and it will return all the ISBNs from the most likely LibraryThing “work,” the LibraryThing title and a link to the LibraryThing work page.

It should prove useful for people who have uncontrolled, ratty or ISBN-less data. I plan to use it to mine the syllabi and playlists at H20. A company like IMDb could use it to provide ISBN links for movies “based on” a novel, or dating services could use it for their members’ favorite books (it works well with The Unbearable Lightness of Being). Note, however, the no-commercial-use license.

Examples: Hobbit Strange Curious Incident

It’s flexible with how you do spaces: Hobbit

It’s not perfect. It can’t read your mind and it leans toward popular things. If you give it “The curious incident” it will guess you mean Mark Haddon’s The Curious Incident of the Dog in the Night-time, owned by over 2,000 LibraryThing members, not The Curious Incident Of The WMD In Iraq (4 members), or the parody The Curious Incident of the Dog in the Nightdress (1 member).
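To make the “leans toward popular things” idea concrete, here is a toy Python sketch: among the works whose title contains the query, pick the one with the most members. It is not how thingTitle actually ranks anything; the `Work` class and `guess_work` function are invented for the example, and 2000 stands in for “over 2,000.”

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Work:
    title: str
    members: int  # how many LibraryThing members own it


def _normalize(text: str) -> str:
    """Lower-case and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def guess_work(query: str, works: List[Work]) -> Optional[Work]:
    """Return the most widely owned work whose title contains the query."""
    q = _normalize(query)
    matches = [w for w in works if q in _normalize(w.title)]
    return max(matches, key=lambda w: w.members) if matches else None


works = [
    Work("The Curious Incident of the Dog in the Night-time", 2000),
    Work("The Curious Incident Of The WMD In Iraq", 4),
    Work("The Curious Incident of the Dog in the Nightdress", 1),
]
best = guess_work("The curious incident", works)
```

Here `best` is the Haddon novel, because all three titles match and it has by far the most members.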

The output is simplicity itself:

<title>The magician's assistant</title>
<license>By using this service you agree to its license.</license>

Note that it doesn’t always return a title. If LibraryThing only knows the title from Amazon data, it omits it per the Amazon TOS.
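Reading that output, including the case where the title is left out, might look like the Python sketch below. It assumes the reply is a bare fragment containing only the `<title>` and `<license>` elements shown above, with no XML declaration; the wrapper element and the function name are inventions for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def parse_reply(fragment: str) -> Dict[str, Optional[str]]:
    """Pull the title and license text out of a thingTitle-style fragment.

    The fragment is wrapped in a made-up <reply> root so that sibling
    elements parse as one document. A missing <title> comes back as None.
    """
    root = ET.fromstring("<reply>" + fragment + "</reply>")
    return {
        "title": root.findtext("title"),
        "license": root.findtext("license"),
    }


sample = (
    "<title>The magician's assistant</title>\n"
    "<license>By using this service you agree to its license.</license>"
)
parsed = parse_reply(sample)
```

Code that uses the result should check `parsed["title"] is None` before displaying anything, since the title is sometimes omitted.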

Improvements. At present, it only returns one result. It could return many, in descending order of likelihood. Nor does it accept any other hints, such as the author name. It could accept these, and take them into account.

License. As with thingISBN, thingTitle is available for non-commercial use only. (Commercial use requires our written permission.) You can only make 1 request/second, and if you plan to hit it more than 1,000 times/day for an extended period, you must notify us of what you’re doing. In fact, we’d love to hear anyway. Needless to say, it’s provided “as is” with no guarantee whatsoever. If you use thingTitle to run a lathe, we are not responsible for missing digits.
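One way a client could stay inside the 1 request/second limit is a small throttle that sleeps before each call. The Python sketch below is only a suggestion; the `Throttle` class is hypothetical, and it does not track the 1,000/day figure.

```python
import time


class Throttle:
    """Space calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested
        self._sleep = sleep
        self._last = None      # time of the previous call, if any

    def wait(self):
        """Block until it is polite to make the next request."""
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call `throttle.wait()` immediately before every request; the first call returns at once and later calls pause only as long as needed.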

Labels: Uncategorized

Tuesday, August 8th, 2006

Mashup Competition Reminder

Just a quick reminder, Talis’ Mashing Up the Library competition ends on August 18! I’m one of the judges, and I want some interesting stuff to talk about!

Labels: Uncategorized