Wednesday, August 16th, 2006


Introducing thingLang, a simple, pragmatic API for determining the language of a book. thingLang uses LibraryThing’s MARC records, when it can. When it can’t it uses the Group Identifiers embedded at the start of the ISBN format. I’m releasing it for free, for both commercial and noncommercial use.* Aw, what the heck!

Examples: (Harry Potter in French) (The Hobbit in Lithuanian)

Rather than returning XML, which–ssh!–I don’t really like, thingLang returns a naked three-letter string, following the MARC standard. There are two exceptions: (1) if the ISBN is invalid, it returns “invalid,” and (2) if it really can’t guess (see below), it returns “unknown.” It works equally well with ISBN10 and ISBN13. It’s not perfect, but it’s probably good enough.
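
For the curious, the “invalid” case presumably comes down to the standard ISBN check-digit test. Here is a minimal sketch of that test in Python. It is not thingLang’s actual code, and the function name is made up for illustration; only the check-digit arithmetic itself is standard.

```python
ASCII_DIGITS = set("0123456789")

def is_valid_isbn(raw: str) -> bool:
    """Return True if raw passes the ISBN-10 or ISBN-13 check-digit test."""
    # Hyphens and spaces are just formatting; drop them before checking.
    s = raw.replace("-", "").replace(" ", "").upper()
    if len(s) == 10:
        # ISBN-10: nine digits plus a check character, which may be X (= 10).
        if not (set(s[:9]) <= ASCII_DIGITS and (s[9] in ASCII_DIGITS or s[9] == "X")):
            return False
        values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        # Weights run 10, 9, ..., 1; the weighted sum must be divisible by 11.
        return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0
    if len(s) == 13:
        if not set(s) <= ASCII_DIGITS:
            return False
        # ISBN-13: weights alternate 1, 3; the weighted sum must be divisible by 10.
        return sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(s)) % 10 == 0
    return False
```

Anything that fails this test would get “invalid”; anything that passes but belongs to a hopeless group would get “unknown.”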

UPDATE: If you add “&display=name” you’ll get the language’s name instead of the code, e.g., The Greek New Testament.

The weeds. Using ISBNs to determine language is a tricky problem. Strictly speaking, ISBNs don’t encode the language, but a queer mixture of language and region. The code 5, for example, is the “Evil Empire” code. Okay, they don’t call it that, but that’s what it is–Azerbaijan, Tajikistan, Armenia, Estonia, Georgia, etc. You know the language has seen a translation of Lenin’s works, but that’s about it. The same problem affects India (dozens of languages, with most of the LibraryThing books being English), Sri Lanka, and others.
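
To make the group-identifier idea concrete, here is a rough Python sketch of the fallback lookup. The table is a tiny illustrative sample, not thingLang’s real one, and the function name and the use of None for “too mixed to guess” are assumptions of mine. The sketch also skips check-digit validation entirely.

```python
# Illustrative sample only; the real table is not published in this post.
# Keys are ISBN group identifiers, values are MARC language codes.
# None marks a group that spans too many languages to guess from.
GROUP_LANGUAGE = {
    "0": "eng",
    "1": "eng",
    "2": "fre",
    "3": "ger",
    "4": "jpn",
    "5": None,  # former USSR: many languages share this group
    "7": "chi",
}

def guess_language(isbn: str) -> str:
    """Guess a MARC language code from the ISBN's group identifier."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) == 13:
        # Only the 978 prefix shares its groups with ISBN-10;
        # 979 has a separate group table, which this sketch skips.
        if not digits.startswith("978"):
            return "unknown"
        digits = digits[3:]
    # Group identifiers are one to five digits long and prefix-free,
    # so the first prefix found in the table is the group.
    for length in range(1, 6):
        prefix = digits[:length]
        if prefix in GROUP_LANGUAGE:
            return GROUP_LANGUAGE[prefix] or "unknown"
    return "unknown"
```

An ISBN starting with 2 comes back as “fre,” while anything starting with 5 comes back as “unknown,” which is about as much as the number can honestly tell you.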

Or take Egypt. Although most books published in Egypt are in Arabic, most of the “Egyptian” books logged on LibraryThing are Egyptian only in the sense that they come from Egyptian publishers. In fact, they’re mostly tourist guidebooks.

Or take Ethiopia. Of about 50 books with Ethiopian ISBNs, none are in Amharic or about Ethiopia. When Avon published its Science Fiction Hall of Fame (1970), was it just poaching numbers? (ISBNs, after all, cost money.)

So, Abby and I went through the numbers, running them against LibraryThing’s holdings. If most of the books for a given code were in a single language, we use it. If not, we ignore it.
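
In code, that sifting might look something like the Python sketch below. It is a reconstruction of the idea, not what we actually ran: the function name, the input format, and the 0.5 cutoff for “most” are all assumptions.

```python
from collections import Counter, defaultdict

def build_group_table(holdings, threshold=0.5):
    """Map each group code to its dominant language, or None if there is none.

    holdings: iterable of (group_code, language_code) pairs, one per book.
    threshold: assumed cutoff for "most"; no exact figure is given here.
    """
    # Tally the languages seen for each group code.
    by_group = defaultdict(Counter)
    for group, language in holdings:
        by_group[group][language] += 1

    table = {}
    for group, counts in by_group.items():
        language, n = counts.most_common(1)[0]
        share = n / sum(counts.values())
        # Keep the group only when one language clearly dominates.
        table[group] = language if share > threshold else None
    return table
```

Feed it nine French books and one English book under group 2 and it keeps “fre”; feed it four books in four languages under group 5 and it gives up on that group.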

The result is an API that works pretty well for LibraryThing and, we suspect, for many other sites. (We made this, in part, because BookMooch was looking for a solution, and we felt generous.)

*Terms. Don’t hit it more than once/second. If you use it more than experimentally, you must put a notice somewhere on your website reasonably near what it’s contributing, linking to LibraryThing.
