Archive for April, 2007

Monday, April 30th, 2007

LibraryThing for Libraries launches

We’ve launched the LibraryThing for Libraries demo site. After CIL we pushed everything back a week to work on speed, add fielded imports, and make some interface changes to the tag browser.*

Here’s the demo site: http://www.librarything.com/forlibraries/

So far we have about two dozen libraries and consortia interested enough to send us ISBNs. Over the next few days we’ll be getting back to people with directions on testing the service out.

Sad to say, but we’re still trying to figure out pricing. Here’s my thinking, which ends in aporia.

  • It seems right to tie the price to the number of ISBNs that LibraryThing can potentially enhance. For public libraries, this is about 50-70% of ISBNs. For academic libraries it’s more like 25-50%. So, my thought was to make it $.02 for the first 25,000 ISBNs, and $.01 after that. (The two levels try to get at the shape of interest in a given ISBN; it’s more valuable to enhance Harry Potter than some obscure book.)
  • So, a small city in New England (pop. 75,000) has 84,612 ISBNs. 57,312 (67%) are enhanceable by LibraryThing. That comes to $823/year. That seems like a very good deal.
  • Clearly a consortium needs to pay more than a single library with the same number of ISBNs. After all, the consortium will have multiple copies of the item spread around the various consortium members. But a consortium of fifty libraries won’t actually have fifty copies of every ISBN, and there ought to be some “bulk” savings for them anyway.
  • This led me to charging consortia a multiple of the square root of the number of members. So, for example, a library with 284,742 enhanceable ISBNs would pay $3,097, and an identical consortium with 28 members would pay $3,097 x SQRT(28) = $16,390.
  • Then you have the “branch” problem. A large city signed up for a beta test. They have 270,002 ISBNs—$2,950. But they have some 30 branches, a population of 600,000, and a library budget of $30 million! This doesn’t work.
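The tiered rate and the square-root consortium multiplier can be checked with a few lines of Python. This is a sketch of the arithmetic only, using the rates from the bullets above; the function names are illustrative, not part of any LibraryThing code.

```python
import math

TIER_LIMIT = 25_000  # the first 25,000 enhanceable ISBNs...
TIER1_RATE = 0.02    # ...cost 2 cents each
TIER2_RATE = 0.01    # every ISBN after that costs 1 cent


def library_price(enhanceable_isbns: int) -> float:
    """Annual price for a single library under the two-tier per-ISBN scheme."""
    first = min(enhanceable_isbns, TIER_LIMIT)
    rest = max(enhanceable_isbns - TIER_LIMIT, 0)
    return first * TIER1_RATE + rest * TIER2_RATE


def consortium_price(enhanceable_isbns: int, members: int) -> float:
    """Consortium price: the single-library price times sqrt(number of members)."""
    return library_price(enhanceable_isbns) * math.sqrt(members)


print(round(library_price(57_312)))          # 823   (small New England city)
print(round(library_price(284_742)))         # 3097  (large single library)
print(round(consortium_price(284_742, 28)))  # 16390 (28-member consortium)
print(round(library_price(270_002)))         # 2950  (the 30-branch city)
```

The last line makes the branch problem concrete: nothing in the formula sees branches, population or budget, so a 30-branch system pays the same as any single library with the same ISBN count.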

So, I think I need to get total collection or circulation figures, and multiply them by the percentage of ISBNs we can enhance.

I wish we could expand our pay what you want program…

(Money photo courtesy Jessica Shannon on Flickr, under Attribution-ShareAlike 2.0)

*Among other things, we normalized ISBNs, moving from storing a 13-character string in every table that needed them, to storing a four-byte integer tied to a table that mapped the integers to ISBNs. Normalizing textual data happens all the time here, but normalizing something already so compact and inherently unique was forced on us by the dawning realization that we’re going to be handling dozens or hundreds of millions of bibliographic records. So now LTFL tosses around arbitrary ISBN keys like mad, without ever knowing what ISBN they represent. O brave new world…
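The normalization in this footnote amounts to interning: each distinct ISBN is stored once, and every other table refers to it by a small integer key. The ISBN itself can't serve as that key, because a 13-digit number (up to 9,799,999,999,999) is far past the four-byte unsigned limit of 4,294,967,295. Below is a minimal in-memory Python sketch; the class and method names are hypothetical, and a real system would use a database table with an auto-increment column instead.

```python
MAX_KEY = 2**32 - 1  # largest value an unsigned four-byte integer can hold


class IsbnRegistry:
    """Hypothetical in-memory stand-in for an ISBN lookup table.

    Each distinct ISBN string is stored once; everything else refers to it
    by a small integer key, the way a normalized database table would.
    """

    def __init__(self) -> None:
        self._key_by_isbn: dict[str, int] = {}
        self._isbn_by_key: list[str] = []

    def key_for(self, isbn: str) -> int:
        """Return the key for an ISBN, assigning the next free key if it is new."""
        key = self._key_by_isbn.get(isbn)
        if key is None:
            key = len(self._isbn_by_key)
            if key > MAX_KEY:
                raise OverflowError("no four-byte keys left")
            self._isbn_by_key.append(isbn)
            self._key_by_isbn[isbn] = key
        return key

    def isbn_for(self, key: int) -> str:
        """Reverse lookup, needed only when an ISBN must be shown or exported."""
        return self._isbn_by_key[key]
```

Code that only joins, counts or compares books can work entirely with the integer keys, which is the "without ever knowing what ISBN they represent" point above.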

Labels: Uncategorized

Thursday, April 26th, 2007

Bugs, New York, Radio

Today was a full day—New York, radio, publishers and bug-fixing. In reverse order:

Bug fixing. I finally slew the bug that sent work copies off into la-la land. I also found why book-swap data was screwed up. It turns out Bookmooch’s data feed is now too large for PHP’s 40MB default memory space, and this was short-circuiting other feeds. Wow—way to go Bookmooch. I increased it to 80MB until I can rewrite it to load the data in pieces, and reloaded everything. I also fixed a matching algorithm, so that http://www.librarything.com/title/the_perfect_store goes to The Perfect Store, not The Great Gatsby. I’ll be working the rest of the night, except when I have to put my laptop through the x-ray.

Publishers. I gave a talk to the Association of American Publishers. Twenty minutes is too little time, starting from zero and trying to get to what’s “happening” with social software and books. But I think I got across the central message—(1) I’m crazy*, (2) LibraryThing is orders of magnitude larger and more interesting than its competitors, (3) stop marketing at people and get in the conversation, (4) get involved with LibraryThing.

It certainly would be nice if the publishing world were as friendly to LibraryThing as the library world.

New York. I flew into JFK this morning (6am departure, ouch!). I was there on business, but, since I work all day long, I don’t feel guilty spending the afternoon at The Strand, diligently confirming they do, in fact, have eighteen miles of books. Today’s haul: Richard Westfall’s The Life of Isaac Newton and Adam Cohen’s The Perfect Store: Inside eBay.

Radio. I appeared on Public Radio International’s Radio Open Source. They were doing a show on David Weinberger’s upcoming book Everything is Miscellaneous: The Power of the New Digital Disorder. I’ve blogged about David and his book before. To repeat: It’s excellent. Weinberger, a true Miscellaneous Man**, explores how digitization and mass-collaboration, -filtering and -classification (e.g., tagging) are changing knowledge, and its relation to authority. After an introduction with David, host Christopher Lydon brought in super-librarian Karen Schneider, then me, to chime in on the topic.

I pointed out how tagging worked for tags like chick lit, queer, glbt and lgbt. I also tried to get at a nagging issue for me—does “knowledge” change, or do we just get new perspectives and ways of getting at it? I’m happy to see the realm of debate, uncertainty, personal choice and personal understanding expand—for us to “swim in the complex,” as David writes. But I won’t give up on a small, hard (Pluto-like?) core of truth. More on that later.

OpenSource streams at 7pm tonight. After that, the audio—direct or podcast—will be available here.
*I love explaining to people that LibraryThing has no advertising or funded promotions, and doesn’t push affiliate links, but is profitable. On a more personal note, it was unreal being back among “publishing types.” I never mentioned it, but I used to work at Houghton Mifflin. I felt at home in uncomfortableness, as it were.
** Who else has a PhD in Philosophy and wrote jokes for Woody Allen? He’s more varied than my junk drawer!

Labels: Uncategorized

Tuesday, April 17th, 2007

5¢/patron, $1/student

For a while now, libraries have been approaching us about whether LibraryThing would sell them bulk memberships—so all their patrons could, potentially, become members. Today at CIL two more people asked. Time to act.

From now on if a public library or a college or university wants to buy memberships for everyone in a community, it’s 5¢/patron, $1/student.

The math is easy. If a town wants to give out free LibraryThing memberships, and they have 20,000 patrons—defined as working library cards—they would pay $1,000. If a college or university wants it, it pays $1 for every student, grad and undergrad—profs. and staff ride for free. The library gets a stack of membership cards, each with a unique code, good for a year’s membership from the date of activation.
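The offer reduces to a simple quote plus a batch of one-year codes. Here is a minimal Python sketch. Prices are kept in integer cents to avoid floating-point rounding. The 12-character hex code format and the rule that a February 29 activation expires on February 28 are assumptions for illustration, not how LibraryThing actually issues cards.

```python
import secrets
from datetime import date

# Rates from the post, in integer cents: per patron (public) or per student (academic).
RATE_CENTS = {"public": 5, "academic": 100}


def bulk_quote_cents(kind: str, headcount: int) -> int:
    """Total price: patrons = working library cards; students = grad + undergrad."""
    return RATE_CENTS[kind] * headcount


def make_card_codes(n: int) -> list[str]:
    """Generate n distinct, hard-to-guess card codes (hypothetical format)."""
    codes: set[str] = set()
    while len(codes) < n:
        codes.add(secrets.token_hex(6).upper())
    return sorted(codes)


def expires_on(activated: date) -> date:
    """One year from activation; a Feb 29 start is assumed to end on Feb 28."""
    try:
        return activated.replace(year=activated.year + 1)
    except ValueError:
        return activated.replace(year=activated.year + 1, day=28)


print(bulk_quote_cents("public", 20_000) / 100)  # 1000.0, the $1,000 town
```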

Details:

  • Patron cards would have to be given out in person, not over the phone.
  • Student accounts would require email confirmation to a valid school email (like Facebook).
  • Communities may elect to set up a group. Members would get an automatic invite for that group.
  • We will work to make sure LibraryThing links to and collects data from the institution in question. The latter requires an open Z39.50 connection.

If interested, write tim@librarything.com.

Labels: Uncategorized

Sunday, April 15th, 2007

Tim to CIL and the Library of Congress, Abby to Australia

UPDATE: If you’re in DC and want to come to CIL (Librarian? Enjoy vendor-tchotchkes?), I have 50 free tickets. I’m supposed to give them out to my important vendors and clients. That’s you. Email me and we’ll figure out how to do this. I’ll probably leave a stack at the closest Starbucks.

I’m off to Computers in Libraries in Washington, DC. LibraryThing will have a booth there, and I’m giving two talks:

  • Tuesday, 1:30-2:30. “Cutting Edge Leaders.”* One whole hour of me, giving my general talk about what LibraryThing is and what it means, amped-up for savvy CILers.
  • Wednesday, 11:30-12:15. “Catalogs/OPACs for the Future,” with me and Roy Tennant. I’ll probably do LibraryThing for Libraries.

I’ll be showing LibraryThing for Libraries at the talks. Unfortunately, it’s just me, so I’m going to be torn between manning the booth and going to all the great talks. I’m bringing along forty CueCat barcode readers. Free? No. I’ll be giving them out at cost—$5.

On Thursday I’m doing a talk at the Library of Congress. I am completely psyched. It’s not open to the public, but they said I could sneak in a friend or two.

Also on Thursday, Abby will be in Australia at the Innovative Ideas Forum, hosted by the National Library of Australia.**

On Friday*** I’ll be the closing keynote at Digital Odyssey 2007, hosted by the Faculty of Information Studies, University of Toronto. I’m talking about “Social Cataloging and the ‘Fun’ OPAC?” I put them in myself, but I want to remove those quotes.

*Apparently I am one, because it’s just me and I’m not planning to talk about the others.
**Synchronicity. We have tried and failed to find national libraries for the other LibraryThing employees to talk at on the same day. If you represent such a library, please contact us.
***Portland->Boston->DC->Toronto->DC->Portland. Gulp.

Labels: Uncategorized

Saturday, April 14th, 2007

LibraryThing for Libraries: How it works / The five-second rule

The LibraryThing for Libraries widgets have a unique architecture. You install them on your OPAC’s HTML pages, but the OPAC doesn’t “do anything.” All the work takes place in browser JavaScript requests to the LibraryThing for Libraries servers. Only when the patron clicks on a specific book does the library OPAC come into the picture again.

Your creaky OPAC can rest easy. All the database work and the statistical number-crunching that makes something like recommendations or tag browsing possible takes place elsewhere. You get beefy new functionality without a single extra OPAC request. (Of course, we think using a LibraryThing-enhanced catalog will be so fun—we don’t mean that ironically—that patrons will spend more time browsing them.)

*BUT* before LibraryThing can take the work off your hands, it needs to know what ISBNs you have. So we ask for an export with ISBN data, and accept any format your OPAC makes.* And if a link to a book is to display the same title and author given in your OPAC, it needs to get them. Asking libraries to export and upload titles and authors as well is impracticable. There are dozens of possible formats to parse, and anything that complicates the export process will limit our potential user-base. LibraryThing for Libraries needs to be dirt-simple. It needs to be people-who-don’t-even-know-HTML simple.

So, LibraryThing for Libraries hits your OPAC to collect titles and authors, “screen scraping” the pages. The question is: How fast can it go?

Good question, and one we’ve struggled with. In the search-engine industry, the standard maximum is one request/second. Google, Yahoo, AskJeeves, MSN (who?) and their peers use that as their benchmark, although you can request to speed them up or slow them down using standards like robots.txt. And they’ll do it all day long every day, and obviously without regard for how many others are hitting you too. In March LibraryThing was visited by 71 registered “bots.” The greediest, Google, hit us 11,338,467 times–an average of 4 times/second–and took almost 200GB. As our total bandwidth was 650GB, you can understand why Google sometimes seems a bit, er, codependent.
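As a check on the Google figures, spreading March’s hits evenly over its 31 days, and comparing Google’s share of bandwidth to the total:

```latex
\[
\frac{11{,}338{,}467\ \text{requests}}{31 \times 86{,}400\ \text{s}}
  = \frac{11{,}338{,}467}{2{,}678{,}400\ \text{s}}
  \approx 4.2\ \text{requests/s},
\qquad
\frac{200\ \text{GB}}{650\ \text{GB}} \approx 31\%
\]
```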

Anyway, I wrongly believed that most OPACs could handle 1/second. After all, the libraries who’ve contacted us all have systems that cost hundreds or millions of dollars. And most have unspiderable “sessions,” so LibraryThing wouldn’t be competing with Google and its ilk.

Apparently I was wrong. Until Thursday, the requests were sporadic or round-robin-ed, so the effective time between requests was more than a second. Thursday afternoon we threaded the process, so they could run mostly continuously and concurrently. This morning I heard back that LibraryThing was taking too much from one OPAC, and slowing performance. Yipes! The system in question served a consortium of more than 25 libraries, so one can expect it isn’t the slowest, worst OPAC out there! We yanked the spidering. They took it well, even so. We owe them.

So, the new rule will be one request per five seconds, max. I’ll also add a rule that monitors how long each document took to come in and waits a multiple of that, so any performance issue is adjusted for in real time. The LibraryThing for Libraries interface–not yet publicly available–allows libraries to speed up or slow down the process. “Slow” will stretch the interval to 10 seconds; “fast” will shorten it to 2 seconds.
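The five-second floor, the adaptive wait and the three speed settings can be sketched as a single-threaded fetch loop in Python. This is a minimal sketch, not the actual LibraryThing spider: `fetch` stands for a hypothetical function that retrieves one OPAC page, and the factor of 10 applied to the observed response time is an assumed value, since the post only says “a multiple of that.”

```python
import time
from typing import Callable, Iterable, Iterator, Tuple

# Minimum seconds between requests for each setting named in the post.
SPEED_SETTINGS = {"fast": 2.0, "normal": 5.0, "slow": 10.0}

# Assumed multiple of the last response time to wait ("a multiple of that").
RESPONSE_MULTIPLIER = 10.0


def next_delay(setting: str, last_response_secs: float) -> float:
    """Wait the configured floor, or longer if the OPAC answered slowly."""
    return max(SPEED_SETTINGS[setting], RESPONSE_MULTIPLIER * last_response_secs)


def polite_crawl(urls: Iterable[str], fetch: Callable[[str], str],
                 setting: str = "normal") -> Iterator[Tuple[str, str]]:
    """Fetch pages strictly one at a time, sleeping between requests."""
    delay = 0.0
    for url in urls:
        time.sleep(delay)  # no wait before the first request
        start = time.monotonic()
        page = fetch(url)
        elapsed = time.monotonic() - start
        delay = next_delay(setting, elapsed)  # re-tuned after every response
        yield url, page
```

Running requests sequentially rather than in threads is the point: the Thursday trouble came from concurrent requests, and a loop like this never has more than one request outstanding against the OPAC.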

The new speed will mean longer waits before a library can see LibraryThing for Libraries in action. In our experience, we run about 50% coverage on US publics, so a 250,000-ISBN library will have 125,000 overlapping ISBNs and take a week for us to fetch all titles and authors. With almost three million ISBNs in LibraryThing already, we can show a library what the widgets will look like before the fetch finishes, so long as they understand the titles may not match theirs exactly.
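The “week” follows from the five-second rule, assuming one request per overlapping ISBN and no extra adaptive slowdown. The same arithmetic for the “fast” and “slow” settings:

```latex
\[
\begin{aligned}
t_{\text{normal}} &= \frac{125{,}000 \times 5\ \text{s}}{86{,}400\ \text{s/day}}
  = \frac{625{,}000\ \text{s}}{86{,}400\ \text{s/day}} \approx 7.2\ \text{days} \\
t_{\text{fast}} &= \frac{125{,}000 \times 2\ \text{s}}{86{,}400\ \text{s/day}} \approx 2.9\ \text{days} \\
t_{\text{slow}} &= \frac{125{,}000 \times 10\ \text{s}}{86{,}400\ \text{s/day}} \approx 14.5\ \text{days}
\end{aligned}
\]
```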

We thank the dozen libraries who are participating in our initial tests of the system. We think everyone is going to be impressed with the result. We got the tag-browsing widget working last night, and it’s absolutely fantastic. Altay, our JavaScript guru, is outdoing himself. And I celebrated with a big hunk of brie. I can’t wait to finish it up and show it off at CIL and the Library of Congress next week.

*This is possible because ISBNs aren’t just numbers, but numbers with structure. They are either ten characters long (digits, with an X allowed in the last place) or thirteen digits starting with 978 or 979.** And the last digit is a checksum–a calculation based on the others. So ISBN 0747532699 is the first British edition of Harry Potter and the Philosopher’s Stone, now selling for upwards of $1,000. But change a digit and you don’t get another book, but an error. The checksum won’t work. If anything bad slips through, running the ISBNs against LibraryThing’s books tosses them out.
**i.e., ([0-9]{9}[0-9X]|(978|979)[0-9]{10}) in regular-expression land, where I live.
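The screening described in these footnotes can be sketched in Python: the footnote’s own regular expression checks the shape, then the check digit is verified. The ISBN-10 rule (weights 10 down to 1, X counting as 10, sum divisible by 11) and the ISBN-13 rule (alternating weights 1 and 3, sum divisible by 10) are the standard ones. Stripping hyphens and spaces first is an added convenience, not something the post describes.

```python
import re

# The footnote's pattern; fullmatch() requires the whole string to fit it.
ISBN_SHAPE = re.compile(r"([0-9]{9}[0-9X]|(978|979)[0-9]{10})")


def isbn10_checksum_ok(isbn: str) -> bool:
    """Weights 10 down to 1; X counts as 10; the sum must be divisible by 11."""
    total = sum(weight * (10 if ch == "X" else int(ch))
                for weight, ch in zip(range(10, 0, -1), isbn))
    return total % 11 == 0


def isbn13_checksum_ok(isbn: str) -> bool:
    """Alternating weights 1 and 3; the sum must be divisible by 10."""
    total = sum((3 if i % 2 else 1) * int(ch) for i, ch in enumerate(isbn))
    return total % 10 == 0


def is_valid_isbn(raw: str) -> bool:
    """Normalize, check the shape with the regex, then verify the check digit."""
    isbn = raw.replace("-", "").replace(" ", "").upper()
    if ISBN_SHAPE.fullmatch(isbn) is None:
        return False
    if len(isbn) == 10:
        return isbn10_checksum_ok(isbn)
    return isbn13_checksum_ok(isbn)


print(is_valid_isbn("0747532699"))  # True: the Harry Potter ISBN from the post
print(is_valid_isbn("0747532698"))  # False: one digit changed
```

With these weights, changing any single digit always changes the sum by an amount the modulus catches, which is why a one-digit typo yields an error rather than another book.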

Labels: Uncategorized

Friday, April 13th, 2007

Going to CIL, with inflatable

What do you do when Computers in Libraries charges LibraryThing thousands of dollars to exhibit, a cool $1,000 for two days of internet access and THEN tells us we can only bring in “what one person can carry in one trip”?

Well, what large thing can one person carry in one trip?

If you answered “a five-foot long inflatable rhino,” you think like we do. Here’s Abby showing us that it’s arrived. John and Altay (who?) and I are in the background.

Labels: Uncategorized

Thursday, April 12th, 2007

WorldCat: Think locally, act globally

OCLC just announced a “pilot” of WorldCat Local. In essence, WorldCat Local is OCLC providing libraries with an OPAC.

That’s the news. Here’s the opinion. Talis’ estimable Richard Wallis writes:

“Yet another clear demonstration that the library world is changing. The traditional boundaries between the ILS/LMS, and library and non-library data services are blurring. Get your circulation from here; your user-interface from there; get your global data from over there; your acquisitions from somewhere else; and blend it with data feeds from here, there and everywhere is becoming more and more a possibility.”

I think this is exactly wrong. OCLC isn’t creating a web service. They’re not contributing to the great data-service conversation. They’re trying to convert a data licensing monopoly into a services monopoly. If the OCLC OPAC plays nice with, say, the Talis Platform, I’ll eat my hat. If it allows outside Z39.50 access I’ll eat two hats.

They will, as the press release states “break down silos.” They’ll make one big silo and set the rules for access. The pattern is already clear. MIT thought that its bibliographic records were its own, but OCLC shut them down when they tried to act on that. The fact is, libraries with their data in OCLC are subject to OCLC rules. And since OCLC’s business model requires centralizing and restricting access to bibliographic data, the situation will not improve.

As a product, OCLC local will probably surpass the OPACs offered by the traditional vendors. It will be cleaner and work better. It may well be cheaper and easier to manage. There are a lot of good things about this. And—lest my revised logo be misunderstood—there are no bad people here. On the contrary, OCLC is full of wonderful people—people who’ve dedicated their lives to some of the highest ideals we can aspire to. But the institution is dependent on a model that, with all the possibilities for sharing available today, must work against these ideals.

Keeping their data hidden, restricted and off the “live” web has hurt libraries more than we can ever know. Fifteen years ago, libraries were where you found out about books. One would have expected that to continue on the web–that searching for a book would turn up libraries alongside bookstores, authors and publishers.

It hasn’t worked out that way. Libraries are all-but-invisible on the web. Search for the “Da Vinci Code” and you won’t get the Library of Congress–the greatest collection of books and book data ever assembled–not even if you click through a hundred pages. You do get WorldCat, seventeen pages in!

The causes are multiple, and discussed before. But a major factor is how libraries deal with book data, and that’s largely a function of OCLC’s business model. Somehow institutions dedicated to the idea that knowledge should be freely available to all have come to the conclusion that knowledge about knowledge—book data—should not, and traditional library mottos like Boston’s “Free to All” and Philadelphia’s Liber Libere Omnibus (“Free books for all!”) have given way to:

“No part of any Data provided in any form by WorldCat may be used, disclosed, reproduced, transferred or transmitted in any form without the prior written consent of OCLC except as expressly permitted hereunder.”

We now return you to our regularly-scheduled blogging.

Labels: library of congress, oclc, open data, worldcat local

Wednesday, April 11th, 2007

LibraryThing for Libraries: Logo

Update: Redid, with green tag—more punch. Okay, that’s the last fiddle.

I hate design. It takes SOOOO long for me to sweat out something that doesn’t make my skin crawl. (What happens to your skin is another matter.) I went around on this one a hundred times, mostly over the font. The problem is the LibraryThing font, a low-rent freeware thing, Thomas Paine. Originally intended as a sort of Lovecraft-meets-Gorey joke, for a site I thought would get a few thousand visitors a year, the logo has to some extent transcended its origins and historical and aesthetic specificity. But do a whole new phrase in it—LibraryThing for Libraries? Man, it looks like some crappy site for an American history schoolbook, “Reƒolved in Congreƒs aƒƒembled that LibraryThing doth proclaim…”

Anyway, the current design tries to keep the logo and add the words without mashing another typeface against it—an approach that I could never get to work. And it’s supposed to suggest something added (i.e., to your OPAC) or the concept of tagging.

After ten hours, I’m blogging to force finality. To force myself to stop fiddling. I don’t even want feedback. I just want to close Photoshop and be done with it.

Labels: Uncategorized

Tuesday, April 10th, 2007

LibraryThing for Libraries: XML format

The LibraryThing for Libraries “widgets” are designed to have three outputs, only one of which is really a widget:

  • As JavaScript, adding HTML to your OPAC
  • As a link to a web page, for screen readers without JavaScript
  • As XML

The XML is aimed at that small percentage of libraries with a dedicated library-services programmer. XML would allow greater flexibility, for example mashing LibraryThing’s recommendations with patron borrowing patterns or holding status. But it would also require some serious scripting.

The XML we’re going to be delivering will be simplicity itself. Provide an ISBN and the “widgets” you want and get back items with ISBNs, titles, authors and your catalog URL. Here’s an example for “similar books” (i.e., recommendations) and “related editions.” (http://www.librarything.com/demo_nypl.xml)
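For a library with a programmer, consuming a response like this could look like the Python sketch below. The element and attribute names (`ltfl`, `widget`, `name`, `item`, `isbn`, `title`, `author`, `url`) and the placeholder data are hypothetical, not the actual LibraryThing for Libraries schema; check the demo file above for the real layout. The `available` set in `on_shelf` stands in for whatever holding-status data the library’s own system provides.

```python
import xml.etree.ElementTree as ET

# Hypothetical response layout with placeholder data; the real schema may differ.
SAMPLE = """<ltfl isbn="0747532699">
  <widget name="similar">
    <item>
      <isbn>ISBN-A</isbn><title>Example Title A</title>
      <author>Example Author A</author><url>https://opac.example.org/record/a</url>
    </item>
    <item>
      <isbn>ISBN-B</isbn><title>Example Title B</title>
      <author>Example Author B</author><url>https://opac.example.org/record/b</url>
    </item>
  </widget>
  <widget name="editions">
    <item>
      <isbn>ISBN-C</isbn><title>Example Title C</title>
      <author>Example Author C</author><url>https://opac.example.org/record/c</url>
    </item>
  </widget>
</ltfl>"""

FIELDS = ("isbn", "title", "author", "url")


def parse_widgets(xml_text: str) -> dict[str, list[dict[str, str]]]:
    """Group the returned items by widget name (e.g. "similar", "editions")."""
    root = ET.fromstring(xml_text)
    widgets: dict[str, list[dict[str, str]]] = {}
    for widget in root.findall("widget"):
        widgets[widget.get("name", "")] = [
            {field: item.findtext(field, default="") for field in FIELDS}
            for item in widget.findall("item")
        ]
    return widgets


def on_shelf(items: list[dict[str, str]], available: set[str]) -> list[dict[str, str]]:
    """Example mash-up: keep only recommendations the library can lend right now."""
    return [item for item in items if item["isbn"] in available]
```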

Labels: Uncategorized

Tuesday, April 10th, 2007

LibraryThing for Libraries: Pricing?

Wow! A big response to our offer–send us your ISBNs and we’ll show you what LibraryThing for Libraries can do. I just received ISBNs from two more libraries, both US publics, each with over 500,000 ISBNs. I guess we’ll get to stress test the database earlier, not later.

Best of all, both half-millioners had 51% overlap with LibraryThing (see UPDATE below). Considering that patrons do not look at random books, but focus more on popular ones, I’m guessing this means LibraryThing has data on something like 75% of the OPAC lookups performed in large US publics. The data available will vary, but it’s a good start.

I received a very thoughtful email about pricing from one of the top “library geeks.” This part deserves quoting:

“The problem with not charging a lot of money for this is that it doesn’t look like you’re serious. Anything that’s changing the functionality and the look of the catalog is going to be a big deal no matter what you charge (at least for medium-large public libraries — academics are less inclined to worry about their patrons getting the vapors). If it costs a lot, it’ll be treated like a project and taken seriously, and is more likely to happen. If it costs a little, it’ll just be treated as a hassle. Last piece of free advice would be to price based upon #checkouts/year. That’ll correspond pretty well towards the amount of web traffic that’ll get generated. Of course it’s hard to know what the price should be without seeing the product in action.”

Sad, but probably true. Maybe we can have it both ways. Charge $1,000,000/year to show we’re serious, but give everyone 99.9% discounts.

Tonight: Fire up Photoshop and try to make a logo that doesn’t suck.

UPDATE: And a third, this one a mixed consortium with 590,000 ISBNs. Their overlap was 52%. This is turning out to be Planck’s constant!

Labels: Uncategorized