Archive for the ‘library of congress’ Category

Monday, December 22nd, 2008

LCSH.info, RIP

LCSH.info, Ed Summers’ presentation of Library of Congress Subject Headings data as Linked Data, has ended. As Ed explained:

“On December 18th I was asked to shut off lcsh.info by the Library of Congress. As an LC employee I really did not have much choice other than to comply.”

I am not as up on or enthusiastic about Ed’s Semantic-Web intentions, but the open-data implications are clear: the Library of Congress just took down public data. I didn’t think things could get much worse after the recent OCLC moves, but this is worse. The Library of Congress is the good guy.

Jenn Riley put it well:

“I know our library universe is complex. The real world gets in the way of our ideals. … But at some point talk is just talk and action is something else entirely. So where are we with library data? All talk? Or will we take action too? If our leadership seems to be headed in the wrong direction, who is it that will emerge in their place? Does the momentum need to shift, and if so, how will we make this happen? Is this the opportunity for a grass-roots effort? I’m not sure the ones I see out there are really poised to have the effect they really need to have. So what next?”

The time has come to get serious. The library world is headed in the wrong direction. It’s wrong for patrons—and taxpayers. And it’s wrong for libraries.

By the way, Ed, we’re recruiting library programmers. The job description includes wanting to change the world.

See also: Panlibus.

Labels: library of congress, open data

Wednesday, December 10th, 2008

The New OCLC Policy and Federal Libraries

This blog post attempts to show that the new OCLC Policy (blogged here) effectively annuls a longstanding principle of US law, that work performed by government officials and employees is forever in the public domain.

In a library context, this has always meant that Federal libraries are not only free but compelled to share their information with the public that pays for it.

Many continue to hold that this is still true. As one AUTOCAT poster wrote:

“I find it hard to believe OCLC would attempt to assert an intellectual property right over things such as LC cataloging, which by statute is in the public domain.”

Unfortunately, this conception confuses two areas of law. By crafting the Policy as a license, which is perpetual, retroactive and viral, OCLC can effect a sort of ownership: US citizens still own the data, but they don’t have a right to get it (except, if they qualify, with an OCLC license around it).

Thus, OCLC transforms an expensive service–access to a repository of data that, even OCLC employees admit, would fit on an iPod, with room for 5,000 songs!–into effective ownership. This state of affairs obtains even when all the cataloging and editing was done by other Federal agencies and employees. It is only broken when the library in question itself did the original cataloging. As we shall see, that doesn’t help much.

Three Federal Libraries. The OCLC affiliate for Federal libraries, FEDLINK, maintains a list of its members–libraries like the Library of Congress, NASA, Justice, the Smithsonian, the National Library of Medicine, the Supreme Court, etc.

From this list I plucked three that have public catalogs (the Department of Defense, Commerce, and Labor) and carefully examined the first ten MARC records for a common English word in each catalog (“Freedom,” “Copyright” and “Openness”). I checked these against the 001, 035 and 994 fields recommended in the Policy FAQ, “How can I determine if a record was derived from WorldCat?”* The results are depressing.
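The field check described above can be roughly sketched in Python. This is a minimal sketch under stated assumptions, not OCLC's own procedure: the simplified record shape (a dict from MARC tag to a list of raw field strings), the marker list (taken from the ocm, ocn and OCoLC values in the records listed at the end of this post), the treatment of any 994 field as an OCLC trace, and the function names `classify` and `tally` are all hypothetical. Records whose only OCLC trace is "OCL" in the 040 are flagged as ambiguous, since the 040 is not one of the fields the FAQ names.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical, simplified record shape: MARC tag -> list of raw field strings.
# Real MARC parsing (indicators, subfields) is deliberately left out.
Record = Dict[str, List[str]]

# Assumed markers, based on the 001/035 values seen in the records in this post,
# e.g. "ocm12345678" or "(OCoLC)12345678". Matched case-insensitively.
OCLC_ID_MARKERS: Tuple[str, ...] = ("ocm", "ocn", "ocolc")


def _contains_marker(values: List[str], markers: Tuple[str, ...]) -> bool:
    """Return True if any field value contains any marker (case-insensitive)."""
    return any(marker in value.lower() for value in values for marker in markers)


def classify(record: Record) -> str:
    """Rough triage of one record, loosely following the FAQ's 001/035/994 check."""
    id_fields = record.get("001", []) + record.get("035", [])
    # Assumption: the mere presence of a 994 field counts as an OCLC trace.
    if _contains_marker(id_fields, OCLC_ID_MARKERS) or record.get("994"):
        return "derived"
    # "OCL" in the 040 (cataloging source) is not one of the FAQ's fields,
    # so such records are set aside for a closer look rather than counted.
    if _contains_marker(record.get("040", []), ("ocl",)):
        return "ambiguous"
    return "no trace"


def tally(records: List[Record]) -> Counter:
    """Count how many records fall into each category."""
    return Counter(classify(record) for record in records)


if __name__ == "__main__":
    # Hypothetical sample values, not copied from any real catalog.
    sample = [
        {"001": ["ocm12345678"]},
        {"035": ["(OCoLC)12345678"]},
        {"040": ["DLC $c DLC $d OCL"]},
        {"040": ["XYZ $c XYZ"]},
    ]
    print(tally(sample))
```

A substring test on lowercased field text is crude, but it matches how the records were checked by eye here: look for the OCLC prefixes and stop at the first one found.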

Of the Department of Defense’s ten books on “Freedom,” zero will be free after the Policy takes effect. None were originally cataloged by the Department of Defense, and all had 035 fields showing they were at one point “derived” from OCLC. In every case, the original cataloger was the Library of Congress, and many were edited by the Department of Defense. But that doesn’t count. They aren’t DoD original cataloging and they bear the mark of OCLC. As far as the Policy is concerned, that’s the end of the story.

Of the Department of Labor’s ten “Copyright” books, zero again are free. All ten were cataloged and edited by Federal employees (mostly the LC and the Congressional Information Service). But none were cataloged by the Department of Labor, and all have fatal 035 fields.

The situation at the Department of Commerce was slightly better. Here I searched for “Openness” and got only eight results. Five are clear-cut OCLC records. Two might be free: they lack 001 and 035 fields, although OCLC appears in the 040. I think, however, that they aren’t currently held by the library, and, in an overlooked provision, the OCLC Policy prohibits transfer of records when a library doesn’t hold the book. But one is free: cataloged by the University of Alabama and lacking any trace of OCLC transfer.

Don’t think the OCLC Policy affects Federal libraries? Think again.

Sign the Petition (if you are a librarian, also see this one).


Data. Here's what I found. Prove me wrong.

Department of Defense: first ten records with title starting "Freedom."

  • Freedom by Orlando Patterson (035 has ocm; cataloged by LC, edited by Department of Defense)
  • Freedom by William Safire (035 has ocm; cataloged by LC, edited by Department of Defense)
  • The Destruction of slavery (035 has ocm; cataloged by LC, edited by Department of Defense)
  • Freedom : a history (035 has ocm; cataloged by LC, edited by Department of Defense)
  • Freedom and foreign policy (035 has ocm; cataloged by LC; OCLC edits)
  • Freedom and information (035 has ocm; cataloged by LC, edits by Baker & Taylor, Connecticut State Library and Department of Defense)
  • Freedom and the Law (035 has ocm; cataloged by LC, edited by Department of Defense)
  • Freedom at Issue (035 has ocm; cataloged by LC and about a dozen other institutions, not including OCLC)
  • Freedom at Midnight (035 has ocm; cataloged by LC, edited by Brown, OCLC and Department of Defense)
  • Freedom betrayed (035 has ocm; cataloged by LC, edited by Department of Defense)


Department of Labor: first ten records with title starting "Copyright."

  • Intellectual property and trade (035 has ocm; cataloged by US International Commission, edited by Government Printing Office and the Congressional Information Service)
  • Berne Convention Implementation Act of 1987 (035 has OCoLC; cataloged by Government Printing Office, Congressional Information Service)
  • Berne Convention Implementation Act of 1988 (035 has OCoLC; cataloged by Government Printing Office, Congressional Information Service)
  • Record rental amendment extension (035 has OCoLC; cataloged by Government Printing Office, Congressional Information Service)
  • Satellite Home Viewer Copyright Act of 1988 (035 has OCoLC; cataloged by Government Printing Office, Congressional Information Service)
  • Berne Convention (035 has OCoLC; cataloged by Government Printing Office, Congressional Information Service)
  • General oversight on patent and trademark issues (035 has OCoLC; cataloged by Government Printing Office, Congressional Information Service)
  • Copyright issues presented by digital audio tape (035 has OCoLC; cataloged by Government Printing Office, Congressional Information Service)
  • Legal issues that arise when color is added to films originally produced, sold, and distributed in black and white (035 has OCoLC; cataloged by Government Printing Office, Congressional Information Service)
  • The Berne Convention (035 has OCoLC; cataloged by Government Printing Office, Congressional Information Service)


United States Department of Commerce: first eight records starting "Openness" (only 8 records total)

  • Globaphobia: confronting fears about open trade (001 includes ocm; cataloged by LC and Colgate)
  • Regulatory reform and international market openness (035 includes ocm; cataloged by Stony Brook)
  • Financial policies and the world capital market : the problem of Latin American countries (001 contains ocm; cataloged by DLC)
  • +A vision for the world economy : openness, diversity, and cohesion (040 includes OCL; cataloged by LC, with edits by National Agricultural Library)
  • Regulatory reform in the global economy (035 includes ocm; cataloged by University of Georgia)
  • +Globalization and progressive economic policy (040 includes OCL; cataloged by Library of Congress, edited by British Library)
  • Regulatory reform in Spain (cataloged by University of Alabama)
  • Challenges to globalization (001 contains ocn; cataloged by University of Texas)

*The FAQs are not, however, determinative of anything. The Policy makes this clear:

“This Policy is the final, complete and exclusive statement of the agreement of the parties with respect to the subject matter hereof.”

Similarly problematic is the claim that OCLC will not be asking libraries to shut down Z39.50 connections. The Policy makes it clear that libraries cannot “Transfer” records to companies or for “Unreasonable use” (i.e., building up a free database of library records). Since companies and entities like the Open Library aren’t going to agree to the Policy, how exactly can a library avoid violating its contractual agreement if it doesn’t shut down its Z39.50 connections?

Labels: copyright, department of commerce, department of defense, department of labor, federal libraries, freedom, library of congress, oclc, openness

Thursday, February 21st, 2008

Taxation without web presentation

The Library of Congress recently signed a deal to accept $3 million worth of “technology, services and funding” from Microsoft towards building a new website powered by Microsoft’s Silverlight plug-in. I (Casey) usually leave the blogging to Tim, but I’ve got to say something about this.

Microsoft, in general, is very good to libraries, and libraries are very good to them. Microsoft gets huge tax breaks for donating software licenses — something that doesn’t really cost them a thing — and libraries get software they couldn’t afford otherwise.

This is a different beast, however. It sounds like Microsoft technologies will be used from the ground up: if you use Microsoft’s Silverlight to do the front-end, your developers pretty much have to use Visual Studio and Microsoft languages, your database admins have to use MS SQL Server, and your systems admins have to use Windows and IIS. In any case, it seems unlikely that Microsoft would consult on a project and not recommend you use Microsoft as much as possible.

Once you’re locked into the entire Microsoft stack, you pretty much can’t change a single piece without completely redoing your entire IT operation from top to bottom. When the free deal expires or you need new servers, you end up having to buy new Microsoft licenses and software. It’s like giving somebody a kitten for a present: they’ll still be paying for and cleaning up after your gift 10 years from now.

Most disturbingly, users are locked in, too: anybody using an iPhone, an old version of Windows, any version of Linux, or any other operating system or device not supported by Silverlight will be unable to use the Library of Congress’ new website. How is that compatible with the principles of democracy or librarianship? It’s taxation without web presentation. And how exactly is that a quantum leap forward? (If the LOC really wanted to make a quantum leap, it would open up its data.)

Giant package deals are the wrong way to make both technical and business decisions about software; it doesn’t matter who’s doing the packaging, or how. You should be able to use the best operating system for the job, the best database for the job, and the best programming language for the job. You should be able to hire developers and systems administrators, not Microsoft developers and Windows administrators, and should give them the freedom to use the best solution, not the Microsoft solution. Sometimes the Microsoft solution is best, sometimes it isn’t, but that’s something that shouldn’t be dictated unilaterally.

I take comfort when I see one of our competitors looking to hire Microsoft developers instead of software developers, for reasons the hacker/entrepreneur Paul Graham explained well:

“If you ever do find yourself working for a startup, here’s a handy tip for evaluating competitors. Read their job listings. Everything else on their site may be stock photos or the prose equivalent, but the job listings have to be specific about what they want, or they’ll get the wrong candidates.”

“During the years we worked on Viaweb I read a lot of job descriptions. A new competitor seemed to emerge out of the woodwork every month or so. The first thing I would do, after checking to see if they had a live online demo, was look at their job listings. After a couple years of this I could tell which companies to worry about and which not to. The more of an IT flavor the job descriptions had, the less dangerous the company was. The safest kind were the ones that wanted Oracle experience. You never had to worry about those. You were also safe if they said they wanted C++ or Java developers. If they wanted Perl or Python programmers, that would be a bit frightening– that’s starting to sound like a company where the technical side, at least, is run by real hackers. If I had ever seen a job posting looking for Lisp hackers, I would have been really worried.”

But it’s disappointing to see an institution you respect, admire, and fund with your tax dollars going down that same road. It’s even more disappointing because the Library of Congress does make smart decisions about technology. They announced another major project a few months back that took an entirely different approach to selecting the tools they would use. The people behind the World Digital Library sat down and thought about the best tools for the job, and they came up with an interesting and eclectic list: “python, django, postgres, jquery, solr, tilecache, ubuntu, trac, subversion, vmware”. Those tools are free, open-source, designed with developer productivity in mind, aren’t tightly linked to each other, and don’t inherently limit who can access your website. That’s what should matter.

Labels: library of congress, microsoft, open data, open source

Tuesday, December 11th, 2007

Open data and the Future of Bibliographic Control

We’ve got until December 15th to submit comments on the draft report produced by the Working Group on the Future of Bibliographic Control.

No, keep reading! This is important. People in the library profession need to be involved in this stuff. Further, people outside the profession need to be involved too. As the report notes, library data is used by many outside the library world, starting with library patrons and extending even to Amazon.com. It shouldn’t go unnoticed, for example, that the draft report mentions LibraryThing four times. For while LibraryThing uses library data, it was invented by and is mostly used by non-librarians.

Aaron Swartz, the dynamo behind Open Library, sent me a note about one important aspect of the draft report, namely what it’s missing: It doesn’t mention open data. There is serious discussion about sharing, but also the alarming proposal that the LC attempt to recoup more money from the sale of its data. That’s a shame. I’m not alone in believing that open access to library data is the future. A report about the future should confront the future.

The economy of library records is a complex one but not primarily a free one. By and large libraries pay the Dublin, Ohio-based OCLC for their records, even if the records were created at government expense. That model looks increasingly dated. And it is killing innovation.

It hasn’t killed LibraryThing yet, but the specter has always hung over our head. It’s why LibraryThing has—so far—not pitched itself to small libraries. OCLC doesn’t care about personal cataloging, and the libraries we use are—in every conversation I’ve had—enthusiastic about what we do. They want their data out there; they’re libraries for Pete’s sake! But if we offered data to public libraries we’d be cutting into the OCLC profit model. That could be dangerous.

Aaron invited me to sign onto a list of people interested in the issue. I did so. I invite you—any of you—to do so as well. The text says it perfectly:

“Bibliographic records are part of our shared cultural heritage and should be made available to the public for re-use without restriction. This will allow libraries to share records more efficiently, but will also make possible more advanced online sites for book-lovers, easier analysis by social scientists, interesting visualizations and summary statistics by journalists and others, as well as many other possibilities we cannot predict in advance.”

“Government agencies and public institutions are increasingly making data open. We strongly encourage the Library of Congress to join this movement by recommending that more bibliographic data is made available for access, re-use and re-distribution without restriction.”

The petition is here: http://www.okfn.org/wiki/OpenBibliographicData .

Labels: library of congress, open data, open library, Working Group on the Future of Bibliographic Control

Thursday, October 25th, 2007

What if LibraryThing lost 13% of its books?

Don’t worry. No, as the Washington Post recounts, it’s the Library of Congress that has lost 13% of its collection. Ouch!

I wonder how long a traditional “shelf read” would take. When I was at UMich, the Classics Department’s library* did one every fall. Although it was only one room and they pressed most of the graduate students into service, it still took hours.

It’s too bad asking users for help is harder in the physical than in the digital world—although I’m sure a lot of thingamabrarians would pay for the privilege of rolling a cart through the LC’s stacks…

*available online through Filemaker of all things!

Labels: library of congress, physical world

Thursday, April 12th, 2007

WorldCat: Think locally, act globally

OCLC just announced a “pilot” of WorldCat Local. In essence, WorldCat Local is OCLC providing libraries with an OPAC.

That’s the news. Here’s the opinion. Talis’ estimable Richard Wallis writes:

“Yet another clear demonstration that the library world is changing. The traditional boundaries between the ILS/LMS, and library and non-library data services are blurring. Get your circulation from here; your user-interface from there; get your global data from over there; your acquisitions from somewhere else; and blend it with data feeds from here, there and everywhere is becoming more and more a possibility.”

I think this is exactly wrong. OCLC isn’t creating a web service. They’re not contributing to the great data-service conversation. They’re trying to convert a data licensing monopoly into a services monopoly. If the OCLC OPAC plays nice with, say, the Talis Platform, I’ll eat my hat. If it allows outside Z39.50 access I’ll eat two hats.

They will, as the press release states, “break down silos.” They’ll make one big silo and set the rules for access. The pattern is already clear. MIT thought that its bibliographic records were its own, but OCLC shut them down when they tried to act on that. The fact is, libraries with their data in OCLC are subject to OCLC rules. And since OCLC’s business model requires centralizing and restricting access to bibliographic data, the situation will not improve.

As a product, WorldCat Local will probably surpass the OPACs offered by the traditional vendors. It will be cleaner and work better. It may well be cheaper and easier to manage. There are a lot of good things about this. And (lest my revised logo be misunderstood) there are no bad people here. On the contrary, OCLC is full of wonderful people, people who’ve dedicated their lives to some of the highest ideals to which we can aspire. But the institution is dependent on a model that, with all the possibilities for sharing available today, must work against these ideals.

Keeping their data hidden, restricted and off the “live” web has hurt libraries more than we can ever know. Fifteen years ago, libraries were where you found out about books. One would have expected that to continue on the web–that searching for a book would turn up libraries alongside bookstores, authors and publishers.

It hasn’t worked out that way. Libraries are all-but-invisible on the web. Search for the “Da Vinci Code” and you won’t get the Library of Congress–the greatest collection of books and book data ever assembled–not even if you click through a hundred pages. You do get WorldCat, seventeen pages in!

The causes are multiple, and discussed before. But a major factor is how libraries deal with book data, and that’s largely a function of OCLC’s business model. Somehow institutions dedicated to the idea that knowledge should be freely available to all have come to the conclusion that knowledge about knowledge (book data) should not be, and traditional library mottos like Boston’s “Free to All” and Philadelphia’s Liber Libere Omnibus (“Free books for all!”) have given way to:

“No part of any Data provided in any form by WorldCat may be used, disclosed, reproduced, transferred or transmitted in any form without the prior written consent of OCLC except as expressly permitted hereunder.”

We now return you to our regularly-scheduled blogging.

Labels: library of congress, oclc, open data, worldcat local