Archive for the ‘subjects’ Category

Thursday, February 5th, 2015

Subjects and the Ship of Theseus

I thought I might take a break to post an amusing photo of something I wrote out today:


The photo is a first draft of a database schema for a revamp of how LibraryThing will do library subjects. All told, it has 26 tables. Gulp.

About eight of the tables do what a good cataloging system would do:

  • Distinguishes the various subject systems (LCSH, Medical Subjects, etc.)
  • Preserves the semantic richness of subject cataloging, including the stuff that never makes it into library systems.
  • Breaks subjects into their facets (e.g., “Man-woman relationships -- Fiction” has two subject facets).
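To make the facet idea concrete, here is a minimal sketch in Python of splitting a heading string into its facets. It is not the LibraryThing code. The function name `split_facets` and the set of separators it accepts (a double hyphen, a long dash, or " > ") are assumptions made for illustration.

```python
import re

# Assumed separators between facets: "--" (the usual LCSH notation),
# a long dash (written here as the escape \u2014), or ">".
# A single hyphen, as in "Man-woman", is NOT a separator.
_SEPARATOR = re.compile(r"\s*(?:--|\u2014|>)\s*")


def split_facets(heading: str) -> list[str]:
    """Split one subject heading string into its facets, in order."""
    return [part for part in _SEPARATOR.split(heading.strip()) if part]


print(split_facets("Man-woman relationships -- Fiction"))
# ['Man-woman relationships', 'Fiction']
print(split_facets("England > Social life and customs > 18th century"))
# ['England', 'Social life and customs', '18th century']
```

Storing each facet once and linking headings to facets, rather than storing the full string, is what would let a change to one facet show up in every heading that uses it.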

Most of the tables, however, satisfy LibraryThing’s unusual core commitments: to let users do their own thing, like their own little library, but also to let them benefit from and participate in the data and contributions of others.(1) So it:

  • Links to subjects from various “levels,” including book-level, edition-level, ISBN-level and work-level.
  • Allows members to use their own data, or “inherit” subjects from other levels.
  • Allows for members to “play librarian,” improving good data and suppressing bad data.(2)
  • Allows for real-time, fully reversible aliasing of subjects and subject facets.
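As a rough illustration of the “levels” and “inherit” ideas, here is a small in-memory sketch in Python. Everything in it is hypothetical: the class `SubjectStore`, its methods, and especially the fallback order (book, then edition, then ISBN, then work), which simply follows the order the levels are listed in above. The real schema may prioritize them differently.

```python
from dataclasses import dataclass, field

# Assumed fallback order, most specific first. The post names the levels
# but does not say how they are prioritized.
LEVELS = ("book", "edition", "isbn", "work")


@dataclass
class SubjectStore:
    # (level, record_id) -> set of subject strings linked at that level
    links: dict = field(default_factory=dict)
    # (level, record_id) -> subjects a "librarian" has suppressed there
    suppressed: dict = field(default_factory=dict)

    def add(self, level, record_id, subject):
        self.links.setdefault((level, record_id), set()).add(subject)

    def suppress(self, level, record_id, subject):
        # Suppression is recorded separately, so the link itself survives.
        self.suppressed.setdefault((level, record_id), set()).add(subject)

    def resolve(self, ids):
        """Return (level, subjects) from the first level with visible subjects.

        `ids` maps a level name to the record id for one book.
        """
        for level in LEVELS:
            key = (level, ids.get(level))
            visible = self.links.get(key, set()) - self.suppressed.get(key, set())
            if visible:
                return level, sorted(visible)
        return None, []


store = SubjectStore()
ids = {"book": "b1", "edition": "e1", "isbn": "i1", "work": "w1"}
store.add("work", "w1", "Love stories")
store.add("work", "w1", "Junk heading")
store.suppress("work", "w1", "Junk heading")
print(store.resolve(ids))  # ('work', ['Love stories'])

store.add("book", "b1", "My own subject")
print(store.resolve(ids))  # ('book', ['My own subject'])
```

A member with no subjects of their own inherits from the work. Once they add their own, their data wins, and nothing at the other levels is touched.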

The last is perhaps the hardest. Nine years ago (!) I compared LibraryThing to the “Ship of Theseus,” a ship which is “preserved” although its components are continually changed. The same goes for much of its data, although “shifting sands” might be a better analogy. Accounting for this makes for some interesting database structures, and interesting programming. Not every system at LibraryThing does this perfectly. But I hope this structure will help us do that better for subjects.(3)
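One way to get aliasing that is both real-time and fully reversible is to never rewrite the links themselves. You keep aliases in their own table and resolve them at read time. The Python sketch below shows that idea only. The class `AliasTable`, its method names, and the example subjects are made up for illustration, and the actual LibraryThing tables may work quite differently.

```python
class AliasTable:
    """Aliases live apart from the data, so undoing one loses nothing."""

    def __init__(self):
        self._alias_of = {}  # subject -> the subject it is aliased to

    def alias(self, source, target):
        # Walk the chain from target; if it reaches source, refuse the loop.
        node = target
        while node != source and node in self._alias_of:
            node = self._alias_of[node]
        if node == source:
            raise ValueError("alias would create a cycle")
        self._alias_of[source] = target

    def unalias(self, source):
        # Reversal is just deleting the alias row.
        self._alias_of.pop(source, None)

    def canonical(self, subject):
        # Follow aliases at read time until reaching an unaliased subject.
        while subject in self._alias_of:
            subject = self._alias_of[subject]
        return subject


# Links keep pointing at the subject the cataloger originally chose.
links = {"book-1": ["Cookery"], "book-2": ["Cooking"]}
aliases = AliasTable()

aliases.alias("Cookery", "Cooking")
print([aliases.canonical(s) for s in links["book-1"]])  # ['Cooking']

aliases.unalias("Cookery")
print([aliases.canonical(s) for s in links["book-1"]])  # ['Cookery']
```

The cost is an extra lookup on every read, which is the sort of thing that makes for interesting programming.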

Weird as all this is, I think it’s the way things are going. At present most libraries maintain their own data, which, while generally copied from another library, is fundamentally siloed. Like an evolving species, library records descend from each other; they aren’t dynamically linked. The data inside the records are siloed as well, trapped in a non-relational model. The profession that invented metadata, and indeed invented sharing metadata, is, at least as far as its catalogs go, far behind.

Eventually that will end. It may end in a “Library Goodreads,” every library sharing the same data, with global changes possible, but reserved for special catalogers. But my bet is on a more LibraryThing-like future, where library systems will both respect local cataloging choices and, if they like, benefit instantly from improvements made elsewhere in the system.

When that future arrives, we’ve got the schema!

1. I’m betting another ten tables are added before the system is complete.
2. The system doesn’t presume whether changes will be made unilaterally, or voted on. Voting, like much else, exists in a separate system, even if it ends up looking like part of the subject system.
3. This is a long-term project. Our first steps are much more modest–the tables have an order-of-use, not shown. First off we’re going to duplicate the current system, but with appropriate character sets and segmentation by thesaurus and language.

Labels: cataloging, subjects

Wednesday, August 15th, 2007

Come and get your “Erotic Stories, American”

Here are the top 25 most popular Library of Congress Subject Heading (LCSH) pages on LibraryThing, according to Google Analytics.

I’m guessing this makes someone at the Library of Congress blush:

  1. Erotic Stories, American
  2. Photography of the nude
  3. Erotic fiction
  4. Historical fiction
  5. Erotic literature
  6. Love stories
  7. Psychological fiction
  8. Fantasy fiction
  9. Mystery fiction
  10. Erotic art
  11. Detective and mystery stories
  12. Characters and characteristics in literature
  13. Sex instruction for gay men
  14. Sexual dominance and submission
  15. Humorous stories
  16. Symbolism in literature
  17. Murder
  18. Australia > Social life and customs
  19. England > Social life and customs > 18th century
  20. Social classes
  21. Allusions in literature
  22. Humorous fiction
  23. Sex instruction
  24. Short stories
  25. Religion

The explanation involves a paradox. Erotica does so well because LibraryThing is a non-erotic site. The top subjects all win because of search-engine referrals. Google likes a mix of sites, so that erotic searches turn up something besides erotica. (This is particularly true if you have “safe search” enabled.) And LibraryThing has relatively high PageRank (PR), Google’s measure of a web site’s authority. Put these factors together and LibraryThing turns up high for erotic searches. For example, we’re currently Google’s number one site for “gay sex instruction.” Who would’ve thunk it?*

Of course, the “bounce rate” for these pages is astronomical. LibraryThing provides no actual sex instruction, just links to books about it—or rather links to metadata about books on sex instruction. That’s not what the searchers were looking for, and they leave as fast as they arrive.

As a side note, it’s sad to see so many top-level subjects in the list. I hope the bounce rate isn’t too high. Top-level subjects are where LCSH falls apart. Take a subject like “Historical fiction,” which has almost 8,000 works underneath it and no innate relevancy ranking. There can be little doubt–people don’t want to plough through 8,000 links!

*Can we start running ads on just the erotic pages?

Labels: erotica, LCSH, subjects