Monday, May 18, 2009

3000 Characters in Search of Content

Which of these is most likely to make the heart of a reference librarian sing?

a) the sight of a Nancy Pearl action figure moving her finger to the Shhh position.
b) the addition of Diabetes Type 1 to the Dewey Decimal System.
c) the arrest of Shelly J. Koontz for not returning a library book.
d) the retrospective digitization of newspapers and magazines.

If you guessed (d), then you might be interested in this great story from Australia. It seems that the National Library of Australia embarked on an ambitious project to digitize over 3 million pages from Australian newspapers (from 1803 to the 1950s). Wait: pause the story for a little background.
While scanning images is by now pretty mundane, the recognition of those images as letters and words is still more than a bit spotty. For those scanned pages to be searchable by subject, optical character recognition (OCR) software must first recognize each little squiggle as a specific letter, and then groups of those letters as words. Faded type, blotches, tears and other blemishes on the originals make for a lot of mistakes. Human beings (remember them?) are extremely useful for correcting these errors.
So, back to the story. The National Library, not having nearly enough budget to hire humans to read every article, quietly put out the word among researchers that help was needed. Before they knew it, they had over 3000 volunteers (genealogists, historians, researchers) making words out of squiggles and even applying tags to stories. Some volunteers are working up to 50 hours a week! Talk about singing hearts...
(Thanks to Research Buzz for the story)
