Presentations given on the second day of the code4lib Conference held Feb. 15-17, 2006, at LaSells Stewart Center, Oregon State University, Corvallis, Oregon.
Generating recommendations in OPACs: initial results and open areas for exploration : As part of a research and prototyping project, the California Digital Library is combining catalog content indexed in XTF with more than 9 million historical circulation transaction records and other external data to generate recommendations for an academic audience. Early results are promising. This talk will focus on methods, challenges, and plans for further development.

Library Text Mining : Using the TeraGrid and the SRB DataGrid, we have sufficient
computational and storage facilities to run processing tasks that would
normally be prohibitively expensive. By integrating text and data mining
tools within the Cheshire3 information architecture, we can parse the
natural-language text in 20 million MARC records (the University of
California's MELVYL collection) and extract information for
search/retrieve applications. In this talk, we'll discuss
the results of applying new techniques to "old" data.

Anatomy of aDORe : The aDORe Archive is a write-once/read-many storage approach for Digital Objects and their constituent datastreams. It relies on two kinds of files. First, XML-based representations of multiple Digital Objects are concatenated into a single, valid XML file called an XMLtape. Second, ARC files, a container format introduced by the Internet Archive, hold the constituent datastreams of the Digital Objects. The software was developed by the Los Alamos National Laboratory (LANL) Digital Library Research & Prototyping Team and is available under the GNU LGPL.
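To illustrate the XMLtape idea from the aDORe abstract, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It wraps several per-object XML documents in one root element so that the result is a single well-formed XML file, and it reads them back by identifier. The element names `tape` and `tapeRecord`, the `identifier` attribute, and the `info:example/...` identifiers are hypothetical placeholders. This is not the actual aDORe XMLtape schema.

```python
import xml.etree.ElementTree as ET


def build_xmltape(records):
    """Concatenate per-object XML documents into one well-formed XML "tape".

    records: list of (identifier, xml_string) pairs.
    Element and attribute names are hypothetical, not the aDORe schema.
    """
    tape = ET.Element("tape")
    for identifier, xml_string in records:
        record = ET.SubElement(tape, "tapeRecord", {"identifier": identifier})
        # Parse each object's XML so the combined file is guaranteed well-formed
        record.append(ET.fromstring(xml_string))
    return ET.tostring(tape, encoding="unicode")


def read_xmltape(tape_xml):
    """Return {identifier: object root element} for every record on the tape."""
    tape = ET.fromstring(tape_xml)
    return {rec.get("identifier"): rec[0] for rec in tape.findall("tapeRecord")}


if __name__ == "__main__":
    objects = [
        ("info:example/obj1", "<object><title>First</title></object>"),
        ("info:example/obj2", "<object><title>Second</title></object>"),
    ]
    tape_xml = build_xmltape(objects)
    print(tape_xml)
    index = read_xmltape(tape_xml)
    print(index["info:example/obj2"].findtext("title"))  # prints "Second"
```

Because the tape is written once and only read afterwards, a real system would more likely build a separate offset index than re-parse the whole file on every lookup. The dictionary returned here stands in for that index.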
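The ARC half of the aDORe design can be sketched in a similarly simplified way. The sketch assumes the version-1 ARC record layout described by the Internet Archive: a header line `URL IP-address Archive-date Content-type Archive-length`, then the content bytes, then a newline separator. It omits the `filedesc://` header block that a real ARC file begins with. It also assumes no spaces inside header fields, and the URLs and placeholder IP address are hypothetical. The sketch returns each record's byte offset, which is one plausible way an external index could locate a datastream later. That indexing scheme is an assumption, not aDORe's documented design.

```python
import io
from datetime import datetime, timezone


def write_arc_record(out, url, content, content_type, ip="0.0.0.0", when=None):
    """Append one simplified ARC v1-style record to the binary stream `out`.

    Returns the byte offset where the record starts, so a (hypothetical)
    external index can seek straight to it later.
    Assumes none of the header fields contain spaces.
    """
    when = when or datetime.now(timezone.utc)
    offset = out.tell()
    # Header line: URL IP-address Archive-date Content-type Archive-length
    header = f"{url} {ip} {when:%Y%m%d%H%M%S} {content_type} {len(content)}\n"
    out.write(header.encode("ascii"))
    out.write(content)
    out.write(b"\n")  # newline separating this record from the next
    return offset


def read_arc_record(inp, offset):
    """Read the record at `offset`; return (url, content_type, content)."""
    inp.seek(offset)
    url, _ip, _date, content_type, length = inp.readline().decode("ascii").split()
    # Read exactly Archive-length bytes, so content may itself contain newlines
    return url, content_type, inp.read(int(length))


if __name__ == "__main__":
    buf = io.BytesIO()
    write_arc_record(buf, "info:example/obj1/ds1", b"hello", "text/plain")
    off2 = write_arc_record(buf, "info:example/obj1/ds2", b"\x89PNG...", "image/png")
    print(read_arc_record(buf, off2))
```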
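The first talk's abstract, on generating recommendations in OPACs, does not say how the California Digital Library derives recommendations from circulation records. The following is therefore only an illustrative sketch of one common approach, item-to-item co-occurrence ("patrons who borrowed X also borrowed Y"). The `(patron_id, item_id)` record layout and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(transactions):
    """Count how often two items were borrowed by the same patron.

    transactions: iterable of (patron_id, item_id) pairs (hypothetical layout).
    Repeat loans of the same item by one patron are counted once.
    """
    items_by_patron = defaultdict(set)
    for patron, item in transactions:
        items_by_patron[patron].add(item)
    co = defaultdict(Counter)
    for items in items_by_patron.values():
        # Each unordered pair of items borrowed by this patron counts once
        for a, b in combinations(sorted(items), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, item, k=5):
    """Return up to k items most often co-borrowed with `item`."""
    return [other for other, _count in co.get(item, Counter()).most_common(k)]


if __name__ == "__main__":
    circ = [("p1", "A"), ("p1", "B"), ("p2", "A"), ("p2", "B"),
            ("p2", "C"), ("p3", "A"), ("p3", "C"), ("p3", "D")]
    co = build_cooccurrence(circ)
    print(recommend(co, "A", k=2))
```

Raw co-occurrence counts favor heavily circulated items, and the number of pairs grows quadratically with a patron's loan count. At the scale of millions of transactions, a practical system would probably normalize the counts (for example by item popularity) and cap per-patron histories.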
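For the Library Text Mining talk, here is a much-simplified sketch of the final step, extracting information from MARC records for search/retrieve applications. It tokenizes free-text fields and builds an inverted index supporting boolean AND queries. Records are modeled as plain dicts keyed by MARC tag, and indexing tags 245 (title), 520 (summary), and 650 (subject) is an assumption. Real MARC parsing, the natural-language processing the talk describes, and the Cheshire3 API are not shown.

```python
import re
from collections import defaultdict

# Free-text MARC fields to index (an assumption for illustration)
TEXT_FIELDS = ("245", "520", "650")
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return TOKEN_RE.findall(text.lower())


def build_index(records):
    """Map each term to the set of record ids whose text fields contain it.

    records: dict of record_id -> {marc_tag: field_text} (hypothetical layout).
    """
    index = defaultdict(set)
    for rec_id, fields in records.items():
        for tag in TEXT_FIELDS:
            for term in tokenize(fields.get(tag, "")):
                index[term].add(rec_id)
    return index


def search(index, query):
    """Return ids of records containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    records = {
        "r1": {"245": "Birds of Oregon", "650": "Birds, Oregon"},
        "r2": {"245": "Oregon Trail diaries",
               "520": "Letters and diaries of emigrants."},
    }
    index = build_index(records)
    print(search(index, "oregon diaries"))  # prints {'r2'}
```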