Graduation date: 2008
Accessing information on the Web has become ingrained in our daily lives, and we seek information from many different sources, including conference and journal publications, personal web pages, and others. Increasingly, web-based information retrieval systems such as web search engines, library on-line catalog systems, and subscription-based federated search systems are made available to provide an interface to collections of information from these sources. Because the quantity of new information available every day exceeds what individuals can handle effectively, we spend significant effort locating information, often unsuccessfully.
This dissertation consists of three scholarly articles presenting a broad set of results with the goal of helping people find interesting information in large web document collections. The results cover three specific challenges: designing and utilizing Web document recommendation systems based on human judgment, improving recommendations based on users’ web usage as a source of implicit relevance feedback data, and understanding and designing metasearch systems for academic materials. To address these challenges, a combination of offline analysis and user studies is used.
We recommend documents by determining the similarity between a user’s information need and the documents previously viewed by other users. We conducted experiments and observational studies to evaluate the system that we developed, and in both cases we found that recommendations from prior users with similar queries could increase the efficiency and effectiveness of document search.
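As an illustration of this idea only, the following minimal sketch recommends documents viewed by prior users whose queries resemble a new query. The bag-of-words cosine similarity, the function names cosine and recommend, and the sample history are assumptions made for this example; they are not taken from the system evaluated in the dissertation.

```python
import math
from collections import Counter, defaultdict


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(query, history, top_k=3):
    """Rank documents viewed by prior users with similar queries.

    history is a list of (past_query, viewed_docs) pairs (hypothetical format).
    Each viewed document accumulates the similarity of the past query
    that led to it.
    """
    q = Counter(query.lower().split())
    scores = defaultdict(float)
    for past_query, viewed_docs in history:
        sim = cosine(q, Counter(past_query.lower().split()))
        if sim > 0:
            for doc in viewed_docs:
                scores[doc] += sim
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_k]


# Hypothetical usage data, invented for this sketch.
history = [
    ("collaborative filtering recommender", ["doc_a", "doc_b"]),
    ("metasearch library catalog", ["doc_c"]),
    ("recommender systems evaluation", ["doc_b", "doc_d"]),
]
print(recommend("recommender evaluation", history))
```

In this sketch a document viewed after several similar queries rises in the ranking, which mirrors the intuition that prior users with similar needs are a useful source of recommendations.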
To improve recommendation effectiveness, we studied users’ click data from complete search sessions, and found that applying all of the click data in a search session as relevance feedback has the potential to increase both precision and recall of search results. In particular, our data provides evidence that the last visited document of each search session is a highly reliable source of implicit relevance feedback data.
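One way to picture click data as implicit relevance feedback is sketched below: every document clicked in a session receives some credit, and the last visited document receives more. The function name feedback_scores, the weights 1.0 and 2.0, and the sample sessions are hypothetical values chosen for illustration; the dissertation's actual weighting scheme is not stated here.

```python
from collections import defaultdict


def feedback_scores(sessions, click_weight=1.0, last_weight=2.0):
    """Aggregate implicit relevance feedback over search sessions.

    sessions is a list of click sequences, each a list of document ids in
    visit order (hypothetical format). Every distinct clicked document in a
    session earns click_weight, except the last visited document, which
    earns last_weight instead (both weights are assumed values).
    """
    scores = defaultdict(float)
    for clicks in sessions:
        if not clicks:
            continue  # a session without clicks gives no feedback
        last = clicks[-1]
        for doc in set(clicks):
            scores[doc] += last_weight if doc == last else click_weight
    return dict(scores)


# Hypothetical sessions, invented for this sketch.
sessions = [["d1", "d2", "d3"], ["d2", "d3"], ["d4"], []]
scores = feedback_scores(sessions)
print(sorted(scores.items()))
```

Setting last_weight equal to click_weight reduces the sketch to treating all clicks in a session equally, which makes it easy to compare the two strategies on the same data.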
To understand and design systems for academic materials, we built a metasearch system for retrieving materials from the OSU library’s subscribed databases and catalogs. We conducted a think-aloud usability experiment and found that matching the familiarity and ease of use of commercial web search engines is an important factor in attracting undergraduates. However, when undergraduates faced an interface that felt familiar, they also expected performance similar to that of a web search engine, including the quality of its ranked results and its speed.