Jun Fu: Reading Notes For Week 7

The idea of relevance feedback (RF) is to involve the user in the IR process so as to improve the final result set.

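The Rocchio algorithm is the classic algorithm for implementing relevance feedback. It models a way of incorporating relevance feedback information into the vector space model. However, we don't know the full set of truly relevant docs, so in practice the query is modified using only the documents the user has judged relevant or nonrelevant.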
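To make the Rocchio update concrete, here is a minimal sketch. It assumes queries and documents are sparse term-weight dictionaries; the function name rocchio and the default weights alpha, beta and gamma are illustrative choices, not something fixed by these notes. The modified query is the original query moved toward the centroid of the judged-relevant docs and away from the centroid of the judged-nonrelevant docs.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a modified query vector as a dict of term -> weight.

    query:       dict term -> weight for the original query
    relevant:    list of dicts for docs the user judged relevant
    nonrelevant: list of dicts for docs the user judged nonrelevant
    alpha, beta, gamma: illustrative weights (assumed values)
    """
    # Collect every term that appears in the query or in a judged doc.
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms.update(doc)

    new_query = {}
    for t in terms:
        w = alpha * query.get(t, 0.0)
        if relevant:
            # Add the centroid of the relevant docs.
            w += beta * sum(d.get(t, 0.0) for d in relevant) / len(relevant)
        if nonrelevant:
            # Subtract the centroid of the nonrelevant docs.
            w -= gamma * sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant)
        # Negative term weights are clipped to zero.
        new_query[t] = max(w, 0.0)
    return new_query


if __name__ == "__main__":
    q = {"jaguar": 1.0}
    rel = [{"jaguar": 1.0, "cat": 1.0}]
    nonrel = [{"jaguar": 1.0, "car": 1.0}]
    print(rocchio(q, rel, nonrel))
```

Setting gamma to 0 gives positive-only feedback, which ignores the nonrelevant judgments.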

The success of RF depends on certain assumptions. First, the user has to have sufficient knowledge to be able to make an initial query that is at least somewhere close to the documents they desire. Second, the RF approach requires relevant documents to be similar to each other.

Pseudo relevance feedback (blind relevance feedback): It automates the manual part of RF, so that the user gets improved retrieval performance without an extended interaction. The method is to do normal retrieval to find an initial set of most relevant documents, to then assume that the top k ranked documents are relevant, and finally to do RF as before under this assumption.
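The three steps above can be sketched as follows. This is only an illustration under stated assumptions: documents are sparse term-weight dictionaries, ranking is by cosine similarity, and the feedback step uses positive-only feedback (the top k docs are added to the query, nothing is subtracted). The helper names cosine, rank and pseudo_relevance_feedback and the values of k, alpha and beta are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors (dict term -> weight)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    """Return doc ids sorted by decreasing similarity to the query."""
    return sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)


def pseudo_relevance_feedback(query, docs, k=2, alpha=1.0, beta=0.75):
    """Retrieve, assume the top k docs are relevant, then re-rank."""
    # Step 1: normal retrieval.
    top_k = rank(query, docs)[:k]

    # Step 2 and 3: treat the top k as relevant and move the query
    # toward their centroid (positive-only feedback).
    expanded = {t: alpha * w for t, w in query.items()}
    for d in top_k:
        for t, w in docs[d].items():
            expanded[t] = expanded.get(t, 0.0) + beta * w / len(top_k)

    # Final retrieval with the modified query.
    return rank(expanded, docs)


if __name__ == "__main__":
    docs = {
        "d1": {"jaguar": 1.0, "cat": 1.0},
        "d3": {"car": 1.0, "engine": 1.0},
        "d2": {"cat": 1.0, "feline": 1.0},
    }
    print(pseudo_relevance_feedback({"jaguar": 1.0}, docs, k=1))
```

In the usage example, d2 does not contain the query term at all, but it shares "cat" with the top-ranked doc, so it moves ahead of d3 after feedback. The same mechanism can also hurt results when the top k docs are not actually relevant.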

In query expansion, on the other hand, users give additional input on query words or phrases, possibly suggesting additional query terms.

Methods for building a thesaurus for query expansion:
(1) Use of a controlled vocabulary that is maintained by human editors. Here, there is a canonical term for each concept.
(2) A manual thesaurus. Here, human editors have built up sets of synonymous names for concepts, without designating a canonical term.
(3) An automatically derived thesaurus.
(4) Query reformulations based on query log mining.
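However the thesaurus is built, using it at query time can look roughly like the sketch below. It assumes a manual thesaurus in the style of method (2), stored as a plain dictionary from a term to its synonyms; the name expand_query and the sample entries are made up for illustration.

```python
def expand_query(terms, thesaurus):
    """Return the query terms followed by their thesaurus synonyms.

    terms:     list of query terms
    thesaurus: dict term -> list of synonymous terms (hypothetical data)
    """
    expanded = []
    for term in terms:
        # Keep the original term first, then add its synonyms.
        for candidate in [term] + thesaurus.get(term, []):
            if candidate not in expanded:  # avoid duplicate terms
                expanded.append(candidate)
    return expanded


if __name__ == "__main__":
    thesaurus = {"car": ["automobile", "vehicle"]}
    print(expand_query(["cheap", "car"], thesaurus))
```

A real system would usually give the added terms a lower weight than the terms the user typed.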

Query expansion is often effective in increasing recall. However, there is a high cost to manually producing a thesaurus and then updating it for scientific and terminological developments within a field.

Jun Fu

Friday, February 20, 2015

