The standard approach to IR system evaluation revolves around the notion of relevant and nonrelevant documents.
Evaluation of unranked retrieval sets: (1) The two most frequent and basic measures of information retrieval effectiveness are precision and recall.
(2) These measures concentrate the evaluation on the return of true positives, asking what fraction of the relevant documents has been found and how many false positives have also been returned (a small computation is sketched after this list).
(3) The advantage of having the two separate numbers is that one is often more important than the other in practice: a typical web searcher wants every result on the first page to be relevant (high precision), while a paralegal or intelligence analyst wants to find as many relevant documents as possible (high recall).
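A minimal sketch of these two measures in Python, assuming hypothetical sets of retrieved and relevant document IDs for a single query:

def precision_recall(retrieved, relevant):
    # precision = tp / (tp + fp); recall = tp / (tp + fn)
    tp = len(retrieved & relevant)                       # true positives
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = {"d1", "d2", "d3", "d5"}  # what the system returned
relevant = {"d1", "d3", "d4"}         # what the judges marked relevant
print(precision_recall(retrieved, relevant))  # (0.5, 0.666...)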
Evaluation of ranked retrieval results: (1) Precision, recall, and the F measure are set-based measures: they are computed using unordered sets of documents.
(2) In a ranked retrieval context, appropriate sets of retrieved documents are naturally given by the top k retrieved documents (precision at k; see the sketch after this list).
(3) An ROC curve plots the true positive rate, or sensitivity, against the false positive rate, or (1 − specificity).
(1) Sensitivity is just another term for recall. The false positive rate is given by fp/(fp + tn).
(2) Specificity, given by tn/(fp + tn), is not a very useful notion in IR: because the set of true negatives is always so large, its value is almost 1 for all information needs.
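A minimal sketch, assuming a hypothetical ranked list and judgment set, of precision at k and of the (TPR, FPR) pair that gives one point on an ROC curve:

def precision_at_k(ranking, relevant, k):
    # fraction of the top k ranked documents that are relevant
    return sum(1 for d in ranking[:k] if d in relevant) / k

def tpr_fpr(tp, fp, fn, tn):
    tpr = tp / (tp + fn)  # sensitivity = recall
    fpr = fp / (fp + tn)  # = 1 - specificity
    return tpr, fpr

ranking = ["d3", "d7", "d1", "d9", "d4"]  # system output, best first
relevant = {"d1", "d3", "d4"}
print(precision_at_k(ranking, relevant, 3))  # 2/3: d3 and d1 are relevant

# With a large collection, tn dwarfs fp, so fpr is tiny and
# specificity (= tn / (fp + tn)) sits very close to 1:
print(tpr_fpr(tp=2, fp=1, fn=1, tn=999_996))  # (0.666..., ~1e-06)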
The success of an IR system depends on how good it is at satisfying the needs of the idiosyncratic humans who use it, one information need at a time.
Marginal relevance, how much new information a document adds beyond the documents the user has already seen, is a better measure of utility for the user.
Evaluation at large search engines: non-relevance-based measures
(1) Clickthrough on the first result
(2) Studies of user behavior in the lab
(3) A/B testing (a simple clickthrough comparison is sketched below)
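A minimal sketch of tallying an A/B test on first-result clickthrough; the counts are hypothetical, and a real deployment would divert only a small fraction of traffic to the test ranking and check statistical significance before declaring a winner:

def clickthrough_rate(first_result_clicks, impressions):
    return first_result_clicks / impressions

ctr_a = clickthrough_rate(first_result_clicks=4120, impressions=10000)  # current ranking
ctr_b = clickthrough_rate(first_result_clicks=4380, impressions=10000)  # candidate ranking
print(f"A: {ctr_a:.3f}  B: {ctr_b:.3f}  lift: {ctr_b - ctr_a:+.3f}")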