Probability ranking principle: Using a probabilistic model, the obvious order in which to present documents to the user is to rank documents by their estimated probability of relevance with respect to the information need: P(R = 1|d, q).
Bayes optimal decision rule: If a set of retrieval results is to be returned, rather than an ordering, the Bayes optimal decision rule, the decision that minimizes the risk of loss, is to simply return documents that are more likely relevant than nonrelevant:
d is relevant iff P(R = 1|d, q) > P(R = 0|d, q).
In machine learning and probability terms, the Bayes decision rule minimizes the expected classification error (under 0/1 loss) over the classes of the response variable. Here the two classes are R = 1 (relevant) and R = 0 (nonrelevant).
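The two rules above can be sketched in a few lines of Python. This is only an illustration: the function names and the probability estimates are made up, and a real system would have to estimate P(R = 1|d, q) from data.

```python
def rank_by_relevance(scores):
    """Probability ranking principle: order document ids by their
    estimated P(R=1|d,q), highest first."""
    return sorted(scores, key=scores.get, reverse=True)


def bayes_optimal_set(scores):
    """Bayes optimal decision rule: keep d iff P(R=1|d,q) > P(R=0|d,q).
    Since the two probabilities sum to 1, this means P(R=1|d,q) > 0.5."""
    return [d for d in rank_by_relevance(scores) if scores[d] > 1.0 - scores[d]]


# Hypothetical estimates of P(R=1|d,q), invented for illustration.
estimates = {"d1": 0.82, "d2": 0.35, "d3": 0.51, "d4": 0.10}

ranking = rank_by_relevance(estimates)    # full ordering for the user
selected = bayes_optimal_set(estimates)   # unordered-set variant
```

The ranking uses every document, while the set variant drops any document that is more likely nonrelevant than relevant.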
Binary Independence Model: Documents and queries are both represented as binary term incidence vectors. A document d is represented by the vector x⃗ = (x1, ..., xM) where xt = 1 if term t is present in document d and xt = 0 if t is not present in d.
The Naive Bayes assumption is central to the model. It is a conditional independence assumption: given relevance (or nonrelevance) to the query, the presence or absence of a word in a document is independent of the presence or absence of any other word. Under this assumption the computation is simple and intuitive: the joint probability of the whole term vector factors into a product of per-term probabilities, so we never need to estimate the full joint distribution over terms.
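To make the factorization concrete, here is a minimal sketch of scoring one binary term vector under the independence assumption. The per-term probabilities are hypothetical numbers, and the score is only the log likelihood ratio between the relevant and nonrelevant classes; the prior odds of relevance are left out because they are the same for every document.

```python
import math


def log_likelihood(x, p):
    """log P(x | class) under term independence.
    x[t] is 1 if term t is present, 0 otherwise;
    p[t] is the assumed probability that term t is present in this class."""
    return sum(math.log(pt if xt else 1.0 - pt) for xt, pt in zip(x, p))


def log_odds(x, p_rel, p_nonrel):
    """log [ P(x | R=1) / P(x | R=0) ], a sum of per-term contributions."""
    return log_likelihood(x, p_rel) - log_likelihood(x, p_nonrel)


# Hypothetical parameters for a three-term vocabulary.
x = [1, 0, 1]               # terms 1 and 3 present, term 2 absent
p_rel = [0.8, 0.3, 0.6]     # assumed P(x_t = 1 | R = 1)
p_nonrel = [0.2, 0.3, 0.1]  # assumed P(x_t = 1 | R = 0)

score = log_odds(x, p_rel, p_nonrel)
```

Term 2 has the same probability in both classes, so it contributes nothing to the score; only terms whose presence probabilities differ between the classes move a document up or down.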
MLE (maximum likelihood estimation) picks the parameter values that make the observed data maximally likely. MAP (maximum a posteriori) estimation also uses prior knowledge about the distribution: we choose the most likely point value for the probabilities based on the prior and the observed evidence.
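The difference shows up clearly when estimating the probability that a term occurs in a relevant document. The sketch below assumes a Beta(alpha, beta) prior on that probability, which is one common choice rather than something these notes prescribe; the counts are invented.

```python
def mle(successes, trials):
    """Maximum likelihood estimate of a Bernoulli parameter."""
    return successes / trials


def map_estimate(successes, trials, alpha=2.0, beta=2.0):
    """MAP estimate under an assumed Beta(alpha, beta) prior: the mode of
    the Beta(alpha + s, beta + n - s) posterior (needs alpha, beta >= 1
    and at least one observation or pseudo-observation)."""
    return (successes + alpha - 1.0) / (trials + alpha + beta - 2.0)


# Invented counts: the term appears in 0 of 5 known relevant documents.
p_mle = mle(0, 5)           # the data alone say the term never occurs
p_map = map_estimate(0, 5)  # the prior keeps the estimate above zero
```

With alpha = beta = 2 the MAP estimate is (s + 1) / (n + 2), which is add-one smoothing; with alpha = beta = 1 (a flat prior) it reduces to the MLE.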
Bayesian networks: a form of probabilistic graphical model.
Generative model: A traditional generative model of a language, of the kind familiar from formal language theory, can be used either to recognize or to generate strings.
Language model: a function that puts a probability measure over strings drawn from some vocabulary.
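Query likelihood model: rank documents by P(d|q), where the probability of a document is interpreted as the likelihood that it is relevant to the query. With a uniform prior over documents, this is equivalent to ranking by P(q|M_d), the probability of the query under the language model M_d built from document d.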
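A minimal sketch of query likelihood ranking follows. It assumes a unigram language model per document and mixes it with a collection-wide model (linear interpolation with weight lam) so that a query term missing from a document does not force the score to zero; the smoothing choice, the function name, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection, lam=0.5):
    """log P(q | M_d) for a unigram model of doc, interpolated with a
    collection model: p(t) = lam * P(t | doc) + (1 - lam) * P(t | collection)."""
    doc_tf, col_tf = Counter(doc), Counter(collection)
    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / len(doc)
        p_col = col_tf[term] / len(collection)
        p = lam * p_doc + (1.0 - lam) * p_col
        if p == 0.0:
            # Term unseen in the whole collection: the query is impossible.
            return float("-inf")
        score += math.log(p)
    return score


# Toy collection, invented for illustration.
docs = {
    "d1": "click go the shears boys click click click".split(),
    "d2": "metal shears click here".split(),
}
collection = [t for d in docs.values() for t in d]
query = "shears click".split()

ranking = sorted(
    docs,
    key=lambda d: query_log_likelihood(query, docs[d], collection),
    reverse=True,
)
```

Here d2 ranks first: it is short, so "shears" accounts for a larger share of its text, and that outweighs the many repetitions of "click" in d1.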
A translation model lets you generate query words not in a document by translation to alternate terms with similar meaning. This also provides a basis for performing cross-language IR.