Thursday, January 29, 2015

Reading Notes For Week 4

Sections 1.3 and 1.4

Postings list intersection is a crucial operation in processing Boolean queries.
Standard postings list intersection remains necessary when both terms of a query are very common.

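If the lengths of the postings lists are x and y, the intersection takes O(x + y) operations. This linear bound depends on both lists being sorted by docID, and because the cost grows with the combined length of the lists, intersection time becomes significant when postings are long.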
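A minimal sketch of the merge-style intersection, assuming each postings list is a Python list of integer docIDs sorted in increasing order. The function name `intersect` and the sample postings are hypothetical. Each step advances at least one pointer, which is where the O(x + y) bound comes from.

```python
def intersect(p1, p2):
    """Merge-intersect two postings lists sorted by increasing docID."""
    answer = []
    i, j = 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # Same docID in both lists: it matches the AND query.
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            # Advance whichever pointer holds the smaller docID.
            i += 1
        else:
            j += 1
    return answer

# Hypothetical postings lists for two query terms.
print(intersect([1, 2, 4, 11, 31, 45, 173, 174], [2, 31, 54, 101]))  # [2, 31]
```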

Query optimization is the process of selecting how to organize the work of answering a query so that the least total amount of work needs to be done by the system.
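One standard optimization for a conjunctive query (t1 AND t2 AND ...) is to intersect postings in order of increasing document frequency, so that intermediate results stay as small as possible. The sketch below assumes a hypothetical in-memory `index` dict mapping each term to a sorted docID list; `intersect_terms` is an illustrative name, and for brevity it filters with a set rather than the linear merge.

```python
def intersect_terms(index, terms):
    """Answer t1 AND t2 AND ... by intersecting postings rarest-first."""
    if not terms:
        return []
    # Process terms in order of increasing postings length (document frequency).
    ordered = sorted(terms, key=lambda t: len(index.get(t, [])))
    result = index.get(ordered[0], [])
    for term in ordered[1:]:
        if not result:
            break  # an empty intermediate result can never grow again
        # Filtering keeps docIDs in sorted order; a set makes membership O(1).
        postings = set(index.get(term, []))
        result = [doc_id for doc_id in result if doc_id in postings]
    return result

# Hypothetical toy index.
index = {
    "brutus": [1, 2, 4, 11, 31, 45, 173, 174],
    "caesar": [1, 2, 4, 5, 6, 16, 57, 132],
    "calpurnia": [2, 31, 54, 101],
}
print(intersect_terms(index, ["brutus", "caesar", "calpurnia"]))  # [2]
```

Starting with "calpurnia" (4 postings) means the later intersections never handle more than 4 candidates.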

A proximity operator is a way of specifying that two terms in a query must occur close to each other in a document, where closeness may be measured by limiting the allowed number of intervening words or by reference to a structural unit such as a sentence or paragraph.
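A minimal sketch of a word-distance proximity check inside one document, assuming a positional index that stores, per document, each term's sorted word offsets. Here "within k" is assumed to mean the offsets differ by at most k (so at most k - 1 intervening words); `within_k` and the sample positions are hypothetical. Advancing the pointer at the smaller offset finds the closest pair in one pass.

```python
def within_k(positions1, positions2, k):
    """Return True if some occurrence of term 1 is within k words of term 2.

    Both arguments are sorted word offsets of the two terms in a single document.
    """
    i, j = 0, 0
    while i < len(positions1) and j < len(positions2):
        if abs(positions1[i] - positions2[j]) <= k:
            return True
        # Move past the smaller offset; it cannot get closer to later positions.
        if positions1[i] < positions2[j]:
            i += 1
        else:
            j += 1
    return False

print(within_k([3, 17, 40], [25, 44], 5))  # True (40 and 44 are 4 apart)
print(within_k([3, 17], [25, 44], 5))      # False
```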

There is a tradeoff between using the AND operator and the OR operator. AND operators tend to produce high-precision but low-recall searches, while OR operators give low-precision but high-recall searches, and it is difficult or impossible to find a satisfactory middle ground.
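To make the terms concrete: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The sketch below computes both for an AND query and an OR query over a hypothetical toy collection chosen only to illustrate the tendency described above; it is not a measurement.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved docIDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical data: docIDs containing each term, and docs judged relevant.
docs_with_a = {1, 2, 3, 4, 5, 6}
docs_with_b = {2, 3, 7, 8, 9, 10}
relevant = {2, 3, 4, 7}

and_result = docs_with_a & docs_with_b  # a AND b -> {2, 3}
or_result = docs_with_a | docs_with_b   # a OR b  -> {1, ..., 10}
print(precision_recall(and_result, relevant))  # (1.0, 0.5)
print(precision_recall(or_result, relevant))   # (0.4, 1.0)
```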

Chapter 6

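Boolean queries: a document either matches or does not match a query.
On a large collection, the number of matching documents can far exceed what a user could sift through, so it is essential for a search engine to rank-order the documents matching a query.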

Metadata would include fields, such as the date of creation and the format of the document, as well as the author and possibly the title of the document. Zones are similar to fields, except the contents of a zone can be arbitrary free text.

The representation of a set of documents as vectors in a common vector space is known as the vector space model and is fundamental to a host of information retrieval (IR) operations including scoring documents on a query, document classification, and document clustering.
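A minimal sketch of placing documents in a common vector space, assuming documents are already tokenized and using raw term counts as the coordinates (weighting such as tf-idf would come later). The function `to_vectors` and the sample documents are hypothetical; the shared, sorted vocabulary gives every document the same axes.

```python
from collections import Counter

def to_vectors(docs):
    """Map each tokenized document to a term-count vector over a shared vocabulary."""
    vocabulary = sorted({term for doc in docs for term in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)  # missing terms count as 0
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors

docs = [["new", "york", "times"], ["new", "york", "post"], ["los", "angeles", "times"]]
vocab, vecs = to_vectors(docs)
print(vocab)  # ['angeles', 'los', 'new', 'post', 'times', 'york']
print(vecs)   # [[0, 0, 1, 0, 1, 1], [0, 0, 1, 1, 0, 1], [1, 1, 0, 0, 1, 0]]
```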

Vector Space Ranking
(1) Represent the query as a weighted tf-idf vector
(2) Represent each document as a weighted tf-idf vector
(3) Compute the cosine similarity score for the query vector and each document vector
(4) Rank documents with respect to the query by score
(5) Return the top K (e.g., K = 10) to the user
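The five steps above can be sketched as follows. This assumes one common tf-idf variant, weight = (1 + log10 tf) * log10(N / df), with document frequencies taken from the collection itself; other weighting schemes exist. The names `tfidf_vector`, `cosine`, and `rank` and the sample documents are hypothetical, and vectors are stored sparsely as dicts.

```python
import math
from collections import Counter

def tfidf_vector(tokens, df, n_docs):
    """Sparse tf-idf vector: term -> (1 + log10 tf) * log10(N / df)."""
    counts = Counter(tokens)
    return {
        term: (1 + math.log10(tf)) * math.log10(n_docs / df[term])
        for term, tf in counts.items()
        if df.get(term, 0) > 0  # skip query terms that occur in no document
    }

def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query_tokens, docs, k=10):
    """Return the top-k (score, doc_id) pairs for a tokenized query."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    query_vector = tfidf_vector(query_tokens, df, n_docs)              # step (1)
    doc_vectors = [tfidf_vector(doc, df, n_docs) for doc in docs]      # step (2)
    scores = [(cosine(query_vector, dv), doc_id)                       # step (3)
              for doc_id, dv in enumerate(doc_vectors)]
    scores.sort(key=lambda pair: pair[0], reverse=True)                # step (4)
    return scores[:k]                                                  # step (5)

docs = [["new", "york", "times"], ["new", "york", "post"], ["los", "angeles", "times"]]
print([doc_id for _, doc_id in rank(["new", "times"], docs, k=2)])  # [0, 1]
```

Storing vectors as dicts keeps only nonzero weights, which matters because real document vectors are mostly zeros.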
