- motivation: to rank-order the documents matching a query by giving a score to each (query,document) pair
parametric and zone indexes
- index and retrieve documents by metadata.
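- parametric index vs zone index: a parametric field takes values from a fixed vocabulary (e.g. dates), while a zone holds arbitrary free text, so its vocabulary is whatever terms occur in the text of that zone.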



- weighted zone scoring
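- learning weights: learn the zone weights from training examples with relevance judgments, using a machine learning algorithm
- the optimal weight g: with two zones, the g that minimizes the total error over the training examples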
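
A minimal sketch of weighted zone scoring, assuming each zone contributes a Boolean match and the zone weights sum to 1. The function name, the zone names, and the weight values are made up for illustration; they are not from the notes.

```python
def weighted_zone_score(zone_matches, zone_weights):
    """Return the sum of g_i * s_i over all zones.

    zone_matches: dict zone -> bool (does the query match in that zone?)
    zone_weights: dict zone -> weight g_i, expected to sum to 1
    """
    if abs(sum(zone_weights.values()) - 1.0) > 1e-9:
        raise ValueError("zone weights should sum to 1")
    # A zone adds its weight only when the query matches in that zone.
    return sum(g for zone, g in zone_weights.items()
               if zone_matches.get(zone, False))


# Hypothetical weights and match pattern for one (query, document) pair.
weights = {"author": 0.2, "title": 0.3, "body": 0.5}
matches = {"author": False, "title": True, "body": True}
score = weighted_zone_score(matches, weights)
```

The score always lies in [0, 1]: it is 0 when no zone matches and 1 when every zone matches.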
term frequency and weighting
- intuition: scores relate to term frequency, but are all words equally important?
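- free text query: each document is represented by the set of its term weights, ignoring word order (bag of words model)
score(q, d) = the sum, over the query terms, of each term's weight in d
- inverse document frequency: idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t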
- tf-idf weighting
tf-idf(t, d) = tf(t, d) × idf(t); terms with lower document frequency get a higher weight
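
A small sketch of tf-idf weighting and the overlap score built on it, assuming raw term counts for tf and a base-10 logarithm for idf. The function names and the toy corpus are illustrative assumptions.

```python
import math
from collections import Counter


def build_tf_idf(docs):
    """docs: list of token lists. Returns (per-document tf-idf dicts, idf dict)."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    idf = {t: math.log10(n_docs / d) for t, d in df.items()}
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({t: tf[t] * idf[t] for t in tf})
    return weights, idf


def overlap_score(query_tokens, doc_weights):
    """Sum the tf-idf weights of the query terms that occur in the document."""
    return sum(doc_weights.get(t, 0.0) for t in query_tokens)


# Toy corpus (made up for illustration).
docs = [
    ["car", "insurance", "car"],
    ["car", "best"],
    ["best", "insurance", "auto"],
]
weights, idf = build_tf_idf(docs)
```

Here "auto" occurs in one document and "car" in two, so "auto" receives the larger idf. A term that occurs in every document would get idf 0.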

the vector space for scoring
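- dot products: the similarity between two documents is the cosine of the angle between their vectors, i.e. the dot product of the length-normalized vectors
why not the magnitude of the vector difference? It is affected by document length: two documents with similar content can be far apart just because one is much longer.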
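
To make the document-length effect concrete, here is a sketch that compares cosine similarity with the magnitude of the vector difference on sparse term-weight vectors stored as dicts. The helper names and the example vectors are assumptions for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors (dict term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_u * norm_v)


def difference_magnitude(u, v):
    """Euclidean length of the vector difference u - v."""
    terms = set(u) | set(v)
    return math.sqrt(sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in terms))


# d2 is d1 appended to itself: same content, twice the length.
d1 = {"car": 1.0, "insurance": 2.0}
d2 = {"car": 2.0, "insurance": 4.0}
cos = cosine_similarity(d1, d2)
dist = difference_magnitude(d1, d2)
```

The two vectors point in the same direction, so their cosine similarity is 1 up to rounding, while the difference magnitude is clearly nonzero.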


- queries as vectors
computing the cosine similarity between the query and every document, then selecting the top K, is expensive
computing vector scores
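
One common way to compute vector scores is term-at-a-time accumulation over postings lists, followed by a heap to select the top K. The sketch below assumes postings already store a weight per (term, document) pair and that document vector lengths are precomputed; the function name and data layout are illustrative, not taken from the notes.

```python
import heapq
import math
from collections import defaultdict


def cosine_score(query_weights, postings, doc_lengths, k):
    """Return the k highest-scoring (doc_id, score) pairs.

    query_weights: dict term -> weight of the term in the query
    postings: dict term -> list of (doc_id, weight of the term in that doc)
    doc_lengths: dict doc_id -> Euclidean length of the document vector
    """
    scores = defaultdict(float)
    # Only documents that share a term with the query get an accumulator.
    for term, w_tq in query_weights.items():
        for doc_id, w_td in postings.get(term, []):
            scores[doc_id] += w_td * w_tq
    # Length-normalize each accumulated dot product.
    for doc_id in scores:
        scores[doc_id] /= doc_lengths[doc_id]
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Toy index (made up for illustration).
postings = {"car": [(1, 1.0), (2, 2.0)], "insurance": [(1, 3.0)]}
doc_lengths = {1: math.sqrt(10.0), 2: 2.0}
top = cosine_score({"car": 1.0, "insurance": 1.0}, postings, doc_lengths, k=1)
```

The query vector's own length is not divided out, since it scales every document's score by the same factor and does not change the ranking.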

Variant tf–idf functions

- Pivoted normalized document length
the relationship between document length and relevance
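
A sketch of pivoted length normalization, assuming the usual linear form in which the normalization factor is a line of slope below 1 through the pivot point. The function name and the numeric values are illustrative assumptions.

```python
def pivoted_norm(cosine_norm, pivot, slope):
    """Pivoted normalization factor: slope * |V(d)| + (1 - slope) * pivot.

    cosine_norm: Euclidean length |V(d)| of the document vector
    pivot: the length at which pivoted and cosine normalization agree
    slope: value below 1 that tilts the normalization line around the pivot
    """
    return slope * cosine_norm + (1.0 - slope) * pivot


# Hypothetical pivot and slope values.
pivot, slope = 10.0, 0.75
at_pivot = pivoted_norm(10.0, pivot, slope)
long_doc = pivoted_norm(20.0, pivot, slope)
short_doc = pivoted_norm(4.0, pivot, slope)
```

Documents longer than the pivot are divided by a smaller factor than their cosine length, which raises their scores, and documents shorter than the pivot are divided by a larger one.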

linear model
machine learning techniques