TF-IDF & BM25 (Relevance Ranking)

Score how relevant each document is to a query: reward query words that are frequent in that document but rare across the whole collection.

tf-idf = term_frequency * log(N / docs_containing_term), where N is the total number of documents; BM25 refines this with term-frequency saturation and document-length normalisation

Frequently asked questions

Why does 'quokka' matter more than 'habitat' when I search?

When you search 'quokka habitat', the rare word 'quokka' should count far more for ranking than the common word 'habitat', which appears in countless pages. TF-IDF and BM25 capture this: they reward words that are frequent in a document but rare across the whole collection (high inverse document frequency). So a page is ranked highly for distinctive query words, not for common filler - which is why the unusual term in your search drives the results.

What do TF and IDF each contribute?

TF (term frequency) rewards documents that use a query word a lot - a page mentioning 'jaguar' ten times is probably more about jaguars than one mentioning it once. IDF (inverse document frequency) down-weights words that appear in many documents - common words like 'the' carry little meaning, so they get a low IDF, while a rare, distinctive word gets a high one. Multiplying them, tf-idf rewards words that are frequent in this document but rare across the corpus - exactly the words that make a document distinctively about a topic.
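
As a rough illustration of that multiplication, here is a minimal Python sketch. The three-document corpus and the `tf_idf` helper are made up for this example, and whitespace splitting stands in for real tokenisation.

```python
import math
from collections import Counter

# Toy corpus (hypothetical data, for illustration only).
docs = {
    "d1": "the jaguar is a big cat and the jaguar hunts at night",
    "d2": "the car dealer sold the old car",
    "d3": "the cat sat on the mat",
}

def tf_idf(term, doc_id, docs):
    """tf-idf = term_frequency * log(N / docs_containing_term)."""
    tf = Counter(docs[doc_id].split())[term]  # occurrences in this document
    df = sum(1 for text in docs.values() if term in text.split())  # documents containing the term
    if df == 0:
        return 0.0  # the term appears nowhere, so it contributes nothing
    return tf * math.log(len(docs) / df)

jaguar_score = tf_idf("jaguar", "d1", docs)  # rare term: 2 * log(3 / 1)
the_score = tf_idf("the", "d1", docs)        # in every document: 2 * log(3 / 3)
print(jaguar_score, the_score)
```

Both words occur twice in d1, but 'the' appears in all three documents, so its IDF is log(1) = 0 and it contributes nothing, while 'jaguar' keeps its full weight.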

Why take the logarithm in IDF?

Because raw rarity scales too harshly. Without the log, a word found in 1 of a million documents would get a weight a million times larger than a word found in every document, swamping everything else. The logarithm compresses this range so differences in rarity matter but do not dominate absurdly. It reflects diminishing returns: under the log, each halving of a word's document frequency adds the same fixed amount to its weight rather than doubling it, so going from 'in half the docs' to 'in a quarter' is a meaningful step, but extreme rarity no longer produces an extreme weight. A word that appears in every document gets log(1) = 0 and drops out of the score.
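
To put numbers on that compression, the sketch below compares the raw ratio N / df with its natural logarithm for a hypothetical collection of one million documents. The helper names and the chosen document frequencies are assumptions made for this example.

```python
import math

N = 1_000_000  # total documents in a hypothetical collection

def raw_weight(n_docs, df):
    """Rarity without the log: N / df."""
    return n_docs / df

def log_idf(n_docs, df):
    """Rarity with the log, as in the tf-idf formula: log(N / df)."""
    return math.log(n_docs / df)

for df in (1, 1_000, 250_000, 500_000, 1_000_000):
    print(f"df={df:>9,}  raw={raw_weight(N, df):>11,.1f}  log={log_idf(N, df):6.2f}")
```

The raw weights span from 1 to 1,000,000, while the logged weights span only from 0 to about 13.8, and each halving of df adds the same log(2) to the weight.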

How does BM25 improve on plain TF-IDF?

BM25 fixes two weaknesses. First, term-frequency saturation: in plain tf-idf with raw counts, a word appearing 100 times scores 100 times as much as a word appearing once, which is unrealistic - after a few occurrences, more do not mean much more relevance. BM25 makes TF level off via a saturation curve. Second, document-length normalisation: long documents naturally contain more words, unfairly inflating their scores; BM25 discounts for length relative to the average document, so a short, focused document is not beaten just by a long, rambling one. These refinements have made BM25 the de facto standard for keyword-based ranking.
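
Both refinements can be seen in a small sketch of one term's BM25 contribution. The text above does not give the formula, so this uses a commonly cited Okapi BM25 form with a smoothed IDF; the parameter names `k1` and `b` and their values 1.2 and 0.75 are conventional choices assumed here, not something taken from this page.

```python
import math

def bm25_term(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Contribution of one query term to one document's BM25 score (a sketch)."""
    if tf == 0 or df == 0:
        return 0.0
    # Smoothed IDF variant that stays non-negative even for very common terms.
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    # Length normalisation: above-average length raises this factor and lowers the score.
    norm = 1 - b + b * doc_len / avg_doc_len
    # Saturating TF: this fraction approaches k1 + 1 as tf grows and never exceeds it.
    return idf * tf * (k1 + 1) / (tf + k1 * norm)

# Saturation: same document length, 1 occurrence versus 100 occurrences.
once = bm25_term(tf=1, df=10, n_docs=1000, doc_len=100, avg_doc_len=100)
often = bm25_term(tf=100, df=10, n_docs=1000, doc_len=100, avg_doc_len=100)

# Length normalisation: same term count in a short and a long document.
short_doc = bm25_term(tf=3, df=10, n_docs=1000, doc_len=50, avg_doc_len=100)
long_doc = bm25_term(tf=3, df=10, n_docs=1000, doc_len=400, avg_doc_len=100)

print(often / once, short_doc > long_doc)
```

With these assumed settings, 100 occurrences score a little over twice as much as one occurrence rather than 100 times as much, and the short document outscores the long one for the same term count.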

Does TF-IDF or BM25 find the documents, or just rank them?

Just rank them. Finding which documents contain the query terms is the inverted index's job; TF-IDF and BM25 are the scoring layer that orders those candidates by relevance. The typical pipeline is: the inverted index returns all documents matching the query, then a ranking function scores each, then the highest-scoring are shown first. Separating retrieval (which docs?) from ranking (in what order?) is a fundamental design of search engines.
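
The split between retrieval and ranking can be sketched in a few lines of Python. The four-document corpus, the `build_index` and `search` helpers, and the choice to treat a document as matching when it contains any query term are all assumptions made for this example; plain tf-idf stands in for the ranking function.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (hypothetical data, for illustration only).
docs = {
    1: "quokka habitat on rottnest island",
    2: "forest habitat loss and habitat restoration",
    3: "wetland habitat for birds",
    4: "island ferry timetable",
}

def build_index(docs):
    """Inverted index: term -> {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.split()).items():
            index[term][doc_id] = tf
    return index

def search(query, index, n_docs):
    terms = query.split()
    # Retrieval: the index yields every document containing at least one query term.
    candidates = set()
    for term in terms:
        candidates.update(index.get(term, {}))
    # Ranking: score only those candidates, here with plain tf-idf.
    scores = {}
    for doc_id in candidates:
        scores[doc_id] = sum(
            index[term][doc_id] * math.log(n_docs / len(index[term]))
            for term in terms
            if doc_id in index.get(term, {})
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

index = build_index(docs)
results = search("quokka habitat", index, len(docs))
print(results)
```

Document 4 is never scored because the index does not return it. Among the candidates, document 1 comes first because it is the only one containing the rare term 'quokka', even though document 2 mentions 'habitat' twice.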

Is TF-IDF still used now that search uses machine learning?

Yes, very much. BM25 in particular remains a strong, fast, interpretable baseline and is the default ranking function in widely used search software such as Lucene and Elasticsearch (which is built on Lucene). Modern systems often use a two-stage approach: BM25 (or similar) quickly retrieves and roughly ranks a candidate set, then a heavier machine-learning or neural model re-ranks the top results. So, far from being obsolete, TF-IDF and BM25 are the efficient first line that makes large-scale neural ranking affordable.
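
The two-stage idea can be sketched generically. In the example below, `two_stage_rank` is a hypothetical helper, and the two score tables are invented stand-ins for a BM25 scorer and a neural re-ranker; no real model or library is involved.

```python
def two_stage_rank(doc_ids, cheap_score, expensive_score, k=100):
    """Rank with a cheap scorer first, then re-rank only the top k with a costly one."""
    # Stage 1: score every candidate with the fast function and keep the top k.
    shortlist = sorted(doc_ids, key=cheap_score, reverse=True)[:k]
    # Stage 2: the expensive scorer sees only those k documents.
    return sorted(shortlist, key=expensive_score, reverse=True)

# Stand-in scores (hypothetical): a fast lexical score and a slower model score.
bm25_scores = {"a": 3.1, "b": 2.7, "c": 0.4, "d": 0.1}
neural_scores = {"a": 0.62, "b": 0.91, "c": 0.99, "d": 0.05}

calls = []  # records which documents the expensive scorer was asked about

def neural(doc_id):
    calls.append(doc_id)
    return neural_scores[doc_id]

top = two_stage_rank(bm25_scores, bm25_scores.get, neural, k=2)
print(top, calls)
```

The expensive scorer runs on two documents instead of four, which is the saving that matters at scale. The cost is also visible: document 'c' has the highest stand-in neural score but never reaches the second stage, so the quality of the first-stage ranking still matters.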