// computers › Information Retrieval

PageRank

Rank pages by importance: a page is important if important pages link to it.

PR(p) = (1-d)/N + d * sum(PR(q)/outlinks(q)) for all q linking to p

Frequently asked questions

Is PageRank like trusting the kid whose notes everyone borrows?

That is a lovely fit. The student whose notes the whole class borrows is trusted because everyone vouches for them - and a vouch from another well-trusted student counts for more. PageRank scores web pages the same way: a page is important if important pages link to it. Importance flows through the links and is computed by iterating until it settles, so being endorsed by already-respected pages lifts you more than many endorsements from obscure ones.

What is the core idea behind PageRank?

Importance by endorsement. A link from page A to page B is treated as A vouching for B. But not all votes count equally: a link from a highly important page is worth more than one from an obscure page. So a page is important if important pages link to it - a recursive definition. PageRank resolves this circularity by starting everyone equal and iterating: rank flows along links repeatedly until the scores stabilise, at which point each page's rank reflects the whole link structure of the web.

What is the 'random surfer' model?

It is the intuition behind the formula. Imagine someone browsing by clicking random links. From any page they follow one of its outgoing links at random (with probability d, the damping factor), or occasionally get bored and jump to a completely random page (with probability 1 - d). A page's PageRank is the long-run probability that this random surfer is on it at any moment. Pages that are easy to reach - linked from many places, or from heavily-visited places - are where the surfer spends more time, so they rank higher.

What does the damping factor d do?

It is the probability that the surfer keeps following links rather than teleporting to a random page; Google's classic value is 0.85. The teleport term (1 - d) serves two purposes: it models real browsing (people do jump to new sites), and mathematically it guarantees the iteration converges to a unique answer, even for pages with no outgoing links or isolated clusters that would otherwise trap or drain rank. Without damping, rank could pile up in dead ends or fail to settle.

Why does PageRank need to iterate instead of computing directly?

Because the definition is circular: a page's rank depends on the ranks of pages linking to it, which depend on theirs, and so on. Iteration breaks the deadlock - start everyone equal, recompute all ranks from the current estimates, and repeat. Each pass spreads importance one more link-hop through the graph, and the values converge to a stable fixed point (mathematically, the dominant eigenvector of the link matrix). For the web, this typically converges in a few dozen iterations despite billions of pages.

Is PageRank the same as what ranks search results?

It is one important signal, not the whole story. PageRank measures a page's general importance from the link graph, independent of any query. Search ranking combines it with relevance to the actual query (TF-IDF/BM25 and many other features) plus hundreds of additional signals. Historically PageRank was central to Google's advantage; today it is one of many factors in a far more complex system. The underlying idea - importance flowing through a graph - also powers citation analysis, social-influence scoring, and recommendation systems.