Skip to content

Bug: PageRank initializes ranks to 1 instead of 1/n, ignores dangling nodes #14885

Description

@devladpopov

Description

graphs/page_rank.py has three correctness issues:

1. Ranks initialized to 1 instead of 1/n

for node in nodes:
    ranks[node.name] = 1  # should be 1 / len(nodes)

PageRank values should form a probability distribution summing to 1.0. With 3 nodes the initial sum is 3.0. The algorithm converges to values proportional to correct PageRank but with wrong magnitudes, so the output is not usable as a probability distribution.

2. No dangling node handling

Nodes with no outlinks leak rank out of the system. Proper PageRank redistributes their rank evenly. This causes incorrect relative rankings in graphs with dead-end nodes.

Test: A -> B -> C (C has no outlinks). Implementation gives C rank 0.386, correct value is 0.474 (23% error).

3. No convergence check

Default limit=3 iterations with no convergence check. Most graphs need 20-50 iterations. The function also prints to stdout on every iteration.

Suggested Fix

def page_rank(nodes, d=0.85, tol=1e-8, max_iter=100):
    n = len(nodes)
    ranks = {node.name: 1.0 / n for node in nodes}
    outbounds = {node.name: len(node.outbound) for node in nodes}

    for _ in range(max_iter):
        new_ranks = {}
        # Dangling node rank redistribution
        dangling_sum = sum(ranks[node.name] for node in nodes if outbounds[node.name] == 0)
        for node in nodes:
            new_ranks[node.name] = (1 - d) / n + d * (
                sum(ranks[ib] / outbounds[ib] for ib in node.inbound)
                + dangling_sum / n
            )
        # Convergence check
        if sum(abs(new_ranks[k] - ranks[k]) for k in ranks) < tol:
            break
        ranks = new_ranks
    return ranks

Reproduction

from page_rank import Node, page_rank

nodes = [Node("A"), Node("B"), Node("C")]
# A -> B, A -> C, B -> C, C -> A
nodes[1].add_inbound("A"); nodes[0].add_outbound("B")
nodes[2].add_inbound("A"); nodes[0].add_outbound("C")
nodes[2].add_inbound("B"); nodes[1].add_outbound("C")
nodes[0].add_inbound("C"); nodes[2].add_outbound("A")

ranks = page_rank(nodes, limit=100)
print(sum(ranks.values()))  # prints ~3.0, should be ~1.0

Found during systematic algorithm audit: https://github.com/devladpopov/algorithm-autopsy

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions