Description
graphs/page_rank.py has three correctness issues:
1. Ranks initialized to 1 instead of 1/n
for node in nodes:
ranks[node.name] = 1 # should be 1 / len(nodes)
PageRank values should form a probability distribution summing to 1.0. With 3 nodes the initial sum is 3.0. The algorithm converges to values proportional to correct PageRank but with wrong magnitudes, so the output is not usable as a probability distribution.
2. No dangling node handling
Nodes with no outlinks leak rank out of the system. Proper PageRank redistributes their rank evenly. This causes incorrect relative rankings in graphs with dead-end nodes.
Test: A -> B -> C (C has no outlinks). Implementation gives C rank 0.386, correct value is 0.474 (23% error).
3. No convergence check
Default limit=3 iterations with no convergence check. Most graphs need 20-50 iterations. The function also prints to stdout on every iteration.
Suggested Fix
def page_rank(nodes, d=0.85, tol=1e-8, max_iter=100):
n = len(nodes)
ranks = {node.name: 1.0 / n for node in nodes}
outbounds = {node.name: len(node.outbound) for node in nodes}
for _ in range(max_iter):
new_ranks = {}
# Dangling node rank redistribution
dangling_sum = sum(ranks[node.name] for node in nodes if outbounds[node.name] == 0)
for node in nodes:
new_ranks[node.name] = (1 - d) / n + d * (
sum(ranks[ib] / outbounds[ib] for ib in node.inbound)
+ dangling_sum / n
)
# Convergence check
if sum(abs(new_ranks[k] - ranks[k]) for k in ranks) < tol:
break
ranks = new_ranks
return ranks
Reproduction
from page_rank import Node, page_rank
nodes = [Node("A"), Node("B"), Node("C")]
# A -> B, A -> C, B -> C, C -> A
nodes[1].add_inbound("A"); nodes[0].add_outbound("B")
nodes[2].add_inbound("A"); nodes[0].add_outbound("C")
nodes[2].add_inbound("B"); nodes[1].add_outbound("C")
nodes[0].add_inbound("C"); nodes[2].add_outbound("A")
ranks = page_rank(nodes, limit=100)
print(sum(ranks.values())) # prints ~3.0, should be ~1.0
Found during systematic algorithm audit: https://github.com/devladpopov/algorithm-autopsy
Description
graphs/page_rank.pyhas three correctness issues:1. Ranks initialized to 1 instead of 1/n
PageRank values should form a probability distribution summing to 1.0. With 3 nodes the initial sum is 3.0. The algorithm converges to values proportional to correct PageRank but with wrong magnitudes, so the output is not usable as a probability distribution.
2. No dangling node handling
Nodes with no outlinks leak rank out of the system. Proper PageRank redistributes their rank evenly. This causes incorrect relative rankings in graphs with dead-end nodes.
Test:
A -> B -> C(C has no outlinks). Implementation gives C rank 0.386, correct value is 0.474 (23% error).3. No convergence check
Default
limit=3iterations with no convergence check. Most graphs need 20-50 iterations. The function also prints to stdout on every iteration.Suggested Fix
Reproduction
Found during systematic algorithm audit: https://github.com/devladpopov/algorithm-autopsy