Bug: Spearman correlation incorrect for tied values, ZeroDivisionError on n=1 #14887

Description

@devladpopov

maths/spearman_rank_correlation_coefficient.py has two bugs:

1. Tied values handled incorrectly

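assign_ranks() gives tied values sequential ranks instead of averaged ranks. Spearman's coefficient is defined on averaged (fractional) ranks when ties are present, so every member of a tie group should receive the mean of the rank positions the group occupies.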

| Input | Implementation | Correct (scipy) | Error |
|---|---|---|---|
| x=[1,2,2,4], y=[1,2,3,4] | 1.000 | 0.950 | 5% |
| x=[1,1,1,1], y=[1,2,3,4] | 1.000 | 0.500 | 100% |
| x=[10,20,20,30,30,30], y=[1,2,3,4,5,6] | 1.000 | 0.929 | 8% |

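In the worst case (constant x), the implementation reports perfect correlation (1.0), whereas averaged ranks give 0.5 under the same d² formula. Strictly, the coefficient is undefined for a constant input because the rank variance is zero, and a Pearson-on-ranks implementation such as scipy.stats.spearmanr reports nan for that case and can differ slightly from the d² figures when ties are present.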
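The gap between the two ranking schemes can be checked with a small sketch. The helper names below (sequential_ranks, averaged_ranks, rho_d_squared) are hypothetical and are not the repository's functions: sequential_ranks is only a stand-in for the behaviour described above, and the result is computed with the simplified d² formula rather than by calling scipy.

```python
def sequential_ranks(data):
    """Ranks 1..n in (value, index) order; tied values get consecutive ranks."""
    order = sorted(range(len(data)), key=lambda i: (data[i], i))
    ranks = [0.0] * len(data)
    for rank, index in enumerate(order, start=1):
        ranks[index] = float(rank)
    return ranks


def averaged_ranks(data):
    """Each value gets the mean of the rank positions its tie group occupies."""
    ranks = []
    for value in data:
        smaller = sum(1 for other in data if other < value)
        equal = sum(1 for other in data if other == value)
        # The tie group occupies ranks smaller+1 .. smaller+equal.
        ranks.append(smaller + (equal + 1) / 2)
    return ranks


def rho_d_squared(rank_x, rank_y):
    """Simplified formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * d_squared) / (n * (n**2 - 1))


cases = [
    ([1, 2, 2, 4], [1, 2, 3, 4]),
    ([1, 1, 1, 1], [1, 2, 3, 4]),
    ([10, 20, 20, 30, 30, 30], [1, 2, 3, 4, 5, 6]),
]
for x, y in cases:
    sequential = rho_d_squared(sequential_ranks(x), sequential_ranks(y))
    averaged = rho_d_squared(averaged_ranks(x), averaged_ranks(y))
    print(f"x={x}: sequential={sequential:.3f}, averaged={averaged:.3f}")
```

The averaged_ranks helper is quadratic in the input length, which keeps the sketch short; it is meant for checking small cases, not as a replacement for a sort-based ranking.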

2. ZeroDivisionError on n=1

rho = 1 - (6 * d_squared) / (n * (n**2 - 1))  # n=1: division by 0

Reproduction

from spearman_rank_correlation_coefficient import calculate_spearman_rank_correlation

# Bug 1: ties
print(calculate_spearman_rank_correlation([1,1,1,1], [1,2,3,4]))
# Output: 1.0 (averaged ranks give 0.5)

# Bug 2: n=1
print(calculate_spearman_rank_correlation([1], [1]))
# ZeroDivisionError

Suggested Fix

def assign_ranks(data):
    n = len(data)
    ranked_data = sorted((value, index) for index, value in enumerate(data))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j < n - 1 and ranked_data[j + 1][0] == ranked_data[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1  # averaged rank for ties
        for k in range(i, j + 1):
            ranks[ranked_data[k][1]] = avg_rank
        i = j + 1
    return ranks

And add input validation:

if n < 2:
    raise ValueError("Need at least 2 data points")
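As a rough illustration of where that check could sit, here is a minimal sketch of the d² formula behind a guard. The function name spearman_from_ranks is hypothetical, it takes already-computed ranks rather than raw data, and the length check is an extra assumption that the issue does not ask for.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Hypothetical helper: d^2 formula on precomputed ranks, with input guards."""
    # Assumption (not part of the issue): reject inputs of different lengths.
    if len(rank_x) != len(rank_y):
        raise ValueError("Both inputs must have the same length")
    n = len(rank_x)
    # For n == 1 the denominator n * (n**2 - 1) is 0, so fail early and clearly.
    if n < 2:
        raise ValueError("Need at least 2 data points")
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * d_squared) / (n * (n**2 - 1))


try:
    spearman_from_ranks([1.0], [1.0])
except ValueError as error:
    print(f"rejected: {error}")
```

Raising ValueError before the division means the caller sees a message about the input instead of a ZeroDivisionError from inside the formula.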

Found during a systematic algorithm audit: https://github.com/devladpopov/algorithm-autopsy
