Description
maths/spearman_rank_correlation_coefficient.py has two bugs:
1. Tied values handled incorrectly
assign_ranks() gives tied values sequential ranks instead of averaged ranks. The Spearman formula requires averaged ranks when ties are present.
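| Input | Implementation | With averaged ranks | Error |
|---|---|---|---|
| x=[1,2,2,4] y=[1,2,3,4] | 1.000 | 0.950 | 5% |
| x=[1,1,1,1] y=[1,2,3,4] | 1.000 | 0.500 | 100% |
| x=[10,20,20,30,30,30] y=[1..6] | 1.000 | 0.929 | 8% |

In the worst case the function reports perfect correlation (1.0) where the same formula with averaged ranks gives 0.5. The expected values above come from the simplified formula applied to averaged ranks. scipy.stats.spearmanr computes Pearson correlation on the averaged ranks instead, so on tied data it differs slightly (roughly 0.949 and 0.926 for rows 1 and 3), and it returns nan for the constant input in row 2, where the coefficient is strictly undefined.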
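The difference between the two ranking schemes can be checked with a small standalone sketch. The helper names (`sequential_ranks`, `averaged_ranks`, `rho_from_ranks`) are illustrative and not taken from the repository; `sequential_ranks` only mimics the behaviour reported in this issue, and no third-party library is assumed.

```python
def sequential_ranks(data):
    """Rank by sorted position; tied values get distinct consecutive ranks."""
    order = sorted(range(len(data)), key=lambda i: data[i])
    ranks = [0.0] * len(data)
    for position, index in enumerate(order):
        ranks[index] = position + 1.0
    return ranks


def averaged_ranks(data):
    """Give each group of tied values the mean of the positions it occupies."""
    order = sorted(range(len(data)), key=lambda i: data[i])
    ranks = [0.0] * len(data)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value equals the current one.
        while end + 1 < len(order) and data[order[end + 1]] == data[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def rho_from_ranks(rank_x, rank_y):
    """Simplified Spearman formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * d_squared) / (n * (n**2 - 1))


x, y = [1, 2, 2, 4], [1, 2, 3, 4]
print(rho_from_ranks(sequential_ranks(x), sequential_ranks(y)))  # 1.0
print(round(rho_from_ranks(averaged_ranks(x), averaged_ranks(y)), 3))  # 0.95
```

With averaged ranks the two tied values in `x` both receive rank 2.5, which is what pulls the result below 1.0.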
2. ZeroDivisionError on n=1
rho = 1 - (6 * d_squared) / (n * (n**2 - 1)) # n=1: division by 0
Reproduction
from spearman_rank_correlation_coefficient import calculate_spearman_rank_correlation
# Bug 1: ties
print(calculate_spearman_rank_correlation([1,1,1,1], [1,2,3,4]))
# Output: 1.0 (should be ~0.5)
# Bug 2: n=1
print(calculate_spearman_rank_correlation([1], [1]))
# ZeroDivisionError
Suggested Fix
def assign_ranks(data):
    n = len(data)
    ranked_data = sorted((value, index) for index, value in enumerate(data))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j < n - 1 and ranked_data[j + 1][0] == ranked_data[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1  # averaged rank for ties
        for k in range(i, j + 1):
            ranks[ranked_data[k][1]] = avg_rank
        i = j + 1
    return ranks
And add input validation:
if n < 2:
    raise ValueError("Need at least 2 data points")
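As a rough illustration of where the check could sit, here is a minimal sketch. The function name `rho_from_ranks_checked` is hypothetical, and it takes already-ranked inputs to stay short. The denominator `n * (n**2 - 1)` is zero for both n=0 and n=1, so the single `n < 2` guard covers the empty case as well.

```python
def rho_from_ranks_checked(rank_x, rank_y):
    """Simplified Spearman formula with the proposed guard (illustrative only)."""
    n = len(rank_x)
    if n < 2:
        # n * (n**2 - 1) is 0 for n = 0 and n = 1, so fail early and clearly.
        raise ValueError("Need at least 2 data points")
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * d_squared) / (n * (n**2 - 1))


try:
    rho_from_ranks_checked([1.0], [1.0])
except ValueError as err:
    print(err)  # Need at least 2 data points

print(rho_from_ranks_checked([1.0, 2.0], [2.0, 1.0]))  # -1.0
```

Raising `ValueError` keeps the failure tied to the caller's input instead of surfacing as an arithmetic error from inside the formula.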
Found during a systematic algorithm audit: https://github.com/devladpopov/algorithm-autopsy