Elo Rating System

Overview

The Elo rating system is one of the most widely used rating systems in the world. Developed by Hungarian-American physics professor Arpad Elo, it was originally designed for chess but has since been adapted for many other competitive domains including video games, basketball, football, and baseball.

The Elo system is named after its creator and was first introduced as the official rating system for the United States Chess Federation in 1960, and later adopted by the World Chess Federation (FIDE) in 1970.

How It Works

The Elo rating system is based on the following principles:

Each player has a rating that represents their skill level
The difference between ratings determines the expected outcome of a match
After each match, ratings are adjusted based on the actual outcome compared to the expected outcome

The core formula for calculating the expected score (probability of winning) is:

\[E_A = \frac{1}{1 + 10^{(R_B - R_A) / 400}}\]

Where: - \(E_A\) is the expected score for player A - \(R_A\) is the rating of player A - \(R_B\) is the rating of player B

After a match, the ratings are updated using:

\[R'_A = R_A + K \times (S_A - E_A)\]

Where: - \(R'_A\) is the new rating for player A - \(K\) is the K-factor (determines how quickly ratings change) - \(S_A\) is the actual score (1 for win, 0.5 for draw, 0 for loss) - \(E_A\) is the expected score

Advantages

Simplicity: The Elo system is easy to understand and implement
Transparency: Players can easily see how their rating changes after each match
Proven Track Record: Used successfully for decades in various competitive domains
Zero-Sum: In a two-player game, the rating points one player gains are exactly what the other player loses
Self-Correcting: Ratings naturally adjust over time as more matches are played

Limitations

Requires Many Matches: Needs a significant number of matches to reach an accurate rating
No Confidence Intervals: Unlike Glicko, Elo doesn’t account for rating reliability
Assumes Stable Performance: Doesn’t account for player improvement or decline over time
K-Factor Sensitivity: Results are highly dependent on the chosen K-factor
No Team Dynamics: In team sports, doesn’t account for individual contributions

Implementation in Elote

Elote provides a straightforward implementation of the Elo rating system through the EloCompetitor class:

from elote import EloCompetitor

# Create two competitors with different initial ratings
player1 = EloCompetitor(initial_rating=1500)
player2 = EloCompetitor(initial_rating=1600)

# Get win probability
win_probability = player2.expected_score(player1)
print(f"Player 2 win probability: {win_probability:.2%}")

# Record a match result
player1.beat(player2)  # Player 1 won!

# Ratings are automatically updated
print(f"Player 1 new rating: {player1.rating}")
print(f"Player 2 new rating: {player2.rating}")

Customization

The EloCompetitor class allows for customization of the K-factor:

# Create a competitor with a custom K-factor
player = EloCompetitor(initial_rating=1500, k_factor=32)

A higher K-factor makes ratings change more quickly, while a lower K-factor makes them more stable. Common K-factor values:

40: For new players with fewer than 30 games (FIDE standard)
20: For players with ratings under 2400 (FIDE standard)
10: For elite players with ratings over 2400 (FIDE standard)

Real-World Applications

The Elo rating system is used in many domains:

Chess: FIDE and national chess federations
Video Games: League of Legends, DOTA 2, and many other competitive games
Sports: Used for international football rankings
Online Matchmaking: Many platforms use Elo or Elo-derived systems to match players of similar skill

References

Elo, Arpad (1978). The Rating of Chessplayers, Past and Present. Arco. ISBN 0-668-04721-6.
Glickman, Mark E. (1995). “A Comprehensive Guide to Chess Ratings”. American Chess Journal, 3, 59-102.
Silver, Nate (2015). “How We Calculate NBA Elo Ratings”. FiveThirtyEight.