Arenas API Reference¶
This page provides detailed API documentation for all arena classes in Elote.
Base Arena¶
- class elote.arenas.base.BaseArena[source]¶
Bases: object
Base abstract class for all arena implementations.
Arenas manage competitions between multiple competitors, handling matchups, tournaments, and leaderboard generation. This class defines the interface that all arena implementations must follow.
- abstractmethod export_state() Dict[str, Any][source]¶
Export the current state of this arena for serialization.
- Returns:
- dict: A dictionary containing all necessary information to recreate
this arena’s current state.
- abstractmethod leaderboard() List[Tuple[Any, float]][source]¶
Generate a leaderboard of all competitors.
- Returns:
list: A sorted list of competitors and their ratings.
- abstractmethod matchup(a: Any, b: Any) Any[source]¶
Process a single matchup between two competitors.
- Args:
a: The first competitor or competitor identifier.
b: The second competitor or competitor identifier.
- Returns:
The result of the matchup.
- abstractmethod set_competitor_class_var(name: str, value: Any) None[source]¶
Set a class variable on all competitors in this arena.
This method allows for global configuration of all competitors managed by this arena.
- Args:
name (str): The name of the class variable to set.
value: The value to set for the class variable.
- class elote.arenas.base.Bout(a: Any, b: Any, predicted_outcome: float | None, outcome: Any, attributes: Dict[str, Any] | None = None)[source]¶
Bases: object
A single bout between two competitors.
Initialize a bout.
- Args:
a: The first competitor.
b: The second competitor.
predicted_outcome: The predicted probability of a winning.
outcome: The actual outcome of the bout.
attributes: Optional dictionary of additional attributes.
- __init__(a: Any, b: Any, predicted_outcome: float | None, outcome: Any, attributes: Dict[str, Any] | None = None) None[source]¶
Initialize a bout.
- Args:
a: The first competitor.
b: The second competitor.
predicted_outcome: The predicted probability of a winning.
outcome: The actual outcome of the bout.
attributes: Optional dictionary of additional attributes.
- actual_winner() str | None[source]¶
Return the actual winner of the bout based on the outcome.
- Returns:
str or None: ‘a’ if a won, ‘b’ if b won, None if it was a draw or unclear
- false_negative(threshold: float = 0.5) bool[source]¶
Check if this bout is a false negative prediction.
A false negative occurs when the model incorrectly predicts a non-win.
- Args:
threshold (float): The probability threshold for a negative prediction.
- Returns:
bool: True if this bout is a false negative, False otherwise.
- false_positive(threshold: float = 0.5) bool[source]¶
Check if this bout is a false positive prediction.
A false positive occurs when the model incorrectly predicts a win.
- Args:
threshold (float): The probability threshold for a positive prediction.
- Returns:
bool: True if this bout is a false positive, False otherwise.
- predicted_loser(lower_threshold: float = 0.5, upper_threshold: float = 0.5) str | None[source]¶
Determine the predicted loser of this bout.
- Args:
lower_threshold (float): The lower probability threshold for predictions.
upper_threshold (float): The upper probability threshold for predictions.
- Returns:
str: The identifier of the predicted loser, or None if no loser is predicted.
- predicted_winner(lower_threshold: float = 0.5, upper_threshold: float = 0.5) str | None[source]¶
Determine the predicted winner of this bout.
- Args:
lower_threshold (float): The lower probability threshold for predictions.
upper_threshold (float): The upper probability threshold for predictions.
- Returns:
str: The identifier of the predicted winner, or None if no winner is predicted.
- true_negative(threshold: float = 0.5) bool[source]¶
Check if this bout is a true negative prediction.
A true negative occurs when the model correctly predicts a non-win.
- Args:
threshold (float): The probability threshold for a negative prediction.
- Returns:
bool: True if this bout is a true negative, False otherwise.
- true_positive(threshold: float = 0.5) bool[source]¶
Check if this bout is a true positive prediction.
A true positive occurs when the model correctly predicts a win.
- Args:
threshold (float): The probability threshold for a positive prediction.
- Returns:
bool: True if this bout is a true positive, False otherwise.
- class elote.arenas.base.History[source]¶
Bases: object
Tracks the history of bouts (matchups) and provides analysis methods.
This class stores the results of matchups and provides methods to analyze the performance of the rating system.
Initialize an empty history of bouts.
- accuracy_by_prior_bouts(arena: BaseArena, thresholds: Tuple[float, float] | None = None, bin_size: int = 5) Dict[int, Dict[str, Any]][source]¶
Calculate accuracy based on the number of prior bouts for each competitor.
This method analyzes how accuracy changes as competitors participate in more bouts, properly accounting for draws as a third outcome category.
- Args:
arena (BaseArena): The arena containing the competitors and their history.
thresholds (tuple, optional): Tuple of (lower_threshold, upper_threshold) for predictions.
bin_size (int): Size of bins for grouping bout counts.
- Returns:
dict: A dictionary with ‘binned’ key containing binned accuracy data
- add_bout(bout: Bout) None[source]¶
Add a bout to the history.
- Args:
bout (Bout): The bout object to add to the history.
- calculate_metrics(lower_threshold: float = 0.5, upper_threshold: float = 0.5) Dict[str, float][source]¶
Calculate performance metrics based on the confusion matrix.
- Args:
lower_threshold: The lower threshold for prediction (below this is a prediction for the second competitor).
upper_threshold: The upper threshold for prediction (above this is a prediction for the first competitor).
- Returns:
A dictionary with metrics including accuracy, precision, recall, F1 score, and the confusion matrix
- calculate_metrics_with_draws(lower_threshold: float = 0.33, upper_threshold: float = 0.66) Dict[str, Any][source]¶
Calculate evaluation metrics for the bout history, treating predictions between thresholds as explicit draw predictions.
- Args:
lower_threshold (float): The lower probability threshold for predictions.
upper_threshold (float): The upper probability threshold for predictions.
- Returns:
dict: A dictionary containing accuracy, precision, recall, F1 score, and draw metrics.
- confusion_matrix(lower_threshold: float = 0.45, upper_threshold: float = 0.55) Dict[str, int][source]¶
Calculate the confusion matrix for the history of bouts.
- Args:
lower_threshold: The lower threshold for prediction (below this is a prediction for the second competitor).
upper_threshold: The upper threshold for prediction (above this is a prediction for the first competitor).
- Returns:
A dictionary with confusion matrix metrics: {‘tp’: int, ‘fp’: int, ‘tn’: int, ‘fn’: int}
- get_calibration_data(n_bins: int = 10) Tuple[List[float], List[float]][source]¶
Compute calibration data from the bout history.
This method extracts predicted probabilities and actual outcomes from the bout history and prepares them for calibration curve plotting.
- Args:
n_bins (int): Number of bins to use for calibration curve.
- Returns:
- tuple: (y_true, y_prob) where:
y_true: List of actual outcomes (1.0 for wins, 0.0 for losses)
y_prob: List of predicted probabilities
- optimize_thresholds(method: str = 'L-BFGS-B', initial_thresholds: Tuple[float, float] = (0.5, 0.5)) Tuple[float, List[float]][source]¶
Optimize prediction thresholds using scipy.optimize.
This method uses scipy’s optimization algorithms to find the best thresholds for maximizing prediction accuracy.
- Args:
method (str): The optimization method to use (e.g., ‘L-BFGS-B’, ‘Nelder-Mead’).
initial_thresholds (tuple): Initial guess for (lower_threshold, upper_threshold).
- Returns:
- tuple: (best_accuracy, best_thresholds) where:
best_accuracy: The accuracy achieved with the optimized thresholds
best_thresholds: List of [lower_threshold, upper_threshold]
- random_search(trials: int = 1000) Tuple[float, List[float]][source]¶
Search for optimal prediction thresholds using random sampling.
This method performs a random search to find the best lower and upper thresholds that maximize the overall accuracy, including draws.
- Args:
trials (int): The number of random threshold pairs to try.
- Returns:
tuple: A tuple containing (best_accuracy, best_thresholds).
- report_results(lower_threshold: float = 0.5, upper_threshold: float = 0.5) List[Dict[str, Any]][source]¶
Generate a report of the results in this history.
- Args:
lower_threshold (float): The lower probability threshold for predictions.
upper_threshold (float): The upper probability threshold for predictions.
- Returns:
list: A list of dictionaries containing the results of each bout.
Lambda Arena¶
- class elote.arenas.lambda_arena.LambdaArena(func: ~typing.Callable[[...], bool | None], base_competitor: ~typing.Type[~elote.competitors.base.BaseCompetitor] = <class 'elote.competitors.elo.EloCompetitor'>, base_competitor_kwargs: ~typing.Dict[str, ~typing.Any] | None = None, initial_state: ~typing.Dict[~typing.Any, ~typing.Dict[str, ~typing.Any]] | None = None)[source]¶
Bases: BaseArena
Initialize a LambdaArena with a comparison function.
The LambdaArena uses a provided function to determine the outcome of matchups between competitors. This is particularly useful for comparing objects that aren’t competitors themselves.
- Args:
- func (callable): A function that takes two arguments (a, b) and returns
True if a beats b, False if b beats a, and None for a draw.
- base_competitor (class): The competitor class to use for ratings.
Defaults to EloCompetitor.
- base_competitor_kwargs (dict, optional): Keyword arguments to pass to
the base_competitor constructor.
- initial_state (dict, optional): Initial state for competitors, mapping
competitor IDs to their initial parameters.
- __init__(func: ~typing.Callable[[...], bool | None], base_competitor: ~typing.Type[~elote.competitors.base.BaseCompetitor] = <class 'elote.competitors.elo.EloCompetitor'>, base_competitor_kwargs: ~typing.Dict[str, ~typing.Any] | None = None, initial_state: ~typing.Dict[~typing.Any, ~typing.Dict[str, ~typing.Any]] | None = None) None[source]¶
Initialize a LambdaArena with a comparison function.
The LambdaArena uses a provided function to determine the outcome of matchups between competitors. This is particularly useful for comparing objects that aren’t competitors themselves.
- Args:
- func (callable): A function that takes two arguments (a, b) and returns
True if a beats b, False if b beats a, and None for a draw.
- base_competitor (class): The competitor class to use for ratings.
Defaults to EloCompetitor.
- base_competitor_kwargs (dict, optional): Keyword arguments to pass to
the base_competitor constructor.
- initial_state (dict, optional): Initial state for competitors, mapping
competitor IDs to their initial parameters.
- evaluate_performance(eval_bouts: List[Tuple[str, str, float | None]], progress_bar: bool = True) None[source]¶
Evaluate the performance of the competitors based on a list of evaluation bouts.
- Args:
eval_bouts (list): A list of (competitor_a, competitor_b, outcome) tuples.
progress_bar (bool, optional): Whether to display a progress bar.
- expected_score(a: Any, b: Any) float[source]¶
Calculate the expected score for a matchup between two competitors.
This method returns the probability that competitor a will beat competitor b.
- Args:
a: The first competitor or competitor identifier.
b: The second competitor or competitor identifier.
- Returns:
float: The probability that a will beat b (between 0 and 1).
- export_state() Dict[Any, Dict[str, Any]][source]¶
Export the current state of this arena for serialization.
- Returns:
dict: A dictionary containing the state of all competitors in this arena.
- get_all_competitors() List[BaseCompetitor][source]¶
Retrieve a list of all competitors in the arena.
- Returns:
list: A list of all competitors.
- get_competitor_by_id(id_val: str) BaseCompetitor | None[source]¶
Retrieve a competitor by their ID.
- Args:
id_val (str): The ID of the competitor to retrieve.
- Returns:
Optional[BaseCompetitor]: The retrieved competitor, or None if not found.
- leaderboard() List[Dict[str, Any]][source]¶
Generate a leaderboard of all competitors.
- Returns:
- list: A list of dictionaries containing competitor IDs and their ratings,
sorted by rating in descending order.
- matchup(a: Any, b: Any, attributes: Dict[str, Any] | None = None, match_time: datetime | None = None) None[source]¶
Process a single matchup between two competitors.
This method handles a matchup between two competitors, creating them if they don’t already exist in the arena. It uses the comparison function to determine the outcome and updates the ratings accordingly.
- Args:
a: The first competitor or competitor identifier.
b: The second competitor or competitor identifier.
attributes (dict, optional): Additional attributes to record with this bout.
match_time (datetime, optional): The time when the match occurred.
- process_history(bouts: List[Tuple[str, str, float | None]], progress_bar: bool = True) None[source]¶
Process a list of historical bouts, updating ratings after each one.
- Args:
bouts (list): A list of (competitor_a, competitor_b, outcome) tuples.
progress_bar (bool, optional): Whether to display a progress bar.
- set_competitor_class_var(name: str, value: Any) None[source]¶
Set a class variable on the base competitor class.
This method allows for global configuration of all competitors managed by this arena.
- Args:
name (str): The name of the class variable to set.
value: The value to set for the class variable.
- tournament(matchups: List[Tuple[Any, Any]]) None[source]¶
Run a tournament with the given matchups.
Process multiple matchups between competitors, updating ratings after each matchup.
- Args:
matchups (list): A list of (competitor_a, competitor_b) tuples.
- validate(validation_bouts: List[Tuple[str, str, float | None]], progress_bar: bool = True) None[source]¶
Run a validation set through the arena without updating ratings, only recording predictions.
- Args:
validation_bouts (list): A list of (competitor_a, competitor_b, outcome) tuples.
progress_bar (bool, optional): Whether to display a progress bar.