◆ SCORE&SIM DOCUMENTATION

Complete guide to the Sports Simulation Intelligence Platform. Learn how to train models, run simulations, and analyze game predictions.

System Overview

The Sports Simulation Intelligence Platform (SCORE&SIM) is a comprehensive system for predicting game outcomes using advanced machine learning models. The system combines:

🧠 Bayesian Models

Traditional statistical models for team strength estimation and QB effects

🎯 Drive Models

LightGBM-based models that predict drive outcomes (TD, FG, PUNT, etc.)

🔄 State Transition

Real-time simulation of game state changes during live games

📊 Live Analytics

Real-time win probability calculations and game monitoring

What Makes This System Unique

  • Dual Model Architecture: Combines Bayesian team strength models with ML-based drive prediction
  • Real-Time Updates: Live play-by-play ingestion with automatic drive extraction
  • Playoff Awareness: Special handling for playoff games with urgency features
  • Comprehensive UI: All features accessible through web interface with job tracking

Quick Start Guide

For First-Time Users

  1. Check System Status: Visit the Dashboard to see active models and system health
  2. View Live Games: Go to Live Scoreboard to see current games and predictions
  3. Run a Simulation: Use Launch Simulation to predict an upcoming game
  4. Explore Drive Simulator: Browse historical drives and train models

💡 Pro Tip

Most features work out of the box with pre-trained models. You only need to train new models if you want to use different season ranges or update with the latest data.

Key Concepts

1. Drive-Based Simulation

Unlike play-by-play simulation, the system simulates at the drive level. Each drive is predicted to end in one of:

  • TD (Touchdown): 7 points (6 for the touchdown + 1 for the PAT)
  • FG (Field Goal): 3 points
  • PUNT: 0 points, possession changes
  • TURNOVER: 0 points, possession changes
  • DOWNS: 0 points, possession changes
  • SAFETY: 2 points for defense
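
The outcome list above can be sketched as a small lookup plus a sampling step. This is a minimal illustration, not the platform's implementation: the probabilities are made-up values, the function names are hypothetical, and a TD is assumed to be worth a flat 7 points.

```python
import random

# Points credited to the offense for each drive result. A TD is counted as 7
# (6 + a made PAT), which is a simplifying assumption. A safety scores for the
# defense, so it is tracked in a separate table.
OFFENSE_POINTS = {"TD": 7, "FG": 3, "PUNT": 0, "TURNOVER": 0, "DOWNS": 0, "SAFETY": 0}
DEFENSE_POINTS = {"SAFETY": 2}


def sample_drive_result(probs, rng):
    """Draw one drive result from a {result: probability} mapping."""
    results = list(probs)
    weights = [probs[r] for r in results]
    return rng.choices(results, weights=weights, k=1)[0]


def score_drive(result):
    """Return (offense_points, defense_points) for a drive result."""
    return OFFENSE_POINTS[result], DEFENSE_POINTS.get(result, 0)


# Hypothetical outcome probabilities for one drive (illustrative only).
example_probs = {
    "TD": 0.22, "FG": 0.15, "PUNT": 0.40,
    "TURNOVER": 0.12, "DOWNS": 0.10, "SAFETY": 0.01,
}
rng = random.Random(42)
result = sample_drive_result(example_probs, rng)
print(result, score_drive(result))
```

In a real run the probabilities would come from a trained drive model rather than a fixed table.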

2. Model Types

Bayesian Models

Traditional statistical models of team strength, fit with MCMC or ADVI (variational inference). Used for baseline predictions.

Drive Models (LightGBM)

Machine learning models that predict drive outcomes. More accurate and faster to train than the Bayesian models. (RECOMMENDED)

State Transition Models

Real-time models for live game simulation. They update win probability as the game progresses.

3. Drive Snapshots

Drive snapshots capture the state of a drive at each play. They are used to train models on:

  • Field position (yardline)
  • Down and distance
  • Score differential
  • Time remaining
  • Playoff urgency (for playoff games)
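
As a rough picture of what one snapshot might hold, the sketch below groups the fields listed above into a record. The class name, field names, and validation rules are assumptions for illustration; the platform's actual DriveStateSnapshot schema is not documented here.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DriveSnapshotExample:
    """Hypothetical snapshot record; field names are illustrative."""
    yardline: int            # field position, 0-100
    down: int                # 1-4
    distance: int            # yards to go for a first down
    score_diff: int          # offense score minus defense score
    seconds_remaining: int   # game clock remaining, in seconds
    playoff_urgency: float = 0.0  # assumed to be 0 outside the playoffs

    def __post_init__(self):
        if not 0 <= self.yardline <= 100:
            raise ValueError("yardline must be between 0 and 100")
        if self.down not in (1, 2, 3, 4):
            raise ValueError("down must be 1, 2, 3, or 4")


snap = DriveSnapshotExample(
    yardline=35, down=2, distance=7, score_diff=-3, seconds_remaining=540
)
features = asdict(snap)  # plain dict, convenient as a model feature row
print(features)
```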

🎯 Drive Simulator (NEW)

The Drive Simulator is a comprehensive suite of tools for working with drive-level data and models.

Available Features

📂 Browse Drives

Explore historical drives from the database. Filter by season, team, game, or drive result.

⚙️ Extract Drives

Extract drives from play-by-play data. Creates Drive records from PlayByPlay or LivePlayByPlay data.

Access: Drive Simulator → Extract Drives

📸 Extract Snapshots

Create drive snapshots for training. Extracts DriveStateSnapshot records with full drive context.

Access: Drive Simulator → Extract Snapshots

🧠 Train Snapshot Model

Train LightGBM models on drive snapshots. Creates models that predict scoring probability and TD probability.

Access: Drive Simulator → Train Snapshot Model

🔄 Auto Training Loop

Automated training loop that iteratively improves models using AI agent suggestions.

Access: Drive Simulator → Auto Training Loop

▶️ Run Simulation

Simulate a specific game scenario using trained drive models.

Workflow: Training a Drive Model

  1. Extract Drives: If you haven't already, extract drives from play-by-play data
  2. Extract Snapshots: Create drive snapshots for the seasons you want to train on
  3. Train Model: Use the Train Snapshot Model page to train a LightGBM model
  4. Verify: Check the Drive Models page to see your trained model and metrics

💡 Best Practices

  • Extract snapshots for recent seasons (2020-2024) for best accuracy
  • Use the Auto Training Loop for automated model improvement
  • Check model metrics (AUC, Brier score) before using in production

🏈 Live Scoreboard (NEW)

The Live Scoreboard provides real-time game monitoring with automatic play-by-play ingestion and win probability updates.

Features

📺 Real-Time Updates

Live games refresh automatically every 30 seconds. No manual refresh needed.

📊 Win Probability

Dynamic win probability calculations that update as the game progresses. Based on current game state.

🎮 Game Detail Pages

Click any game to see detailed information including player stats, drive breakdown, and simulation options.

⚡ Quick Simulation

Run quick simulations (100 sims) directly from game detail pages for instant predictions.

🔥 Intensive Simulation

Run full simulations (10,000+ sims) for comprehensive analysis. Results appear in job queue.

Using the Scoreboard

  1. View Today's Games: The scoreboard shows all games for the current day
  2. Navigate by Week: Use the week selector to view games from other weeks
  3. Click a Game: Opens detailed game page with live updates
  4. Run Simulations: Use Quick Sim for instant results or Intensive Sim for full analysis

⚠️ Live Game Requirements

For live games to update automatically, ensure:

  • Celery Beat is running (scheduled tasks)
  • Celery workers are active
  • Game data is being synced from ESPN API

Training Models

Bayesian Model Training

Train traditional Bayesian models for team strength estimation:

  1. Go to Train Model in the navigation
  2. Enter seasons (comma-separated, e.g., "2016,2017,2018,2019,2020,2021,2022,2023,2024")
  3. Select training profile:
    • Dev: ~5 minutes, quick testing
    • Fast: ~15 minutes, standard use
    • Full: ~2+ hours, comprehensive
    • Overnight: ~4+ hours, highest quality
  4. Choose inference method (Auto recommended)
  5. Click "Start Training"

Drive Model Training

Train LightGBM-based drive models (recommended for most use cases):

  1. Go to Drive Simulator → Train Snapshot Model
  2. Select seasons to train on
  3. Choose model type (LightGBM recommended)
  4. Configure training parameters (or use defaults)
  5. Start training job

Training Profiles Explained

Profile     Time        Samples    Use
Dev         ~5 min      50k        Quick testing
Fast        ~15 min     120k       Standard training
Full        ~2+ hours   All data   Production models
Overnight   ~4+ hours   All data   Highest quality

Running Simulations

Launch Simulation (Full Control)

For detailed control over simulation parameters:

  1. Go to Launch Simulation
  2. Enter teams (2-3 letter abbreviations, e.g., "KC", "BUF")
  3. Enter QB names (full names, e.g., "Patrick Mahomes")
  4. Set number of simulations (default 10,000)
  5. Optionally select a specific model
  6. Click "Submit Simulation"

Quick Simulation (From Scoreboard)

For instant results on live or upcoming games:

  1. Go to Live Scoreboard
  2. Click on a game
  3. Click "Run Quick Simulation"
  4. Results appear immediately (100 simulations)

Intensive Simulation (From Scoreboard)

For comprehensive analysis:

  1. Go to Live Scoreboard
  2. Click on a game
  3. Click "Run Intensive Simulation"
  4. Job is queued (10,000+ simulations)
  5. Check Active Jobs to view progress

Simulation Parameters

  • Number of Simulations: More simulations reduce Monte Carlo noise in the estimated probabilities but take longer to run. 10,000 is recommended for production.
  • Model Selection: Use active model (default) or select a specific trained model
  • Random Seed: For reproducibility. Default 42 is fine for most cases.
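
To see why the simulation count matters, the sketch below computes the standard error of a win probability estimated from independent simulations, using the binomial formula sqrt(p(1-p)/N). The function name is hypothetical, and the independence of simulations is an assumption.

```python
import math


def win_prob_standard_error(p, n_sims):
    """Standard error of a win probability estimated from n_sims independent runs."""
    return math.sqrt(p * (1.0 - p) / n_sims)


# p = 0.5 is the worst case (largest noise) for a given number of simulations.
for n in (100, 1_000, 10_000):
    se = win_prob_standard_error(0.5, n)
    print(f"{n:>6} sims: +/- {100 * se:.2f} percentage points (1 standard error)")
```

For a toss-up game this gives roughly 5 percentage points of noise at 100 simulations and 0.5 at 10,000, so a quick simulation is best read as a rough estimate.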

📦 Drive Models (NEW)

Drive models are LightGBM-based machine learning models that predict drive outcomes. They're more accurate and faster to train than Bayesian models.

Model Architecture

Drive models use a two-stage approach:

  1. Score Model: Predicts probability of scoring (TD or FG) on a drive
  2. TD Model: Given that scoring occurred, predicts probability of TD vs FG
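
The two stages combine by the chain rule: P(TD) = P(score) × P(TD | score) and P(FG) = P(score) × (1 − P(TD | score)). A minimal sketch, with a hypothetical function name and made-up input values:

```python
def drive_outcome_probs(p_score, p_td_given_score):
    """Combine the two stage outputs into TD / FG / no-score probabilities."""
    if not (0.0 <= p_score <= 1.0 and 0.0 <= p_td_given_score <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    p_td = p_score * p_td_given_score
    p_fg = p_score * (1.0 - p_td_given_score)
    return {"TD": p_td, "FG": p_fg, "NO_SCORE": 1.0 - p_score}


# Illustrative inputs: 40% chance to score, and 60% of scores are touchdowns.
probs = drive_outcome_probs(0.40, 0.60)
print(probs)
```

With these inputs the drive ends in a TD about 24% of the time, a FG about 16%, and no score about 60%. How the non-scoring mass is split among PUNT, TURNOVER, DOWNS, and SAFETY is not covered by this sketch.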

Features Used

Field Position

Yardline (0-100), distance to goal

Game Situation

Score differential, time remaining, quarter

Drive Context

Plays in drive, yards gained, time elapsed

Playoff Features

Playoff urgency, trailing status, late game

Viewing Models

Go to Data & Models → Drive Models to see:

  • All trained drive models
  • Training metrics (AUC, Brier score, log loss)
  • Model type and training date
  • Seasons used for training

✅ Model Improvements

Recent improvements include:

  • Enhanced error handling and validation
  • Better field position handling
  • Playoff urgency features
  • Improved calibration (isotonic/sigmoid)
  • Comprehensive evaluation metrics

🔄 State Transition Models (NEW)

State transition models simulate game progression in real-time, updating win probability as the game state changes.

How It Works

  1. Current State: System captures current game state (score, time, field position)
  2. Drive Simulation: Uses drive models to predict next drive outcome
  3. State Update: Updates game state based on predicted drive result
  4. Win Probability: Calculates win probability from current state
  5. Repeat: Continues until game ends
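
The loop above can be sketched for a single simulated game as follows. Everything here is a simplified stand-in: the fixed outcome probabilities replace a trained drive model, each drive is assumed to take 150 seconds, and kickoffs, punts, and quarter breaks are ignored. The names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class GameState:
    home_score: int
    away_score: int
    seconds_left: int
    home_has_ball: bool


# Hypothetical stand-in for a trained drive model: fixed outcome probabilities.
# A real model would condition on the current game state.
DRIVE_PROBS = {"TD": 0.22, "FG": 0.15, "NO_SCORE": 0.63}
DRIVE_POINTS = {"TD": 7, "FG": 3, "NO_SCORE": 0}
SECONDS_PER_DRIVE = 150  # assumed average drive length


def simulate_to_end(state, rng):
    """Play out drives from `state` until the clock expires; return the final state."""
    home, away = state.home_score, state.away_score
    seconds, home_ball = state.seconds_left, state.home_has_ball
    while seconds > 0:
        # Drive simulation: sample the next drive's outcome.
        outcome = rng.choices(list(DRIVE_PROBS), weights=list(DRIVE_PROBS.values()))[0]
        # State update: credit the points, run the clock, flip possession.
        if home_ball:
            home += DRIVE_POINTS[outcome]
        else:
            away += DRIVE_POINTS[outcome]
        seconds -= SECONDS_PER_DRIVE
        home_ball = not home_ball
    return GameState(home, away, max(seconds, 0), home_ball)


start = GameState(home_score=10, away_score=7, seconds_left=600, home_has_ball=True)
print(simulate_to_end(start, random.Random(42)))
```

Repeating this from the same starting state many times and counting the winners gives the win probability for that state.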

Use Cases

  • Live Game Monitoring: Real-time win probability updates
  • Game Detail Pages: Shows probability changes over time
  • Simulation Requests: Used for intensive simulations from scoreboard

💡 State Transition vs Drive Models

State transition models use drive models but add game-level logic:

  • Alternates possessions between teams
  • Handles special teams (kickoffs, punts)
  • Manages game clock and quarters
  • Calculates win probability from score

🏆 Playoff Training Features (NEW)

The system now includes special handling for playoff games, recognizing that teams play differently in elimination scenarios.

Playoff Urgency Feature

The model includes a playoff_urgency feature that captures:

  • Base Playoff Multiplier: 1.0x for playoff games
  • Trailing Bonus: +0.5x when team is behind
  • 4th Quarter Bonus: +0.3x in final quarter
  • Final 5 Minutes: +0.2x in last 5 minutes

    playoff_urgency = is_playoff * (1.0 + 0.5*trailing + 0.3*4th_quarter + 0.2*final_5min)
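
The formula can be written as a small function. This is a sketch under assumptions: the argument names are hypothetical, "final 5 minutes" is taken to mean 300 or fewer seconds left in regulation, and overtime is not handled.

```python
def playoff_urgency(is_playoff, trailing, quarter, game_seconds_left):
    """Playoff urgency feature; evaluates to 0.0 for non-playoff games."""
    fourth_quarter = quarter == 4
    final_5min = game_seconds_left <= 300  # assumption: last 5 minutes of regulation
    return int(is_playoff) * (
        1.0
        + 0.5 * int(trailing)
        + 0.3 * int(fourth_quarter)
        + 0.2 * int(final_5min)
    )


# Playoff game, offense trailing, 2 minutes left in the 4th quarter.
print(playoff_urgency(True, True, 4, 120))
# Regular-season game in the same situation.
print(playoff_urgency(False, True, 4, 120))
```

The feature ranges from 1.0 (playoff game, no bonuses) to about 2.0 (all three bonuses active), and is 0 in the regular season.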

What This Means

The model learns that playoff teams:

  • Are more aggressive overall (more 4th down attempts)
  • Take more risks when trailing
  • Have different play-calling patterns in late-game situations
  • Show desperation mode in must-win scenarios

Automatic Detection

Playoff games are automatically detected when:

  • Game type is 'POST' (from ESPN API)
  • Play-by-play data includes playoff indicator
  • Drive snapshots include playoff status

✅ Training Impact

When training models on playoff data, the system automatically:

  • Marks playoff games in database
  • Includes playoff features in training
  • Uses playoff urgency in predictions

📊 Win Probability (NEW)

Real-time win probability calculations that update as games progress.

How It's Calculated

  1. Current State: System captures current game state
  2. Simulation: Runs thousands of simulations from current state
  3. Win Count: Counts how many simulations result in home team win
  4. Probability: Win count / total simulations
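
The last two steps amount to a simple ratio. A minimal sketch, assuming each simulation yields a (home_points, away_points) pair and that ties are not counted as home wins; how the platform treats ties is not stated here.

```python
def home_win_probability(final_scores):
    """final_scores: iterable of (home_points, away_points), one per simulation."""
    scores = list(final_scores)
    if not scores:
        raise ValueError("need at least one simulated game")
    home_wins = sum(1 for home, away in scores if home > away)
    return home_wins / len(scores)


# Four made-up simulated finals; the home team wins three of them.
sims = [(24, 17), (20, 23), (31, 28), (14, 10)]
print(home_win_probability(sims))  # 0.75
```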

Where You See It

  • Live Scoreboard: Shows win probability for each game
  • Game Detail Pages: Win probability graph over time
  • Simulation Results: Final win probability from simulations

Understanding Win Probability

  • 50%: Toss-up game
  • >60%: Strong favorite
  • <40%: Strong underdog
  • Updates live: Changes as the game progresses

Running Backtests

Backtests evaluate model accuracy by comparing predictions to actual game outcomes.

When to Run Backtests

  • After training a new model
  • Comparing different models
  • Evaluating model improvements
  • Understanding model strengths/weaknesses

How to Run

  1. Go to Run Backtest
  2. Enter seasons to test (e.g., "2023,2024")
  3. Set simulations per game (default 5,000)
  4. Select model (default: active model)
  5. Click "Submit Backtest"
  6. Wait for completion (can take 30+ minutes)

Understanding Results

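  • Brier Score: Mean squared error between predicted win probabilities and actual outcomes (0 or 1). Lower is better.
  • Log Loss: Negative log-likelihood of the actual outcomes under the predicted probabilities. Lower is better; confident wrong predictions are penalized heavily.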
  • Calibration Curve: Shows if probabilities match actual frequencies
  • Game-by-Game: Predictions vs actual for each game
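
Both scores are easy to compute by hand for a small set of games. The sketch below uses the standard definitions; the function names and the sample predictions are illustrative and are not taken from the platform's backtest code.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Average negative log-likelihood; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


# A model that always predicts 50% is a useful baseline to beat.
coin_flip = [0.5, 0.5, 0.5, 0.5]
actual = [1, 0, 1, 0]  # 1 = home team won
print(brier_score(coin_flip, actual))  # 0.25
print(log_loss(coin_flip, actual))     # ln(2), about 0.693
```

A backtested model should score below these baseline values to be considered better than a coin flip.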

Data Sources

The system integrates with multiple data sources for play-by-play data and game information.

Available Sources

ESPN API

Live game data, play-by-play, schedules

NFLSavant

Historical play-by-play data

Managing Data Sources

Go to Data & Models → Data Sources to:

  • View data source status
  • Trigger manual syncs
  • View last sync time
  • Check for errors

💡 Automatic Syncing

Most data sources sync automatically:

  • Live games: Every 30 seconds during games
  • Upcoming games: Daily schedule sync
  • Historical data: On-demand via UI

📈 System Monitoring (NEW)

The system includes comprehensive monitoring tools to track performance and health.

Dashboard Overview

The Command Center dashboard shows:

  • Active Model: Currently active model and its details
  • Job Queue: Running, queued, and completed jobs
  • Model Registry: Total models and active count
  • Live Games: Number of games currently in progress
  • Recent Activity: Latest simulations, training, and backtests

System Resources Sidebar

The right sidebar displays real-time system metrics:

  • CPU Load: Current CPU usage percentage
  • Memory: RAM usage and available
  • Storage: Disk usage
  • Workers: Active and idle Celery workers
  • Job Queue: Running and queued jobs
  • Live Games: Active game monitoring

Active Jobs Page

Go to Active Jobs to see:

  • All running and queued jobs
  • Job status and progress
  • Real-time logs
  • Ability to cancel jobs

System Logs

Go to System Logs to view:

  • Application logs
  • Celery worker logs
  • Error logs
  • Training logs

Troubleshooting

Common Issues

Training Job Stuck on "Queued"

Problem: Job never starts running

Solutions:

  • Check Celery workers: sudo systemctl status gamesim-celery-train
  • Restart workers if needed: sudo systemctl restart gamesim-celery-train
  • Check logs: sudo journalctl -u gamesim-celery-train -f

Simulation Returns Unexpected Results

Problem: Win probabilities seem wrong

Solutions:

  • Verify QB names match NFL data exactly
  • Check team abbreviations are correct
  • Ensure model is trained on recent seasons
  • Try increasing number of simulations (10,000+ recommended)

Live Games Not Updating

Problem: Live scoreboard not refreshing

Solutions:

  • Check Celery Beat is running: sudo systemctl status gamesim-celery-beat
  • Verify game sync is working: Check Data Sources page
  • Check for errors in System Logs
  • Manually trigger sync from Data Sources page

Model Training Fails

Problem: Training job fails with error

Solutions:

  • Check job logs for specific error message
  • Verify data is available for selected seasons
  • Try smaller season range first
  • Check disk space: df -h
  • Check memory: free -h

Can't Find QB in Model

Problem: QB name not recognized

Solutions:

  • Use full name as it appears in NFL stats (e.g., "Patrick Mahomes" not "P. Mahomes")
  • Check if QB played in training seasons
  • Try "Unknown" if QB is not in training data (model will use average QB effect)
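
If you submit simulations from a script, a small pre-check can apply the same rule before the request is sent. The helper below is hypothetical and not part of the platform: it assumes you have a set of QB names known to the model and that names must match exactly after trimming stray whitespace.

```python
def resolve_qb_name(name, known_qbs):
    """Return the cleaned name if it exactly matches a known QB, else "Unknown"."""
    cleaned = " ".join(name.split())  # trim and collapse stray whitespace
    return cleaned if cleaned in known_qbs else "Unknown"


# Hypothetical set of QB names present in the training data.
known = {"Patrick Mahomes", "Josh Allen"}

print(resolve_qb_name("Patrick Mahomes", known))  # Patrick Mahomes
print(resolve_qb_name("P. Mahomes", known))       # Unknown
```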

Getting Help

  • Check job logs for detailed error messages
  • Review model metrics in Model Registry or Drive Models
  • Compare with known-good examples
  • Check system resources (CPU, memory, disk) in System Resources sidebar
  • View System Logs for application errors

⚡ Quick Reference

Training Profiles

  • Dev: ~5 min
  • Fast: ~15 min
  • Full: ~2+ hours
  • Overnight: ~4+ hours

Recommended Settings

  • Seasons: Last 5-8 years
  • Simulations: 10,000
  • Profile: Fast (most users)
  • Inference: Auto

Team Abbreviations

  • Use 2-3 letter codes
  • Examples: KC, BUF, SF
  • Case doesn't matter

QB Names

  • Use full names: "Patrick Mahomes", not "P. Mahomes"