Help & Documentation

System Overview

The Sports Simulation Intelligence Platform (SCORE&SIM) predicts NFL game outcomes with a mix of statistical and machine learning models. It combines four components:

🧠 Bayesian Models

Traditional statistical models for team strength estimation and QB effects

🎯 Drive Models

LightGBM-based models that predict drive outcomes (TD, FG, PUNT, etc.)

🔄 State Transition

Real-time simulation of game state changes during live games

📊 Live Analytics

Real-time win probability calculations and game monitoring

What Makes This System Unique

Dual Model Architecture: Combines Bayesian team strength models with ML-based drive prediction
Real-Time Updates: Live play-by-play ingestion with automatic drive extraction
Playoff Awareness: Special handling for playoff games with urgency features
Comprehensive UI: All features accessible through web interface with job tracking

Quick Start Guide

For First-Time Users

Check System Status: Visit the Dashboard to see active models and system health
View Live Games: Go to Live Scoreboard to see current games and predictions
Run a Simulation: Use Launch Simulation to predict an upcoming game
Explore Drive Simulator: Browse historical drives and train models

💡 Pro Tip

Most features work out of the box with pre-trained models. You only need to train new models if you want to use different season ranges or update with the latest data.

Key Concepts

1. Drive-Based Simulation

Unlike play-by-play simulation, the system simulates at the drive level. Each drive is predicted to end in one of:

TD (Touchdown): 7 points (or 6 + PAT)
FG (Field Goal): 3 points
PUNT: 0 points, possession changes
TURNOVER: 0 points, possession changes
DOWNS: 0 points, possession changes
SAFETY: 2 points for defense
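
The outcome list above can be sketched as a small scoring table. This is an illustrative sketch, not the platform's code: the names OFFENSE_POINTS, DEFENSE_POINTS, and apply_drive are hypothetical, and a touchdown is assumed to count as a flat 7 points.

```python
# Points credited per drive outcome, following the list above.
# Assumption: a touchdown is scored as a flat 7 (PAT assumed good).
OFFENSE_POINTS = {"TD": 7, "FG": 3, "PUNT": 0, "TURNOVER": 0, "DOWNS": 0, "SAFETY": 0}
DEFENSE_POINTS = {"SAFETY": 2}  # only a safety gives the defense points


def apply_drive(score, offense, defense, outcome):
    """Return a new score dict after one drive by `offense`."""
    if outcome not in OFFENSE_POINTS:
        raise ValueError(f"unknown drive outcome: {outcome!r}")
    updated = dict(score)  # copy so the caller's dict is not mutated
    updated[offense] += OFFENSE_POINTS[outcome]
    updated[defense] += DEFENSE_POINTS.get(outcome, 0)
    return updated


score = {"KC": 0, "BUF": 0}
score = apply_drive(score, "KC", "BUF", "TD")      # KC touchdown
score = apply_drive(score, "BUF", "KC", "SAFETY")  # BUF drive ends in a safety
print(score)  # {'KC': 9, 'BUF': 0}
```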

2. Model Types

Bayesian Models

Traditional statistical models for team strength. Trained via MCMC (Markov chain Monte Carlo) or ADVI (automatic differentiation variational inference). Used for baseline predictions.

Drive Models (LightGBM)

Machine learning models that predict drive outcomes. More accurate and faster to train than the Bayesian models. Recommended for most use cases.

State Transition Models

Real-time models for live game simulation. They update win probability as the game progresses.

3. Drive Snapshots

Drive snapshots capture the state of a drive at each play. They are used to train models that account for:

Field position (yardline)
Down and distance
Score differential
Time remaining
Playoff urgency (for playoff games)
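
To make the snapshot fields above concrete, here is a minimal sketch of one snapshot as a Python dataclass. The class name SnapshotSketch and its field names are assumptions for illustration; the actual DriveStateSnapshot schema is not shown in this document.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SnapshotSketch:
    """Hypothetical stand-in for one drive snapshot (one row per play)."""
    yardline: int            # field position, 0-100
    down: int                # 1-4
    distance: int            # yards to go for a first down
    score_diff: int          # offense score minus defense score
    seconds_remaining: int   # game clock remaining
    is_playoff: bool = False

    def __post_init__(self):
        # Basic range checks so bad rows fail early.
        if not 0 <= self.yardline <= 100:
            raise ValueError("yardline must be in 0-100")
        if self.down not in (1, 2, 3, 4):
            raise ValueError("down must be 1-4")


snap = SnapshotSketch(yardline=35, down=3, distance=4,
                      score_diff=-3, seconds_remaining=420)
features = asdict(snap)  # plain dict, ready to become one training row
```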

🎯 Drive Simulator NEW

The Drive Simulator is a comprehensive suite of tools for working with drive-level data and models.

Available Features

📂 Browse Drives

Explore historical drives from the database. Filter by season, team, game, or drive result.

⚙️ Extract Drives

Extract drives from play-by-play data. Creates Drive records from PlayByPlay or LivePlayByPlay data.

Access: Drive Simulator → Extract Drives

📸 Extract Snapshots

Create drive snapshots for training. Extracts DriveStateSnapshot records with full drive context.

Access: Drive Simulator → Extract Snapshots

🧠 Train Snapshot Model

Train LightGBM models on drive snapshots. Creates models that predict scoring probability and TD probability.

Access: Drive Simulator → Train Snapshot Model

🔄 Auto Training Loop

Automated training loop that iteratively improves models using AI agent suggestions.

Access: Drive Simulator → Auto Training Loop

▶️ Run Simulation

Simulate a specific game scenario using trained drive models.

Workflow: Training a Drive Model

Extract Drives: If you haven't already, extract drives from play-by-play data
Extract Snapshots: Create drive snapshots for the seasons you want to train on
Train Model: Use the Train Snapshot Model page to train a LightGBM model
Verify: Check the Drive Models page to see your trained model and metrics

💡 Best Practices

Extract snapshots for recent seasons (2020-2024) for best accuracy
Use the Auto Training Loop for automated model improvement
Check model metrics (AUC, Brier score) before using in production

🏈 Live Scoreboard NEW

The Live Scoreboard provides real-time game monitoring with automatic play-by-play ingestion and win probability updates.

Features

📺 Real-Time Updates

Games update automatically every 30 seconds during live games. No manual refresh needed.

📊 Win Probability

Dynamic win probability calculations that update as the game progresses. Based on current game state.

🎮 Game Detail Pages

Click any game to see detailed information including player stats, drive breakdown, and simulation options.

⚡ Quick Simulation

Run quick simulations (100 sims) directly from game detail pages for instant predictions.

🔥 Intensive Simulation

Run full simulations (10,000+ sims) for comprehensive analysis. Results appear in job queue.

Using the Scoreboard

View Today's Games: The scoreboard shows all games for the current day
Navigate by Week: Use the week selector to view games from other weeks
Click a Game: Opens detailed game page with live updates
Run Simulations: Use Quick Sim for instant results or Intensive Sim for full analysis

⚠️ Live Game Requirements

For live games to update automatically, ensure:

Celery Beat is running (scheduled tasks)
Celery workers are active
Game data is being synced from the ESPN API

Training Models

Bayesian Model Training

Train traditional Bayesian models for team strength estimation:

Go to Train Model in the navigation
Enter seasons (comma-separated, e.g., "2016,2017,2018,2019,2020,2021,2022,2023,2024")
Select training profile:
- Dev: ~5 minutes, quick testing
- Fast: ~15 minutes, standard use
- Full: ~2+ hours, comprehensive
- Overnight: ~4+ hours, highest quality
Choose inference method (Auto recommended)
Click "Start Training"

Drive Model Training

Train LightGBM-based drive models (recommended for most use cases):

Go to Drive Simulator → Train Snapshot Model
Select seasons to train on
Choose model type (LightGBM recommended)
Configure training parameters (or use defaults)
Start training job

Training Profiles Explained

Dev Profile: ~5 min, 50k samples. Use for quick testing.
Fast Profile: ~15 min, 120k samples. Use for standard training.
Full Profile: ~2+ hours, all data. Use for production models.
Overnight Profile: ~4+ hours, all data. Use for highest quality.

Running Simulations

Launch Simulation (Full Control)

For detailed control over simulation parameters:

Go to Launch Simulation
Enter teams (2-3 letter abbreviations, e.g., "KC", "BUF")
Enter QB names (full names, e.g., "Patrick Mahomes")
Set number of simulations (default 10,000)
Optionally select a specific model
Click "Submit Simulation"
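
The input rules in the steps above (2-3 letter team codes, full QB names, 10,000 simulations by default) can be checked before a request is submitted. The sketch below is hypothetical: normalize_team and build_request are illustrative names, and uppercasing the team code is an assumption about how case-insensitive input might be handled.

```python
import re


def normalize_team(abbr):
    """Uppercase a team code and check that it is 2-3 letters."""
    code = abbr.strip().upper()
    if not re.fullmatch(r"[A-Z]{2,3}", code):
        raise ValueError(f"expected a 2-3 letter team code, got {abbr!r}")
    return code


def build_request(home, away, home_qb, away_qb, n_sims=10_000):
    """Assemble a validated simulation request as a plain dict."""
    if n_sims < 1:
        raise ValueError("n_sims must be at least 1")
    return {
        "home": normalize_team(home),
        "away": normalize_team(away),
        "home_qb": home_qb.strip(),  # full name, e.g. "Patrick Mahomes"
        "away_qb": away_qb.strip(),
        "n_sims": n_sims,
    }


request = build_request("kc", "BUF", "Patrick Mahomes", "Josh Allen")
```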

Quick Simulation (From Scoreboard)

For instant results on live or upcoming games:

Go to Live Scoreboard
Click on a game
Click "Run Quick Simulation"
Results appear immediately (100 simulations)

Intensive Simulation (From Scoreboard)

For comprehensive analysis:

Go to Live Scoreboard
Click on a game
Click "Run Intensive Simulation"
Job is queued (10,000+ simulations)
Check Active Jobs to view progress

Simulation Parameters

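Number of Simulations: More simulations reduce sampling noise but take longer. The standard error of a simulated win probability shrinks with the square root of the simulation count, so 10,000 simulations are roughly ten times more precise than 100. 10,000 is recommended for production.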
Model Selection: Use active model (default) or select a specific trained model
Random Seed: For reproducibility. Default 42 is fine for most cases.
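
As a rough guide to the accuracy trade-off, the sampling error of a simulated win probability can be estimated with the standard binomial formula. This is a general statistical sketch, not the platform's code; win_prob_standard_error is a hypothetical helper name.

```python
import math


def win_prob_standard_error(p, n_sims):
    """Standard error of a win probability estimated from n_sims simulations."""
    return math.sqrt(p * (1.0 - p) / n_sims)


# Worst case is p = 0.5, where the binomial variance is largest.
for n in (100, 10_000):
    se = win_prob_standard_error(0.5, n)
    print(f"{n:>6} sims: standard error {se:.3f}")
```

For a toss-up game this gives a standard error of about 0.05 with 100 simulations and about 0.005 with 10,000, which is why Quick Simulation results are best read as rough estimates.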

📦 Drive Models NEW

Drive models are LightGBM-based machine learning models that predict drive outcomes. They're more accurate and faster to train than Bayesian models.

Model Architecture

Drive models use a two-stage approach:

Score Model: Predicts probability of scoring (TD or FG) on a drive
TD Model: Given that scoring occurred, predicts probability of TD vs FG
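
The two stages combine by the chain rule of probability: P(TD) = P(score) × P(TD | score), and P(FG) = P(score) × (1 − P(TD | score)). A minimal sketch, assuming the two model outputs are already available as plain probabilities (drive_outcome_probs is a hypothetical helper, and the input values are made up):

```python
def drive_outcome_probs(p_score, p_td_given_score):
    """Combine the two stage outputs into TD / FG / no-score probabilities."""
    for name, p in (("p_score", p_score), ("p_td_given_score", p_td_given_score)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {p}")
    return {
        "TD": p_score * p_td_given_score,
        "FG": p_score * (1.0 - p_td_given_score),
        "NO_SCORE": 1.0 - p_score,
    }


# Illustrative inputs: 40% chance to score, 60% of scores are touchdowns.
probs = drive_outcome_probs(p_score=0.40, p_td_given_score=0.60)
expected_points = 7 * probs["TD"] + 3 * probs["FG"]
```

With these inputs the drive ends in a TD 24% of the time and a FG 16% of the time, for about 2.16 expected points when a touchdown is counted as 7.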

Features Used

Field Position

Yardline (0-100), distance to goal

Game Situation

Score differential, time remaining, quarter

Drive Context

Plays in drive, yards gained, time elapsed

Playoff Features

Playoff urgency, trailing status, late game

Viewing Models

Go to Data & Models → Drive Models to see:

All trained drive models
Training metrics (AUC, Brier score, log loss)
Model type and training date
Seasons used for training

✅ Model Improvements

Recent improvements include:

Enhanced error handling and validation
Better field position handling
Playoff urgency features
Improved calibration (isotonic/sigmoid)
Comprehensive evaluation metrics

🔄 State Transition Models NEW

State transition models simulate game progression in real-time, updating win probability as the game state changes.

How It Works

Current State: System captures current game state (score, time, field position)
Drive Simulation: Uses drive models to predict next drive outcome
State Update: Updates game state based on predicted drive result
Win Probability: Calculates win probability from current state
Repeat: Continues until game ends
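
The loop above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the platform's implementation: the outcome weights are placeholders standing in for a trained drive model, the clock is reduced to a count of drives remaining, possessions strictly alternate, and safeties are left out.

```python
import random

POINTS = {"TD": 7, "FG": 3, "PUNT": 0, "TURNOVER": 0, "DOWNS": 0}
# Placeholder weights; a real run would query the drive model for each state.
WEIGHTS = {"TD": 0.22, "FG": 0.15, "PUNT": 0.40, "TURNOVER": 0.13, "DOWNS": 0.10}


def simulate_game(home_score, away_score, home_has_ball, drives_left, rng):
    """Play out the remaining drives and return the final (home, away) score."""
    outcomes = list(WEIGHTS)
    weights = list(WEIGHTS.values())
    for _ in range(drives_left):
        # Drive simulation: sample the outcome of the next drive.
        outcome = rng.choices(outcomes, weights=weights)[0]
        # State update: credit points to the team with the ball.
        if home_has_ball:
            home_score += POINTS[outcome]
        else:
            away_score += POINTS[outcome]
        # Possession changes after every drive in this sketch.
        home_has_ball = not home_has_ball
    return home_score, away_score


rng = random.Random(42)  # fixed seed for reproducibility
final = simulate_game(home_score=14, away_score=10,
                      home_has_ball=True, drives_left=8, rng=rng)
```

Repeating simulate_game many times from the same starting state and counting home wins gives the win probability.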

Use Cases

Live Game Monitoring: Real-time win probability updates
Game Detail Pages: Shows probability changes over time
Simulation Requests: Used for intensive simulations from scoreboard

💡 State Transition vs Drive Models

State transition models use drive models but add game-level logic:

Alternates possessions between teams
Handles special teams (kickoffs, punts)
Manages game clock and quarters
Calculates win probability from score

🏆 Playoff Training Features NEW

The system now includes special handling for playoff games, recognizing that teams play differently in elimination scenarios.

Playoff Urgency Feature

The model includes a playoff_urgency feature that captures:

Base Playoff Multiplier: 1.0x for playoff games
Trailing Bonus: +0.5x when team is behind
4th Quarter Bonus: +0.3x in final quarter
Final 5 Minutes: +0.2x in last 5 minutes

playoff_urgency = is_playoff * (1.0 + 0.5*trailing + 0.3*4th_quarter + 0.2*final_5min)
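
The same formula as a runnable Python function. This is a direct transcription of the expression above, with one naming change: `4th_quarter` is not a valid Python identifier, so the sketch uses `fourth_quarter` instead.

```python
def playoff_urgency(is_playoff, trailing, fourth_quarter, final_5min):
    """Compute the playoff urgency feature from four boolean flags."""
    bonus = (
        1.0                            # base playoff multiplier
        + 0.5 * int(trailing)          # team is behind
        + 0.3 * int(fourth_quarter)    # final quarter
        + 0.2 * int(final_5min)        # last 5 minutes
    )
    # Regular-season games always get 0 because is_playoff is False.
    return int(is_playoff) * bonus


regular = playoff_urgency(False, True, True, True)
playoff_early = playoff_urgency(True, False, False, False)
playoff_desperate = playoff_urgency(True, True, True, True)
```

The feature is 0 for any regular-season game, 1.0 for a playoff game with no bonuses, and reaches its maximum of 2.0 for a trailing playoff team in the final 5 minutes of the 4th quarter.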
    

What This Means

The feature lets the model pick up on ways playoff teams tend to behave differently. For example, playoff teams:

Are more aggressive overall (more 4th down attempts)
Take more risks when trailing
Have different play-calling patterns in late-game situations
Show desperation mode in must-win scenarios

Automatic Detection

Playoff games are automatically detected when:

Game type is 'POST' (from ESPN API)
Play-by-play data includes playoff indicator
Drive snapshots include playoff status

✅ Training Impact

When training models on playoff data, the system automatically:

Marks playoff games in database
Includes playoff features in training
Uses playoff urgency in predictions

📊 Win Probability NEW

Real-time win probability calculations that update as games progress.

How It's Calculated

Current State: System captures current game state
Simulation: Runs thousands of simulations from current state
Win Count: Counts how many simulations result in home team win
Probability: Win count / total simulations
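
The calculation above is a Monte Carlo estimate. A minimal sketch follows, assuming a function that simulates one game to completion is available; estimate_home_win_prob and toy_game are hypothetical names, toy_game is only a stand-in for the real drive-level simulation, and ties are counted as non-wins for the home team.

```python
import random


def estimate_home_win_prob(simulate_once, n_sims=10_000, seed=42):
    """Fraction of simulated games won by the home team."""
    rng = random.Random(seed)  # seeded so repeated runs give the same answer
    home_wins = 0
    for _ in range(n_sims):
        home, away = simulate_once(rng)
        if home > away:  # assumption: a tie does not count as a home win
            home_wins += 1
    return home_wins / n_sims


def toy_game(rng):
    """Stand-in simulator: random final scores, no football logic."""
    return rng.randint(10, 35), rng.randint(10, 35)


p_home = estimate_home_win_prob(toy_game, n_sims=2_000)
```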

Where You See It

Live Scoreboard: Shows win probability for each game
Game Detail Pages: Win probability graph over time
Simulation Results: Final win probability from simulations

Understanding Win Probability

50%: Toss-up game
>60%: Strong favorite
<40%: Strong underdog
Updates Live: Changes as the game progresses

Running Backtests

Backtests evaluate model accuracy by comparing predictions to actual game outcomes.

When to Run Backtests

After training a new model
Comparing different models
Evaluating model improvements
Understanding model strengths/weaknesses

How to Run

Go to Run Backtest
Enter seasons to test (e.g., "2023,2024")
Set simulations per game (default 5,000)
Select model (default: active model)
Click "Submit Backtest"
Wait for completion (can take 30+ minutes)

Understanding Results

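Brier Score: Lower is better. The mean squared difference between predicted win probabilities and actual outcomes (0 or 1).
Log Loss: Lower is better. Penalizes confident wrong predictions heavily, so it is sensitive to poorly calibrated probabilities.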
Calibration Curve: Shows if probabilities match actual frequencies
Game-by-Game: Predictions vs actual for each game
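
Both headline metrics are simple to compute by hand, which can help when sanity-checking a backtest. The sketch below uses the standard definitions of Brier score and log loss; the predictions and outcomes are made-up values, not real backtest results.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


predicted = [0.70, 0.55, 0.20]  # predicted home win probabilities (illustrative)
actual = [1, 0, 0]              # 1 = home team won, 0 = home team lost

print(f"Brier score: {brier_score(predicted, actual):.4f}")
print(f"Log loss:    {log_loss(predicted, actual):.4f}")
```

A model that always predicts 50% scores a Brier score of 0.25 and a log loss of about 0.693, so useful models should beat both numbers.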

Data Sources

The system integrates with multiple data sources for play-by-play data and game information.

Available Sources

ESPN API

Live game data, play-by-play, schedules

NFLSavant

Historical play-by-play data

Managing Data Sources

Go to Data & Models → Data Sources to:

View data source status
Trigger manual syncs
View last sync time
Check for errors

💡 Automatic Syncing

Most data sources sync automatically:

Live games: Every 30 seconds during games
Upcoming games: Daily schedule sync
Historical data: On-demand via UI
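
For reference, sync cadences like the ones above are typically expressed in a Celery Beat schedule. The fragment below is a hedged sketch, not this system's configuration: the entry names and task paths are hypothetical, and only the 30-second and daily intervals come from the list above.

```python
from datetime import timedelta

# Hypothetical task names; the real task paths are not given in this document.
BEAT_SCHEDULE = {
    "sync-live-games": {
        "task": "tasks.sync_live_games",
        "schedule": 30.0,               # seconds between runs
    },
    "sync-upcoming-schedule": {
        "task": "tasks.sync_schedule",
        "schedule": timedelta(days=1),  # once per day
    },
}

# In a Celery app, this dict would be assigned to app.conf.beat_schedule.
```

Historical data has no entry because it is synced on demand rather than on a schedule.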

📈 System Monitoring NEW

The system includes comprehensive monitoring tools to track performance and health.

Dashboard Overview

The Command Center dashboard shows:

Active Model: Currently active model and its details
Job Queue: Running, queued, and completed jobs
Model Registry: Total models and active count
Live Games: Number of games currently in progress
Recent Activity: Latest simulations, training, and backtests

System Resources Sidebar

The right sidebar displays real-time system metrics:

CPU Load: Current CPU usage percentage
Memory: RAM usage and available
Storage: Disk usage
Workers: Active and idle Celery workers
Job Queue: Running and queued jobs
Live Games: Active game monitoring

Active Jobs Page

Go to Active Jobs to see:

All running and queued jobs
Job status and progress
Real-time logs
Ability to cancel jobs

System Logs

Go to System Logs to view:

Application logs
Celery worker logs
Error logs
Training logs

Troubleshooting

Common Issues

Training Job Stuck on "Queued"

Problem: Job never starts running

Solutions:

Check Celery workers: sudo systemctl status gamesim-celery-train
Restart workers if needed: sudo systemctl restart gamesim-celery-train
Check logs: sudo journalctl -u gamesim-celery-train -f

Simulation Returns Unexpected Results

Problem: Win probabilities seem wrong

Solutions:

Verify QB names match NFL data exactly
Check team abbreviations are correct
Ensure model is trained on recent seasons
Try increasing number of simulations (10,000+ recommended)

Live Games Not Updating

Problem: Live scoreboard not refreshing

Solutions:

Check Celery Beat is running: sudo systemctl status gamesim-celery-beat
Verify game sync is working: Check Data Sources page
Check for errors in System Logs
Manually trigger sync from Data Sources page

Model Training Fails

Problem: Training job fails with error

Solutions:

Check job logs for specific error message
Verify data is available for selected seasons
Try smaller season range first
Check disk space: df -h
Check memory: free -h

Can't Find QB in Model

Problem: QB name not recognized

Solutions:

Use full name as it appears in NFL stats (e.g., "Patrick Mahomes" not "P. Mahomes")
Check if QB played in training seasons
Try "Unknown" if QB is not in training data (model will use average QB effect)
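
The lookup behavior described above can be pictured as follows. This is an illustrative sketch only: qb_effect is a hypothetical helper, the effect values are invented, and representing the average QB effect as 0.0 is an assumption.

```python
def qb_effect(name, effects, average_effect=0.0):
    """Look up a QB's effect, using the average effect for "Unknown"."""
    name = name.strip()
    if name == "Unknown":
        return average_effect
    if name not in effects:
        # Abbreviated or misspelled names end up here.
        raise KeyError(f"QB {name!r} not found; use the full name or 'Unknown'")
    return effects[name]


# Invented values for illustration only.
effects = {"Patrick Mahomes": 0.8, "Josh Allen": 0.7}

known = qb_effect("Patrick Mahomes", effects)
fallback = qb_effect("Unknown", effects)
```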

Getting Help

Check job logs for detailed error messages
Review model metrics in Model Registry or Drive Models
Compare with known-good examples
Check system resources (CPU, memory, disk) in System Resources sidebar
View System Logs for application errors

⚡ Quick Reference

Training Profiles

Dev: ~5 min
Fast: ~15 min
Full: ~2+ hours
Overnight: ~4+ hours

Recommended Settings

Seasons: Last 5-8 years
Simulations: 10,000
Profile: Fast (most users)
Inference: Auto

Team Abbreviations

Use 2-3 letter codes
Examples: KC, BUF, SF
Case doesn't matter

QB Names

Use full names: "Patrick Mahomes", not "P. Mahomes"
