SCORE&SIM DOCUMENTATION
Complete guide to the Sports Simulation Intelligence Platform. Learn how to train models, run simulations, and analyze game predictions.
Table of Contents
Getting Started
Core Features
Advanced Features
- Drive Models (NEW)
- State Transition (NEW)
- Playoff Training (NEW)
- Win Probability (NEW)
Operations
System Overview
The Sports Simulation Intelligence Platform (SCORE&SIM) is a comprehensive system for predicting game outcomes using advanced machine learning models. The system combines:
Bayesian Models
Traditional statistical models for team strength estimation and QB effects
Drive Models
LightGBM-based models that predict drive outcomes (TD, FG, PUNT, etc.)
State Transition
Real-time simulation of game state changes during live games
Live Analytics
Real-time win probability calculations and game monitoring
What Makes This System Unique
- Dual Model Architecture: Combines Bayesian team strength models with ML-based drive prediction
- Real-Time Updates: Live play-by-play ingestion with automatic drive extraction
- Playoff Awareness: Special handling for playoff games with urgency features
- Comprehensive UI: All features accessible through web interface with job tracking
Quick Start Guide
For First-Time Users
- Check System Status: Visit the Dashboard to see active models and system health
- View Live Games: Go to Live Scoreboard to see current games and predictions
- Run a Simulation: Use Launch Simulation to predict an upcoming game
- Explore Drive Simulator: Browse historical drives and train models
Pro Tip
Most features work out of the box with pre-trained models. You only need to train new models if you want to use different season ranges or retrain on the latest data.
Key Concepts
1. Drive-Based Simulation
Unlike play-by-play simulation, the system simulates at the drive level. Each drive is predicted to end in one of:
- TD (Touchdown): 7 points (or 6 + PAT)
- FG (Field Goal): 3 points
- PUNT: 0 points, possession changes
- TURNOVER: 0 points, possession changes
- DOWNS: 0 points, possession changes
- SAFETY: 2 points for defense
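The outcome list above maps naturally onto a small lookup table. The sketch below is illustrative only: the names `DRIVE_POINTS` and `apply_drive` are hypothetical, and a touchdown is simply counted as 7 points rather than modeling the PAT separately.

```python
# Points per drive result, mirroring the outcome list above.
# Assumption: a TD is counted as a flat 7 (6 + PAT) for simplicity.
DRIVE_POINTS = {
    "TD": 7,
    "FG": 3,
    "PUNT": 0,
    "TURNOVER": 0,
    "DOWNS": 0,
    "SAFETY": 2,  # credited to the defense, not the offense
}


def apply_drive(score, offense, defense, outcome):
    """Return a new score dict after one drive by `offense`."""
    new_score = dict(score)  # copy so the caller's dict is not mutated
    scorer = defense if outcome == "SAFETY" else offense
    new_score[scorer] += DRIVE_POINTS[outcome]
    return new_score


score = {"KC": 0, "BUF": 0}
score = apply_drive(score, "KC", "BUF", "TD")
score = apply_drive(score, "BUF", "KC", "FG")
print(score)
```

Keeping the scoring rule in one table makes it easy to check that every outcome label the model can emit has a defined point value.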
2. Model Types
Bayesian Models
Traditional models for team strength. Trained via MCMC/ADVI. Used for baseline predictions.
Drive Models (LightGBM)
Machine learning models that predict drive outcomes. More accurate and faster to train than the Bayesian models. Recommended for most use cases.
State Transition Models
Real-time models for live game simulation. Update win probability as game progresses.
3. Drive Snapshots
Drive snapshots capture the state of a drive at each play. They are used to train models on:
- Field position (yardline)
- Down and distance
- Score differential
- Time remaining
- Playoff urgency (for playoff games)
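As a rough illustration, the fields above could be carried in a small record like the one below. This is a sketch, not the platform's DriveStateSnapshot schema: the class name `SnapshotFeatures`, every field name, and the validation rules are assumptions made for the example.

```python
from dataclasses import dataclass, asdict


@dataclass
class SnapshotFeatures:
    """Hypothetical container for the per-play fields listed above."""
    yardline: int                 # field position, 0-100
    down: int                     # 1-4
    distance: int                 # yards to go for a first down
    score_diff: int               # offense score minus defense score
    seconds_remaining: int        # time left in the game
    playoff_urgency: float = 0.0  # assumed to be 0.0 outside the playoffs

    def __post_init__(self):
        # Basic sanity checks; the real extraction rules are not documented here.
        if not 0 <= self.yardline <= 100:
            raise ValueError("yardline must be between 0 and 100")
        if self.down not in (1, 2, 3, 4):
            raise ValueError("down must be 1, 2, 3, or 4")

    def to_features(self):
        """Return a plain dict, one key per model input column."""
        return asdict(self)


snap = SnapshotFeatures(yardline=35, down=2, distance=7,
                        score_diff=-3, seconds_remaining=420)
print(snap.to_features())
```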
Drive Simulator (NEW)
The Drive Simulator is a comprehensive suite of tools for working with drive-level data and models.
Available Features
Browse Drives
Explore historical drives from the database. Filter by season, team, game, or drive result.
Extract Drives
Extract drives from play-by-play data. Creates Drive records from PlayByPlay or LivePlayByPlay data.
Extract Snapshots
Create drive snapshots for training. Extracts DriveStateSnapshot records with full drive context.
Train Snapshot Model
Train LightGBM models on drive snapshots. Creates models that predict scoring probability and TD probability.
Auto Training Loop
Automated training loop that iteratively improves models using AI agent suggestions.
Run Simulation
Simulate a specific game scenario using trained drive models.
Workflow: Training a Drive Model
- Extract Drives: If you haven't already, extract drives from play-by-play data
- Extract Snapshots: Create drive snapshots for the seasons you want to train on
- Train Model: Use the Train Snapshot Model page to train a LightGBM model
- Verify: Check the Drive Models page to see your trained model and metrics
Best Practices
- Extract snapshots for recent seasons (2020-2024) for best accuracy
- Use the Auto Training Loop for automated model improvement
- Check model metrics (AUC, Brier score) before using in production
Live Scoreboard (NEW)
The Live Scoreboard provides real-time game monitoring with automatic play-by-play ingestion and win probability updates.
Features
Real-Time Updates
Games update automatically every 30 seconds during live games. No manual refresh needed.
Win Probability
Dynamic win probability calculations that update as the game progresses. Based on current game state.
Game Detail Pages
Click any game to see detailed information including player stats, drive breakdown, and simulation options.
Quick Simulation
Run quick simulations (100 sims) directly from game detail pages for instant predictions.
Intensive Simulation
Run full simulations (10,000+ sims) for comprehensive analysis. Results appear in job queue.
Using the Scoreboard
- View Today's Games: The scoreboard shows all games for the current day
- Navigate by Week: Use the week selector to view games from other weeks
- Click a Game: Opens detailed game page with live updates
- Run Simulations: Use Quick Sim for instant results or Intensive Sim for full analysis
Live Game Requirements
For live games to update automatically, ensure:
- Celery Beat is running (scheduled tasks)
- Celery workers are active
- Game data is being synced from ESPN API
Training Models
Bayesian Model Training
Train traditional Bayesian models for team strength estimation:
- Go to Train Model in the navigation
- Enter seasons (comma-separated, e.g., "2016,2017,2018,2019,2020,2021,2022,2023,2024")
- Select training profile:
- Dev: ~5 minutes, quick testing
- Fast: ~15 minutes, standard use
- Full: ~2+ hours, comprehensive
- Overnight: ~4+ hours, highest quality
- Choose inference method (Auto recommended)
- Click "Start Training"
Drive Model Training
Train LightGBM-based drive models (recommended for most use cases):
- Go to Drive Simulator → Train Snapshot Model
- Select seasons to train on
- Choose model type (LightGBM recommended)
- Configure training parameters (or use defaults)
- Start training job
Training Profiles Explained
Running Simulations
Launch Simulation (Full Control)
For detailed control over simulation parameters:
- Go to Launch Simulation
- Enter teams (2-3 letter abbreviations, e.g., "KC", "BUF")
- Enter QB names (full names, e.g., "Patrick Mahomes")
- Set number of simulations (default 10,000)
- Optionally select a specific model
- Click "Submit Simulation"
Quick Simulation (From Scoreboard)
For instant results on live or upcoming games:
- Go to Live Scoreboard
- Click on a game
- Click "Run Quick Simulation"
- Results appear immediately (100 simulations)
Intensive Simulation (From Scoreboard)
For comprehensive analysis:
- Go to Live Scoreboard
- Click on a game
- Click "Run Intensive Simulation"
- Job is queued (10,000+ simulations)
- Check Active Jobs to view progress
Simulation Parameters
- Number of Simulations: More simulations reduce sampling noise in the estimate but take longer to run. 10,000 is recommended for production.
- Model Selection: Use active model (default) or select a specific trained model
- Random Seed: For reproducibility. Default 42 is fine for most cases.
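To see why the simulation count matters, it helps to look at the sampling error of a win probability estimated from independent simulations. The sketch below uses the standard binomial standard error, sqrt(p(1-p)/n); the function name is hypothetical, and treating each simulation as an independent draw is an assumption about how the simulator works.

```python
import math


def win_prob_standard_error(p, n_sims):
    """Standard error of a win probability estimated from n_sims independent runs."""
    if n_sims <= 0:
        raise ValueError("n_sims must be positive")
    return math.sqrt(p * (1.0 - p) / n_sims)


# Compare a quick simulation (100 runs) with an intensive one (10,000 runs)
# for an evenly matched game (p = 0.5, the worst case for sampling noise).
for n in (100, 10_000):
    se = win_prob_standard_error(0.5, n)
    print(f"{n:>6} sims: +/- {se:.3f}")
```

Under these assumptions a 100-run quick simulation carries roughly five percentage points of noise, while 10,000 runs bring that down to about half a point.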
Drive Models (NEW)
Drive models are LightGBM-based machine learning models that predict drive outcomes. They're more accurate and faster to train than Bayesian models.
Model Architecture
Drive models use a two-stage approach:
- Score Model: Predicts probability of scoring (TD or FG) on a drive
- TD Model: Given that scoring occurred, predicts probability of TD vs FG
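The two stages combine by the chain rule of probability: P(TD) = P(score) × P(TD | score), and P(FG) = P(score) × (1 − P(TD | score)). A minimal sketch follows; the function name and the "NO_SCORE" label are illustrative, and the two input probabilities stand in for the outputs of the trained Score and TD models.

```python
def outcome_probabilities(p_score, p_td_given_score):
    """Combine the two model outputs into TD / FG / no-score probabilities."""
    for name, p in (("p_score", p_score), ("p_td_given_score", p_td_given_score)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return {
        "TD": p_score * p_td_given_score,
        "FG": p_score * (1.0 - p_td_given_score),
        "NO_SCORE": 1.0 - p_score,  # punt, turnover, downs, etc. lumped together
    }


# Example: 40% chance the drive scores, and 60% of scoring drives end in a TD.
probs = outcome_probabilities(0.40, 0.60)
print(probs)
```

The three values always sum to 1, which is a useful sanity check when wiring model outputs into a simulator.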
Features Used
Field Position
Yardline (0-100), distance to goal
Game Situation
Score differential, time remaining, quarter
Drive Context
Plays in drive, yards gained, time elapsed
Playoff Features
Playoff urgency, trailing status, late game
Viewing Models
Go to Data & Models → Drive Models to see:
- All trained drive models
- Training metrics (AUC, Brier score, log loss)
- Model type and training date
- Seasons used for training
Model Improvements
Recent improvements include:
- Enhanced error handling and validation
- Better field position handling
- Playoff urgency features
- Improved calibration (isotonic/sigmoid)
- Comprehensive evaluation metrics
State Transition Models (NEW)
State transition models simulate game progression in real-time, updating win probability as the game state changes.
How It Works
- Current State: System captures current game state (score, time, field position)
- Drive Simulation: Uses drive models to predict next drive outcome
- State Update: Updates game state based on predicted drive result
- Win Probability: Calculates win probability from current state
- Repeat: Continues until game ends
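The loop above can be sketched in a few lines. Everything here is a simplified stand-in: `sample_drive_outcome` uses fixed, made-up probabilities in place of a trained drive model, each drive is assumed to consume a constant 150 seconds, and the state keys are hypothetical.

```python
import random

DRIVE_POINTS = {"TD": 7, "FG": 3, "PUNT": 0, "TURNOVER": 0, "DOWNS": 0, "SAFETY": 2}


def sample_drive_outcome(state, rng):
    """Placeholder for a trained drive model: fixed, made-up probabilities."""
    outcomes = ["TD", "FG", "PUNT", "TURNOVER", "DOWNS", "SAFETY"]
    weights = [0.22, 0.15, 0.40, 0.12, 0.10, 0.01]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def simulate_game(state, rng, seconds_per_drive=150):
    """Play drives from `state` until the clock runs out; return the final state."""
    state = dict(state)  # work on a copy
    while state["seconds_left"] > 0:
        outcome = sample_drive_outcome(state, rng)
        offense = state["possession"]
        defense = "away" if offense == "home" else "home"
        # A safety scores for the defense; every other outcome scores for the offense.
        scorer = defense if outcome == "SAFETY" else offense
        state[scorer + "_score"] += DRIVE_POINTS[outcome]
        state["seconds_left"] -= seconds_per_drive
        # After any drive result the other team gets the ball next.
        state["possession"] = defense
    return state


start = {"home_score": 0, "away_score": 0, "possession": "home", "seconds_left": 3600}
final = simulate_game(start, random.Random(42))
print(final["home_score"], final["away_score"])
```

Passing in a seeded `random.Random` keeps a single run reproducible, in the same spirit as the Random Seed simulation parameter.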
Use Cases
- Live Game Monitoring: Real-time win probability updates
- Game Detail Pages: Shows probability changes over time
- Simulation Requests: Used for intensive simulations from scoreboard
State Transition vs Drive Models
State transition models use drive models but add game-level logic:
- Alternates possessions between teams
- Handles special teams (kickoffs, punts)
- Manages game clock and quarters
- Calculates win probability from score
Playoff Training Features (NEW)
The system now includes special handling for playoff games, recognizing that teams play differently in elimination scenarios.
Playoff Urgency Feature
The model includes a playoff_urgency feature that captures:
- Base Playoff Multiplier: 1.0x for playoff games
- Trailing Bonus: +0.5x when team is behind
- 4th Quarter Bonus: +0.3x in final quarter
- Final 5 Minutes: +0.2x in last 5 minutes
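One plausible reading of the components above is that the bonuses are added on top of the base value. The sketch below follows that reading. It is not the platform's implementation: the function and argument names, the 0.0 value for non-playoff games, and the assumption that the bonuses stack are all illustrative.

```python
def playoff_urgency(is_playoff, score_diff, quarter, seconds_remaining):
    """Additive playoff urgency, built from the components listed above.

    score_diff is the team's score minus the opponent's score;
    seconds_remaining is the time left in regulation.
    """
    if not is_playoff:
        return 0.0  # assumption: the feature is zero for non-playoff games
    urgency = 1.0  # base playoff multiplier
    if score_diff < 0:
        urgency += 0.5  # trailing bonus
    if quarter >= 4:
        urgency += 0.3  # 4th quarter bonus
    if seconds_remaining <= 300:
        urgency += 0.2  # final 5 minutes
    return urgency


# Trailing by 3 with two minutes left in the 4th quarter of a playoff game.
print(round(playoff_urgency(True, -3, 4, 120), 2))
# Leading early in a playoff game: only the base value applies.
print(round(playoff_urgency(True, 7, 1, 3000), 2))
```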
What This Means
The model learns that playoff teams:
- Are more aggressive overall (more 4th down attempts)
- Take more risks when trailing
- Have different play-calling patterns in late-game situations
- Show desperation mode in must-win scenarios
Automatic Detection
Playoff games are automatically detected when:
- Game type is 'POST' (from ESPN API)
- Play-by-play data includes playoff indicator
- Drive snapshots include playoff status
Training Impact
When training models on playoff data, the system automatically:
- Marks playoff games in database
- Includes playoff features in training
- Uses playoff urgency in predictions
Win Probability (NEW)
Real-time win probability calculations that update as games progress.
How It's Calculated
- Current State: System captures current game state
- Simulation: Runs thousands of simulations from current state
- Win Count: Counts how many simulations result in home team win
- Probability: Win count / total simulations
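The last two steps reduce to a count and a division. A minimal sketch, assuming each simulation yields a (home, away) final score pair; the function name is hypothetical, and counting tied simulations as non-wins for the home team is an assumption, since tie handling is not described here.

```python
def home_win_probability(final_scores):
    """Fraction of simulations in which the home team finishes ahead."""
    if not final_scores:
        raise ValueError("need at least one simulated result")
    # Assumption: a tie does not count as a home win.
    wins = sum(1 for home, away in final_scores if home > away)
    return wins / len(final_scores)


# Four simulated final scores as (home, away) pairs.
results = [(24, 17), (10, 13), (21, 20), (14, 14)]
print(home_win_probability(results))
```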
Where You See It
- Live Scoreboard: Shows win probability for each game
- Game Detail Pages: Win probability graph over time
- Simulation Results: Final win probability from simulations
Understanding Win Probability
Running Backtests
Backtests evaluate model accuracy by comparing predictions to actual game outcomes.
When to Run Backtests
- After training a new model
- Comparing different models
- Evaluating model improvements
- Understanding model strengths/weaknesses
How to Run
- Go to Run Backtest
- Enter seasons to test (e.g., "2023,2024")
- Set simulations per game (default 5,000)
- Select model (default: active model)
- Click "Submit Backtest"
- Wait for completion (can take 30+ minutes)
Understanding Results
- Brier Score: Lower is better (mean squared error between predicted probabilities and actual outcomes)
- Log Loss: Lower is better (heavily penalizes confident predictions that turn out wrong)
- Calibration Curve: Shows if probabilities match actual frequencies
- Game-by-Game: Predictions vs actual for each game
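For reference, both scores can be computed directly from a list of predicted home-win probabilities and the matching actual results (1 for a home win, 0 otherwise). The sketch below uses the standard definitions; the function names and the clipping constant are choices made for this example, not the platform's code.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and actual result."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# Three games: predicted home-win probability vs. what actually happened.
predicted = [0.8, 0.6, 0.3]
actual = [1, 0, 0]
print(round(brier_score(predicted, actual), 4))
print(round(log_loss(predicted, actual), 4))
```

A model that always predicts 0.5 scores 0.25 on Brier and about 0.693 on log loss, which gives a baseline that any useful model should beat.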
Data Sources
The system integrates with multiple data sources for play-by-play data and game information.
Available Sources
ESPN API
Live game data, play-by-play, schedules
NFLSavant
Historical play-by-play data
Managing Data Sources
Go to Data & Models → Data Sources to:
- View data source status
- Trigger manual syncs
- View last sync time
- Check for errors
Automatic Syncing
Most data sources sync automatically:
- Live games: Every 30 seconds during games
- Upcoming games: Daily schedule sync
- Historical data: On-demand via UI
System Monitoring (NEW)
The system includes comprehensive monitoring tools to track performance and health.
Dashboard Overview
The Command Center dashboard shows:
- Active Model: Currently active model and its details
- Job Queue: Running, queued, and completed jobs
- Model Registry: Total models and active count
- Live Games: Number of games currently in progress
- Recent Activity: Latest simulations, training, and backtests
System Resources Sidebar
The right sidebar displays real-time system metrics:
- CPU Load: Current CPU usage percentage
- Memory: RAM usage and available
- Storage: Disk usage
- Workers: Active and idle Celery workers
- Job Queue: Running and queued jobs
- Live Games: Active game monitoring
Active Jobs Page
Go to Active Jobs to see:
- All running and queued jobs
- Job status and progress
- Real-time logs
- Ability to cancel jobs
System Logs
Go to System Logs to view:
- Application logs
- Celery worker logs
- Error logs
- Training logs
Troubleshooting
Common Issues
Training Job Stuck on "Queued"
Problem: Job never starts running
Solutions:
- Check Celery workers:
  sudo systemctl status gamesim-celery-train
- Restart workers if needed:
  sudo systemctl restart gamesim-celery-train
- Check logs:
  sudo journalctl -u gamesim-celery-train -f
Simulation Returns Unexpected Results
Problem: Win probabilities seem wrong
Solutions:
- Verify QB names match NFL data exactly
- Check team abbreviations are correct
- Ensure model is trained on recent seasons
- Try increasing number of simulations (10,000+ recommended)
Live Games Not Updating
Problem: Live scoreboard not refreshing
Solutions:
- Check Celery Beat is running:
  sudo systemctl status gamesim-celery-beat
- Verify game sync is working: Check Data Sources page
- Check for errors in System Logs
- Manually trigger sync from Data Sources page
Model Training Fails
Problem: Training job fails with error
Solutions:
- Check job logs for specific error message
- Verify data is available for selected seasons
- Try smaller season range first
- Check disk space:
  df -h
- Check memory:
  free -h
Can't Find QB in Model
Problem: QB name not recognized
Solutions:
- Use full name as it appears in NFL stats (e.g., "Patrick Mahomes" not "P. Mahomes")
- Check if QB played in training seasons
- Try "Unknown" if QB is not in training data (model will use average QB effect)
Getting Help
- Check job logs for detailed error messages
- Review model metrics in Model Registry or Drive Models
- Compare with known-good examples
- Check system resources (CPU, memory, disk) in System Resources sidebar
- View System Logs for application errors