
Competitive AutoML with model tournaments, feature importance, prediction intervals, and what-if analysis in Streamlit.


lancejepsen/ModelArena-AutoML


ModelArena

Competitive modeling with uncertainty you can trust

By Lance Jepsen & ChatGPT


👤 Author

Lance Jepsen
Data Science · Machine Learning · AutoML Systems

🔗 LinkedIn: https://www.linkedin.com/in/lance-jepsen/
🎥 Project walkthrough: https://www.youtube.com/watch?v=vGuiYdUlMI8

ModelArena (modelarena-automl) is a competitive AutoML system in which machine learning models compete on your data, uncertainty is quantified with conformal prediction, and predictions are made explainable.

It helps you compare models, understand what drives predictions, and make decisions with uncertainty-aware outputs (locally adaptive conformal prediction intervals for regression).

It’s built to be:

  • Beginner-friendly (learn ML by doing)
  • Professional-grade (tournament leaderboard + diagnostics + what-if)
  • Practical (works on your own CSVs in minutes)

🚀 Quick Start (Run with Streamlit)

1) Prerequisites

  • Python 3.10+ recommended
  • Windows / macOS / Linux

2) Install

Open a terminal in the project folder:

# (optional) create & activate a virtual environment
python -m venv .venv

# Windows (PowerShell)
.venv\Scripts\Activate.ps1

# Windows (cmd)
.venv\Scripts\activate.bat

# macOS/Linux
source .venv/bin/activate

# install dependencies
python -m pip install --upgrade pip
pip install -r requirements.txt

3) Run the app

streamlit run app.py

Streamlit will print a local URL (usually http://localhost:8501). Open it in your browser.


📂 Included Sample Datasets (Start Here)

1️⃣ sample_rent_regression.csv (Regression)

Goal: Predict monthly rent (a number)

Target column

target_monthly_rent_usd

Features

  • unit_size_sqft
  • bedrooms
  • bathrooms
  • year_built
  • distance_to_downtown_miles
  • crime_index
  • school_rating

2️⃣ sample_tenant_renewal_classification.csv (Classification)

Goal: Predict tenant renewal (yes/no)

Target column

renewed_lease   (0 = No, 1 = Yes)

Features

  • monthly_rent
  • income_usd
  • tenure_months
  • late_payments
  • maintenance_requests
  • unit_size_sqft
  • satisfaction_score

🧠 Educational Walkthrough (Learn ML by Using ModelArena)

Step 1 — Load a CSV

Upload one of the sample CSVs (or your own).

Tip: The target is the column you want to predict.

  • Regression target example: target_monthly_rent_usd
  • Classification target example: renewed_lease
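To make the idea of a target column concrete, here is a minimal sketch that reads a few rows shaped like sample_rent_regression.csv and separates the target from the features. The inline rows are made-up values, not taken from the bundled file, and the helper name split_features_target is hypothetical; the app itself does this for you after upload.

```python
import csv
import io

# Made-up rows using three of the column names from sample_rent_regression.csv.
CSV_TEXT = """unit_size_sqft,bedrooms,target_monthly_rent_usd
850,2,2100
600,1,1650
1200,3,2900
"""

def split_features_target(rows, target):
    """Return (features, targets): every column except `target`, and `target` itself."""
    features = [{k: float(v) for k, v in row.items() if k != target} for row in rows]
    targets = [float(row[target]) for row in rows]
    return features, targets

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
X, y = split_features_target(rows, target="target_monthly_rent_usd")
print(X[0], y[0])
```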

Step 2 — Choose Task

ModelArena works for both:

| Task | Predicts | Examples |
|---|---|---|
| Regression | a number | rent, price, time, cost |
| Classification | a category | renewal, churn, fraud |

Step 3 — Choose Metric (This controls the “winner”)

Regression

  • RMSE (root mean squared error): penalizes large errors more heavily than MAE
  • MAE (mean absolute error): average absolute error, easy to interpret in the target's own units

Classification

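  • Accuracy: % of predictions that are correct (simple baseline)
  • F1: harmonic mean of precision and recall; more informative than accuracy when classes are imbalanced
  • ROC-AUC: how well the model ranks positives above negatives (requires predicted probabilities)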

Important: If you choose ROC-AUC, ModelArena will only use models that can produce probabilities.
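The differences between these metrics are easier to see on tiny examples. The sketch below implements each metric from its textbook definition in plain Python; it is an illustration under that assumption, not the code ModelArena uses internally, and all data values are invented.

```python
import math

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def roc_auc(y_true, scores, positive=1):
    """Share of (positive, negative) pairs where the positive gets the higher score."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Regression: both prediction sets have the same total absolute error,
# but the second one concentrates it in a single row.
actual = [100, 100, 100, 100]
spread_out = [110, 110, 110, 110]
one_big_miss = [100, 100, 100, 140]
print("MAE: ", mae(actual, spread_out), mae(actual, one_big_miss))
print("RMSE:", rmse(actual, spread_out), rmse(actual, one_big_miss))

# Classification: 9 negatives and 1 positive; the model always predicts 0.
labels = [0] * 9 + [1]
always_no = [0] * 10
print("accuracy:", accuracy(labels, always_no), "F1:", f1(labels, always_no))

# ROC-AUC needs scores (for example predicted probabilities), not hard labels.
print("ROC-AUC:", roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

MAE cannot tell the two regression cases apart, while RMSE is larger for the single big miss. On the imbalanced labels, accuracy looks high even though the model never finds the positive class, which F1 exposes.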


Step 4 — Run Tournament

Click Run Tournament.

ModelArena will:

  1. train multiple models
  2. tune them (if tuning is enabled)
  3. rank them on your chosen metric
  4. crown a winner

You’ll see a leaderboard with the scores.
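The train, rank, and crown steps above can be sketched in a few lines. This is a simplified illustration, not ModelArena's actual code: the two toy models stand in for the model families listed under Supported Models, the tuning step is omitted, the function names are hypothetical, and the data is made up.

```python
import math
import statistics

def fit_mean_baseline(x_train, y_train):
    """Toy model: always predict the training mean."""
    mean_y = statistics.fmean(y_train)
    return lambda x: mean_y

def fit_linear(x_train, y_train):
    """Toy model: one-feature least-squares line."""
    mx, my = statistics.fmean(x_train), statistics.fmean(y_train)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x_train, y_train))
             / sum((a - mx) ** 2 for a in x_train))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def run_tournament(candidates, train, holdout):
    """Fit every candidate on `train`, score it on `holdout`, return rows best first."""
    x_train, y_train = train
    x_hold, y_hold = holdout
    leaderboard = []
    for name, fit in candidates.items():
        predict = fit(x_train, y_train)
        leaderboard.append((name, rmse(y_hold, [predict(x) for x in x_hold])))
    return sorted(leaderboard, key=lambda row: row[1])  # lower RMSE ranks higher

# Made-up data: rent grows with unit size, plus small deviations.
sizes = [500, 650, 800, 950, 1100, 1250, 1400, 1550]
rents = [1400, 1650, 1980, 2200, 2520, 2790, 3010, 3300]
train = (sizes[::2], rents[::2])      # even-indexed rows
holdout = (sizes[1::2], rents[1::2])  # odd-indexed rows

candidates = {"mean baseline": fit_mean_baseline, "linear regression": fit_linear}
leaderboard = run_tournament(candidates, train, holdout)
winner, best_score = leaderboard[0]
print(leaderboard)
```

Scoring on rows the models did not see during fitting is what makes the ranking meaningful; for a metric where higher is better (accuracy, F1, ROC-AUC) the sort order would be reversed.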


🔍 Diagnostics (How to Interpret Results)

✅ Most Predictive Features

ModelArena ranks columns by permutation importance (model-agnostic):

  • Higher = shuffling that column degrades the model's score more, so the model relies on it more to predict the outcome
  • Works for regression and classification

This is the “Which columns matter most?” chart.
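For readers curious how permutation importance works, here is a minimal plain-Python sketch of the general technique: shuffle one column, re-score the model, and record how much the error grows. The fitted model, the unit_id column, and the function names are hypothetical; the README does not show ModelArena's own implementation.

```python
import random

def mean_abs_error(predict, rows, targets):
    return sum(abs(predict(r) - t) for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(predict, rows, targets, column, n_repeats=5, seed=0):
    """Average increase in error after shuffling one column's values across rows."""
    rng = random.Random(seed)
    baseline = mean_abs_error(predict, rows, targets)
    increases = []
    for _ in range(n_repeats):
        shuffled_values = [r[column] for r in rows]
        rng.shuffle(shuffled_values)
        shuffled_rows = [{**r, column: v} for r, v in zip(rows, shuffled_values)]
        increases.append(mean_abs_error(predict, shuffled_rows, targets) - baseline)
    return sum(increases) / n_repeats

# Hypothetical fitted model: rent depends on size only and ignores `unit_id`.
def predict(row):
    return 2.0 * row["unit_size_sqft"] + 500.0

rows = [{"unit_size_sqft": 500 + 50 * i, "unit_id": i} for i in range(20)]
targets = [predict(r) for r in rows]  # targets the model reproduces exactly

importances = {col: permutation_importance(predict, rows, targets, col)
               for col in ("unit_size_sqft", "unit_id")}
print(importances)
```

Because the technique only needs a predict function and a score, it applies to any model in the tournament. A column the model ignores gets an importance of zero, since shuffling it cannot change any prediction.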


📈 Regression: Predicted vs Actual

  • Points: predictions vs true values
  • Diagonal line: perfect predictions
  • Uncertainty band: prediction interval summary (adaptive conformal PI)

🧩 Classification: Confusion Matrix

Shows:

  • True positives / negatives
  • False positives / negatives

This helps you see what kind of mistakes the model is making.
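The four cells of a binary confusion matrix can be counted directly from the labels and predictions. The sketch below uses made-up values in the style of renewed_lease (0 = No, 1 = Yes); the helper name confusion_counts is hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count the four outcome types for a binary classifier."""
    names = {(True, True): "TP", (False, False): "TN",
             (False, True): "FP", (True, False): "FN"}
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for t, p in zip(y_true, y_pred):
        counts[names[(t == positive, p == positive)]] += 1
    return counts

# Made-up labels: 1 = tenant renewed, 0 = tenant did not renew.
actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)
```

Here a false negative is a tenant who renewed but was predicted to leave, and a false positive is the reverse; which mistake costs more depends on how the prediction is used.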


📐 Prediction Intervals (Regression)

ModelArena provides locally adaptive conformal prediction intervals:

  • Distribution-free (doesn’t assume normality)
  • Model-agnostic (works with whichever model wins the tournament)
  • Interval width adapts per row, so harder-to-predict (heteroskedastic) rows get wider intervals

Instead of only:

Predicted rent = $2,100

You also get:

95% interval ≈ [$1,920, $2,280]
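One common way to build locally adaptive conformal intervals is split conformal prediction with normalized residuals, sketched below. The README does not spell out ModelArena's exact procedure, so treat this as an illustration of the general idea: the calibration rows are invented, the per-row spread estimates are assumed to come from some auxiliary model, and the function names are hypothetical.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample (1 - alpha) quantile used by split conformal prediction."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank among sorted scores
    if rank > n:
        return math.inf  # too few calibration rows for this confidence level
    return sorted(scores)[rank - 1]

def adaptive_interval(y_hat, sigma_hat, q):
    """Interval whose half-width scales with the row's estimated difficulty."""
    half_width = q * sigma_hat
    return (y_hat - half_width, y_hat + half_width)

# Hypothetical calibration rows: (true rent, predicted rent, estimated spread).
calibration = [
    (2000, 2050, 50), (1500, 1480, 40), (3200, 3000, 120),
    (2500, 2450, 60), (1800, 1900, 80), (2750, 2700, 70),
    (2100, 2150, 50), (4000, 3800, 150), (1600, 1650, 45),
]

# Nonconformity score: absolute error measured in units of estimated spread.
scores = [abs(y - y_hat) / sigma for y, y_hat, sigma in calibration]
q = conformal_quantile(scores, alpha=0.2)  # targets 80% coverage

# Same point prediction, different estimated difficulty.
easy_row = adaptive_interval(y_hat=2100, sigma_hat=40, q=q)
hard_row = adaptive_interval(y_hat=2100, sigma_hat=120, q=q)
print("easy row:", easy_row)
print("hard row:", hard_row)
```

The calibration rows must be held out from model fitting. Dividing each residual by the estimated spread is what lets two rows with the same point prediction receive different interval widths.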


🔮 Quick Prediction

After the tournament:

  • Enter feature values
  • Get an instant prediction
  • See uncertainty (regression) or class outcome (classification)

🔁 What-If / Counterfactual Simulator

Move sliders to answer:

  • “What if unit size increases?”
  • “What if income drops?”
  • “What if crime index improves?”

Predictions update live to make ML intuitive.
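Under the hood, a what-if sweep amounts to re-running the model while changing one input and holding the rest fixed. The sketch below shows that idea with a hypothetical stand-in for the winning model; the coefficients and the function names are invented for illustration.

```python
def what_if(predict, base_row, feature, values):
    """Re-predict while sweeping one feature and holding the others fixed."""
    return [(v, predict({**base_row, feature: v})) for v in values]

# Hypothetical stand-in for the winning model; the coefficients are invented.
def predict_rent(row):
    return 800 + 1.5 * row["unit_size_sqft"] - 40 * row["distance_to_downtown_miles"]

base_row = {"unit_size_sqft": 800, "distance_to_downtown_miles": 5}
sweep = what_if(predict_rent, base_row, "unit_size_sqft", [700, 800, 900])
for size, rent in sweep:
    print(f"unit_size_sqft={size}: predicted rent {rent:.0f}")
```

A sweep like this shows how the model responds to a change in its input. It does not establish that changing the feature in the real world would cause the same change in the outcome.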


🧩 Supported Models (current)

  • Linear Regression / Logistic Regression
  • Random Forest
  • ExtraTrees
  • HistGradientBoosting
  • XGBoost
  • LightGBM
  • CatBoost

🛠 Troubleshooting

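Streamlit command not found

If streamlit run app.py fails with a "command not found" error, make sure the virtual environment is activated, then reinstall the dependencies:

pip install -r requirements.txt

If the command is still not found, python -m streamlit run app.py launches the app through the active Python interpreter.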

Switching between regression & classification

If you switch datasets and see odd UI behavior, refresh the page to clear Streamlit state (or use the app’s reset button if present).


👤 Authors

Lance Jepsen – product vision, architecture, ML direction
ChatGPT – co-developer, ML engineering, education & documentation


📜 License

MIT License — free to use, modify, and learn from.
