Calibration Lab
Probability calibration you can run in your browser.
Paste a set of probabilistic predictions, or load a synthetic sample, and the lab scores them (log-loss, Brier score, expected calibration error, accuracy), draws the reliability curve, and shows the effect of Platt scaling and isotonic recalibration. Everything runs client-side: the math is a small pure TypeScript library, the computation is deterministic and zero-cost, and no data ever leaves your browser because no network requests are made.
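Platt scaling fits a two-parameter logistic curve that maps each raw probability to a recalibrated one. The sketch below assumes the curve is fit on the logit of p, as sigmoid(a · logit(p) + b), by Newton's method on log-loss with plain 0/1 labels; `fitPlatt` and `applyPlatt` are hypothetical names, not the lab's actual library API.

```typescript
// Hypothetical sketch of Platt scaling on logit(p); not the lab's own code.
const EPS = 1e-12;

function logit(p: number): number {
  // Clamp so that p = 0 or p = 1 does not produce an infinite logit.
  const q = Math.min(Math.max(p, EPS), 1 - EPS);
  return Math.log(q / (1 - q));
}

function sigmoid(z: number): number {
  return 1 / (1 + Math.exp(-z));
}

/** Fit calibrated = sigmoid(a * logit(p) + b) by Newton's method on log-loss. */
function fitPlatt(ps: number[], ys: number[], maxIters = 50): { a: number; b: number } {
  let a = 1; // start at the identity map: a = 1, b = 0 returns p unchanged
  let b = 0;
  for (let it = 0; it < maxIters; it++) {
    // Gradient and Hessian of the summed log-loss with respect to (a, b).
    let ga = 0, gb = 0, haa = 0, hab = 0, hbb = 0;
    for (let i = 0; i < ps.length; i++) {
      const x = logit(ps[i]);
      const q = sigmoid(a * x + b);
      const r = q - ys[i];
      const w = q * (1 - q);
      ga += r * x;
      gb += r;
      haa += w * x * x;
      hab += w * x;
      hbb += w;
    }
    const det = haa * hbb - hab * hab;
    if (Math.abs(det) < 1e-12) break; // (near-)singular Hessian: stop early
    // Newton step: solve H * [da, db] = [ga, gb] for the 2x2 system.
    const da = (hbb * ga - hab * gb) / det;
    const db = (haa * gb - hab * ga) / det;
    a -= da;
    b -= db;
    if (Math.abs(da) + Math.abs(db) < 1e-10) break; // converged
  }
  return { a, b };
}

function applyPlatt(p: number, params: { a: number; b: number }): number {
  return sigmoid(params.a * logit(p) + params.b);
}
```

For example, if predictions of 0.9 come true 75% of the time and predictions of 0.1 come true 25% of the time, the fit converges to a ≈ 0.5, b ≈ 0 and maps 0.9 to about 0.75. On perfectly separable data the optimal a grows without bound, which is why the loop is capped; a production version would typically add regularization.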
See the real study: KickCast calibration study
This lab is a sandbox over synthetic data. The measured companion study applies the same isotonic recalibration to the real KickCast model on the 2022 World Cup holdout, with every number traced to the saved model artifact.
All three samples are synthetic, illustrative datasets generated with a fixed seed. They are not real model output.
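Isotonic recalibration replaces each probability with a non-decreasing step function of p fit to the outcomes. A common way to fit it is the pool-adjacent-violators (PAV) algorithm, sketched below on the assumption that the lab does something equivalent; `fitIsotonic` and `applyIsotonic` are hypothetical names.

```typescript
// Hypothetical sketch of isotonic recalibration via pool-adjacent-violators.
interface Block {
  lo: number;    // smallest predicted probability pooled into the block
  hi: number;    // largest predicted probability pooled into the block
  sumY: number;  // number of positive outcomes in the block
  count: number; // number of rows in the block
}

function fitIsotonic(ps: number[], ys: number[]): Block[] {
  // Visit rows in order of increasing predicted probability.
  const order = ps.map((_, i) => i).sort((i, j) => ps[i] - ps[j]);
  const blocks: Block[] = [];
  for (const i of order) {
    const top = blocks[blocks.length - 1];
    if (top !== undefined && top.hi === ps[i]) {
      // Tied probabilities must share one fitted value, so pool them directly.
      top.sumY += ys[i];
      top.count += 1;
    } else {
      blocks.push({ lo: ps[i], hi: ps[i], sumY: ys[i], count: 1 });
    }
    // Pool adjacent violators: merge while the previous block's mean exceeds the last one's.
    while (blocks.length > 1) {
      const last = blocks[blocks.length - 1];
      const prev = blocks[blocks.length - 2];
      if (prev.sumY / prev.count <= last.sumY / last.count) break;
      prev.hi = last.hi;
      prev.sumY += last.sumY;
      prev.count += last.count;
      blocks.pop();
    }
  }
  return blocks;
}

function applyIsotonic(p: number, blocks: Block[]): number {
  if (blocks.length === 0) throw new Error("fit the model on at least one row first");
  // Step function: use the last block whose lowest p is <= the query;
  // queries below the first block get the first block's value.
  let value = blocks[0].sumY / blocks[0].count;
  for (const blk of blocks) {
    if (p < blk.lo) break;
    value = blk.sumY / blk.count;
  }
  return value;
}
```

With p = [0.1, 0.2, 0.3, 0.4] and y = [0, 1, 0, 1], the rows at 0.2 and 0.3 violate monotonicity and are pooled into one block with value 0.5. The step-function lookup is a simplifying choice; scikit-learn's `IsotonicRegression`, for instance, interpolates linearly between fitted points instead.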
Each row holds p, the predicted probability in [0, 1], and y, the binary outcome (0 or 1). Columns may be separated by commas or whitespace, and a header row is tolerated.
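A minimal parser for that input format might look like the following. The name `parsePredictions` is hypothetical, and the choices to skip blank lines, accept a non-numeric first line as the header, and reject any other malformed row are assumptions rather than the lab's documented behavior.

```typescript
// Hypothetical parser for "p y" rows; a sketch, not the lab's own code.
interface Row {
  p: number; // predicted probability in [0, 1]
  y: number; // binary outcome, 0 or 1
}

function parsePredictions(text: string): Row[] {
  const rows: Row[] = [];
  let firstContentLine = true;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "") continue; // skip blank lines
    // Columns may be separated by commas, whitespace, or both.
    const fields = line.split(/[\s,]+/);
    const p = Number(fields[0]);
    const y = Number(fields[1]);
    const numeric = fields.length >= 2 && Number.isFinite(p) && Number.isFinite(y);
    if (!numeric) {
      // Tolerate one non-numeric line only if it is the first one (a header).
      if (firstContentLine) {
        firstContentLine = false;
        continue;
      }
      throw new Error(`Unparseable row: "${line}"`);
    }
    firstContentLine = false;
    if (p < 0 || p > 1) throw new Error(`p out of [0, 1]: ${p}`);
    if (y !== 0 && y !== 1) throw new Error(`y must be 0 or 1: ${y}`);
    rows.push({ p, y });
  }
  return rows;
}
```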
The file is read in your browser with FileReader and never uploaded.
Metrics
- Log-loss: 0.5461
- Brier: 0.1741
- ECE: 0.1180
- Accuracy: 0.7767
- n: 300
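Log-loss is the mean negative log-likelihood of the outcomes, Brier is the mean squared difference between p and y, and accuracy is the fraction of rows where thresholding p at 0.5 matches y. The sketch below computes these three (ECE depends on a binning scheme and is left out); the function names, the clamping epsilon, and counting p = 0.5 as a positive prediction are assumptions.

```typescript
// Hypothetical sketch of three scores; not the lab's own code.
// Clamp p away from 0 and 1 so log(0) never occurs (epsilon is an assumption).
const CLAMP = 1e-15;

function logLoss(ps: number[], ys: number[]): number {
  let total = 0;
  for (let i = 0; i < ps.length; i++) {
    const p = Math.min(Math.max(ps[i], CLAMP), 1 - CLAMP);
    total += -(ys[i] * Math.log(p) + (1 - ys[i]) * Math.log(1 - p));
  }
  return total / ps.length;
}

function brier(ps: number[], ys: number[]): number {
  let total = 0;
  for (let i = 0; i < ps.length; i++) total += (ps[i] - ys[i]) ** 2;
  return total / ps.length;
}

// Predict 1 when p >= 0.5 (the tie rule at exactly 0.5 is an assumption).
function accuracy(ps: number[], ys: number[]): number {
  let correct = 0;
  for (let i = 0; i < ps.length; i++) {
    const predicted = ps[i] >= 0.5 ? 1 : 0;
    if (predicted === ys[i]) correct++;
  }
  return correct / ps.length;
}
```

For two rows, p = [0.8, 0.3] with y = [1, 0], Brier is (0.04 + 0.09) / 2 = 0.065 and accuracy is 1.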
Reliability diagram
Each point is a probability bin, plotted at its mean predicted probability (x) against its observed frequency of positive outcomes (y). Points on the dashed diagonal are perfectly calibrated; above it the model under-predicts (outcomes occur more often than predicted), and below it the model over-predicts.
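The diagram's points and the ECE score can come from the same binning: ECE is the count-weighted mean of |observed frequency - mean predicted probability| over the non-empty bins, i.e. the weighted distance of each point from the diagonal. The sketch assumes ten equal-width bins, matching the 0.1-wide bins in the confidence distribution; the lab's actual bin scheme and the names below are assumptions.

```typescript
// Hypothetical sketch of reliability bins and ECE over equal-width bins.
interface ReliabilityBin {
  meanP: number; // mean predicted probability in the bin (x coordinate)
  freq: number;  // observed frequency of y = 1 in the bin (y coordinate)
  count: number; // rows in the bin
}

function binIndex(p: number, nBins: number): number {
  // p = 1.0 goes into the last bin rather than a non-existent bin nBins.
  return Math.min(Math.floor(p * nBins), nBins - 1);
}

function reliabilityBins(ps: number[], ys: number[], nBins = 10): ReliabilityBin[] {
  const sumP = new Array<number>(nBins).fill(0);
  const sumY = new Array<number>(nBins).fill(0);
  const count = new Array<number>(nBins).fill(0);
  for (let i = 0; i < ps.length; i++) {
    const b = binIndex(ps[i], nBins);
    sumP[b] += ps[i];
    sumY[b] += ys[i];
    count[b] += 1;
  }
  const bins: ReliabilityBin[] = [];
  for (let b = 0; b < nBins; b++) {
    if (count[b] === 0) continue; // empty bins contribute no point
    bins.push({ meanP: sumP[b] / count[b], freq: sumY[b] / count[b], count: count[b] });
  }
  return bins;
}

// ECE: count-weighted average distance of each point from the diagonal.
function expectedCalibrationError(ps: number[], ys: number[], nBins = 10): number {
  let ece = 0;
  for (const bin of reliabilityBins(ps, ys, nBins)) {
    ece += (bin.count / ps.length) * Math.abs(bin.freq - bin.meanP);
  }
  return ece;
}
```

ECE depends on the number and placement of bins, so two tools can report different ECE for identical predictions.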
Confidence distribution
How the predicted probabilities spread across the bins. A sharp model piles up near 0 and 1; a hedging model bunches around 0.5.
- 0.0–0.1: n=72 · 24.0%
- 0.1–0.2: n=40 · 13.3%
- 0.2–0.3: n=19 · 6.3%
- 0.3–0.4: n=7 · 2.3%
- 0.4–0.5: n=0 · 0.0%
- 0.5–0.6: n=0 · 0.0%
- 0.6–0.7: n=6 · 2.0%
- 0.7–0.8: n=26 · 8.7%
- 0.8–0.9: n=42 · 14.0%
- 0.9–1.0: n=88 · 29.3%
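Each percentage above is the bin's count divided by the total number of rows; the ten counts sum to 300, the n reported under Metrics. A sketch that produces lines in the same format, assuming ten 0.1-wide bins with p = 1.0 assigned to the last bin (`confidenceHistogram` is a hypothetical name):

```typescript
// Hypothetical sketch of the confidence-distribution counts.
// Labels use one decimal place, which suits the default nBins = 10.
function confidenceHistogram(ps: number[], nBins = 10): string[] {
  const counts = new Array<number>(nBins).fill(0);
  for (const p of ps) {
    // p = 1.0 goes into the last bin rather than a non-existent bin nBins.
    counts[Math.min(Math.floor(p * nBins), nBins - 1)] += 1;
  }
  return counts.map((n, b) => {
    const lo = (b / nBins).toFixed(1);
    const hi = ((b + 1) / nBins).toFixed(1);
    const pct = ps.length === 0 ? 0 : (100 * n) / ps.length;
    return `${lo}–${hi}: n=${n} · ${pct.toFixed(1)}%`;
  });
}
```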