ML / Data Science · Live · 2026

Calibration Lab

client-side probability-calibration explorer

A deployed, fully client-side calibration explorer: score log-loss, Brier and ECE, read the reliability diagram, and fit Platt or isotonic recalibration in the browser with no network request. It runs on synthetic illustrative data; the measured results live in the KickCast study.

Fully client-side and deterministic. Runs on synthetic illustrative classifiers; the real KickCast calibration study is at /work/kickcast-calibration.

Calibration Lab is the interactive companion to the KickCast Calibration Study: where the study reports measured results on a real model, this lets a visitor build the same intuition by hand. Paste a set of "p,y" rows (predicted probability, binary outcome), load one of the labelled synthetic samples, or open a local CSV, and the lab scores log-loss, Brier, expected calibration error and accuracy, draws the reliability diagram and confidence histogram, and lets you fit Platt scaling or isotonic regression (PAV) and watch the curve move toward the diagonal.
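To make the scoring step concrete, here is a minimal sketch of parsing "p,y" rows and computing the four metrics. All names (`Row`, `parseRows`, `score`) are hypothetical and are not taken from the lab's actual library; the equal-width 10-bin ECE and the 0.5 accuracy threshold are assumptions, since the card does not state either choice.

```typescript
// Hypothetical names; the lab's actual @/lib/calibration exports are not shown in this card.
type Row = { p: number; y: 0 | 1 };

// Parse "p,y" lines; skip blanks, header rows, and rows with p outside [0, 1] or y not in {0, 1}.
function parseRows(text: string): Row[] {
  const rows: Row[] = [];
  for (const line of text.split(/\r?\n/)) {
    const parts = line.split(",").map((s) => s.trim());
    if (parts.length !== 2 || parts[0] === "" || parts[1] === "") continue;
    const p = Number(parts[0]);
    const y = Number(parts[1]);
    if (!Number.isFinite(p) || p < 0 || p > 1) continue;
    if (y !== 0 && y !== 1) continue;
    rows.push({ p, y: y === 1 ? 1 : 0 });
  }
  return rows;
}

// Keep probabilities away from 0 and 1 so the logarithm stays finite.
const clip = (p: number): number => Math.min(1 - 1e-12, Math.max(1e-12, p));

function logLoss(rows: Row[]): number {
  let sum = 0;
  for (const { p, y } of rows) {
    const q = clip(p);
    sum -= y === 1 ? Math.log(q) : Math.log(1 - q);
  }
  return sum / rows.length;
}

function brier(rows: Row[]): number {
  let sum = 0;
  for (const { p, y } of rows) sum += (p - y) ** 2;
  return sum / rows.length;
}

// Assumption: a prediction counts as positive when p >= 0.5.
function accuracy(rows: Row[]): number {
  let hits = 0;
  for (const { p, y } of rows) if ((p >= 0.5 ? 1 : 0) === y) hits++;
  return hits / rows.length;
}

// Expected calibration error over equal-width bins:
// the count-weighted mean of |observed frequency - mean predicted probability| per bin.
function ece(rows: Row[], bins = 10): number {
  const sumP = new Array<number>(bins).fill(0);
  const sumY = new Array<number>(bins).fill(0);
  for (const { p, y } of rows) {
    const b = Math.min(bins - 1, Math.floor(p * bins));
    sumP[b] += p;
    sumY[b] += y;
  }
  let total = 0;
  for (let b = 0; b < bins; b++) total += Math.abs(sumY[b] - sumP[b]);
  return total / rows.length;
}

function score(rows: Row[]) {
  return { logLoss: logLoss(rows), brier: brier(rows), ece: ece(rows), accuracy: accuracy(rows) };
}
```

Each metric returns NaN for an empty input, so a caller would want to check that `parseRows` produced at least one row before scoring.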

What is real here is the engineering, not the data. The math is a small pure TypeScript library (@/lib/calibration) with no React, no DOM and no I/O, unit-tested as its own clean module; the three built-in samples are synthetic, illustrative datasets generated from a fixed seed and tagged "Synthetic" in the UI, so they are never mistaken for model output. Everything runs in the browser: the compute is deterministic and zero-cost, and no data ever leaves the page. A local file is read with FileReader and never uploaded, and the tool issues no network request at all.
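As an illustration of what "generated from a fixed seed" can look like, the sketch below builds a deterministic overconfident sample from a small seeded generator. The generator choice (mulberry32), the function names, and the `sharpen` parameter are all assumptions for illustration; the card does not say how the lab's three samples are actually produced.

```typescript
type Row = { p: number; y: 0 | 1 };

// mulberry32: a small 32-bit seeded PRNG returning floats in [0, 1).
// Assumption: the lab's real generator is not specified in this card.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical sample: labels are drawn with true probability q, while the
// reported p scales q's logit by `sharpen`, so sharpen > 1 yields overconfidence.
function syntheticSample(seed: number, n: number, sharpen = 2): Row[] {
  const rand = mulberry32(seed);
  const rows: Row[] = [];
  for (let i = 0; i < n; i++) {
    const q = 0.02 + 0.96 * rand(); // true probability, kept off 0 and 1
    const y: 0 | 1 = rand() < q ? 1 : 0;
    const z = sharpen * Math.log(q / (1 - q));
    rows.push({ p: 1 / (1 + Math.exp(-z)), y });
  }
  return rows;
}
```

Because the only source of randomness is the seeded generator, calling `syntheticSample` twice with the same arguments yields identical rows, which is what makes the built-in samples reproducible across page loads.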

The honest framing is stated on the page and in this card: the lab fits and applies recalibration in-sample, which shows the mechanism but is not an estimate of generalization. For that, the measured companion fits the calibrator on a held-out validation split and reports the effect on the untouched 2022 World Cup holdout.
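The distinction between in-sample and held-out recalibration can be sketched as follows. This is a hedged illustration, not the lab's or the study's code: it assumes Platt scaling is applied to the logit of the pasted probability (Platt's original method works on raw classifier scores), it swaps in plain gradient descent for the Newton-style solvers usually used, and it assumes the rows are already shuffled so a prefix split is fair. All function names are hypothetical.

```typescript
type Row = { p: number; y: 0 | 1 };
type Platt = { a: number; b: number };

const clip = (p: number): number => Math.min(1 - 1e-6, Math.max(1e-6, p));
const logit = (p: number): number => Math.log(clip(p) / (1 - clip(p)));
const sigmoid = (z: number): number => 1 / (1 + Math.exp(-z));

function logLoss(rows: Row[]): number {
  let sum = 0;
  for (const { p, y } of rows) {
    const q = clip(p);
    sum -= y === 1 ? Math.log(q) : Math.log(1 - q);
  }
  return sum / rows.length;
}

// Fit p' = sigmoid(a * logit(p) + b) by gradient descent on mean log-loss.
// Starting at the identity map (a = 1, b = 0) with a step no larger than 1/L
// means the loss on the fitting rows cannot go up.
function fitPlatt(rows: Row[], iters = 2000): Platt {
  const xs = rows.map((r) => logit(r.p));
  const meanSq = xs.reduce((s, x) => s + x * x, 0) / xs.length;
  const step = 4 / (meanSq + 1); // uses the bound L <= 0.25 * (mean(x^2) + 1)
  let a = 1;
  let b = 0;
  for (let it = 0; it < iters; it++) {
    let ga = 0;
    let gb = 0;
    for (let i = 0; i < rows.length; i++) {
      const err = sigmoid(a * xs[i] + b) - rows[i].y;
      ga += err * xs[i];
      gb += err;
    }
    a -= (step * ga) / rows.length;
    b -= (step * gb) / rows.length;
  }
  return { a, b };
}

function applyPlatt(rows: Row[], { a, b }: Platt): Row[] {
  return rows.map((r) => ({ p: sigmoid(a * logit(r.p) + b), y: r.y }));
}

// Fit on the first `fitFraction` of the rows, then report log-loss before and
// after recalibration on both the fitting rows and the untouched remainder.
function inSampleVsHeldOut(rows: Row[], fitFraction = 0.5) {
  const cut = Math.floor(rows.length * fitFraction);
  const fitRows = rows.slice(0, cut);
  const evalRows = rows.slice(cut);
  const params = fitPlatt(fitRows);
  return {
    params,
    inSample: { before: logLoss(fitRows), after: logLoss(applyPlatt(fitRows, params)) },
    heldOut: { before: logLoss(evalRows), after: logLoss(applyPlatt(evalRows, params)) },
  };
}
```

The `inSample.after` figure is the optimistic one: it can only match or beat `inSample.before`, whatever the data. Only the `heldOut` pair says anything about rows the calibrator has not seen.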

TypeScript
Next.js
React
Tailwind CSS
Platt scaling
Isotonic regression (PAV)

Metrics: log-loss · Brier · ECE
Recalibration: Platt + isotonic (PAV)
Data: Synthetic illustrative samples
Compute: In-browser · zero network
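For readers unfamiliar with the PAV step behind the isotonic option, a minimal sketch of pool-adjacent-violators is given below. The names (`Block`, `fitIsotonic`, `predictIsotonic`) are hypothetical, and the step-function lookup between fitted points is one of several reasonable choices (linear interpolation is another); the card does not say which the lab uses.

```typescript
type Row = { p: number; y: 0 | 1 };
// One constant piece of the fitted step function, covering inputs in [lo, hi].
type Block = { lo: number; hi: number; sum: number; n: number };

// Pool-adjacent-violators: sort by p, then merge neighbouring blocks until the
// block means (sum / n) are non-decreasing from left to right.
function fitIsotonic(rows: Row[]): Block[] {
  const sorted = [...rows].sort((r, s) => r.p - s.p);
  const blocks: Block[] = [];
  let i = 0;
  while (i < sorted.length) {
    // Collapse rows sharing the same p into one weighted point so ties get one value.
    const p = sorted[i].p;
    let sum = 0;
    let n = 0;
    while (i < sorted.length && sorted[i].p === p) {
      sum += sorted[i].y;
      n++;
      i++;
    }
    blocks.push({ lo: p, hi: p, sum, n });
    // Merge backwards while the previous block's mean exceeds the last block's mean.
    while (blocks.length > 1) {
      const last = blocks[blocks.length - 1];
      const prev = blocks[blocks.length - 2];
      // Compare means by cross-multiplying integer counts to avoid division.
      if (prev.sum * last.n <= last.sum * prev.n) break;
      prev.hi = last.hi;
      prev.sum += last.sum;
      prev.n += last.n;
      blocks.pop();
    }
  }
  return blocks;
}

// Step-function lookup: use the last block whose lower edge is at or below p,
// falling back to the first block for inputs below every fitted point.
function predictIsotonic(blocks: Block[], p: number): number {
  if (blocks.length === 0) return p; // nothing fitted: identity
  let chosen = blocks[0];
  for (const b of blocks) if (b.lo <= p) chosen = b;
  return chosen.sum / chosen.n;
}
```

For example, labels 0, 1, 0, 1, 1 at p = 0.1 to 0.5 produce one violation (the 1 at 0.2 followed by the 0 at 0.3), which PAV pools into a single block with value 0.5.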

What I'd improve

The biggest honesty gap is in-sample recalibration: the lab fits the calibrator and applies it to the same rows, which demonstrates the mechanism but overstates the gain a held-out split would show. Next steps: add an in-app train/test split toggle so the reliability curve and the metric deltas are reported on unseen rows; support multiclass inputs (the math is currently binary one-vs-rest); and add temperature scaling alongside Platt and isotonic so the three methods can be compared head to head, mirroring the open work in the measured study.
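As a rough idea of what the temperature scaling addition could look like in the binary case, the sketch below divides each probability's logit by a single temperature T and picks T by golden-section search on log-loss. This is an assumption-laden illustration, not planned code: the function names, the search range of 0.05 to 20, and the choice of golden-section search are all hypothetical.

```typescript
type Row = { p: number; y: 0 | 1 };

const clip = (p: number): number => Math.min(1 - 1e-6, Math.max(1e-6, p));
const logit = (p: number): number => Math.log(clip(p) / (1 - clip(p)));
const sigmoid = (z: number): number => 1 / (1 + Math.exp(-z));

// Binary temperature scaling: p' = sigmoid(logit(p) / T).
// T > 1 softens overconfident probabilities; T < 1 sharpens underconfident ones.
function applyTemperature(rows: Row[], T: number): Row[] {
  return rows.map((r) => ({ p: sigmoid(logit(r.p) / T), y: r.y }));
}

function lossAtTemperature(rows: Row[], T: number): number {
  let sum = 0;
  for (const { p, y } of rows) {
    const q = clip(sigmoid(logit(p) / T));
    sum -= y === 1 ? Math.log(q) : Math.log(1 - q);
  }
  return sum / rows.length;
}

// Golden-section search over u = log(T). The loss is convex in 1/T, hence
// unimodal in u, so shrinking the bracket toward the lower probe value is safe.
function fitTemperature(rows: Row[], minT = 0.05, maxT = 20): number {
  const ratio = (Math.sqrt(5) - 1) / 2;
  let lo = Math.log(minT);
  let hi = Math.log(maxT);
  for (let i = 0; i < 60; i++) {
    const m1 = hi - ratio * (hi - lo);
    const m2 = lo + ratio * (hi - lo);
    if (lossAtTemperature(rows, Math.exp(m1)) < lossAtTemperature(rows, Math.exp(m2))) {
      hi = m2;
    } else {
      lo = m1;
    }
  }
  return Math.exp((lo + hi) / 2);
}
```

With one parameter instead of Platt's two, temperature scaling cannot shift the curve, only stretch or compress it around p = 0.5, which is part of what a head-to-head comparison would expose. The same in-sample caveat applies: T should be fitted on one split and scored on another.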
