Validation Scoreboard

Every metric, tested against a baseline.
Here's what survived.

We ran ~90 candidate metrics across 3 discovery runs through a rigorous incremental information coefficient gate. 4 passed. The failures are shown too: that's what honest validation looks like.

~90 candidates tested across 3 discovery runs

4 passed the gate

7 shown failing

The Gate

What a metric must prove

|IC| ≥ 0.05

incremental information coefficient

p < 0.05

statistical significance

Method: partial Spearman Information Coefficient (incremental IC)

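Controlling for the naive baseline, the metric must add |incremental_ic| >= 0.05 with p < 0.05. The threshold applies to the magnitude, so a metric with a negative sign can pass if it is flipped for use. Incremental IC = partial Spearman rank-correlation between the signal and the forward target, residualised on the naive baseline.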
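A minimal sketch of how a partial Spearman IC could be computed, using only the Python standard library. This is not the ADW implementation. The function names are hypothetical, the data is synthetic, and the p-value uses a Fisher z normal approximation (an assumption that is reasonable for samples in the thousands but not stated in the source).

```python
import math
import random


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(a, b):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def incremental_ic(signal, target, baseline):
    """Partial Spearman of signal vs target, controlling for baseline.

    Returns (ic, p). p is a two-sided Fisher z approximation (assumption).
    """
    rs, rt, rb = ranks(signal), ranks(target), ranks(baseline)
    r_st, r_sb, r_tb = pearson(rs, rt), pearson(rs, rb), pearson(rt, rb)
    # First-order partial correlation on ranks; equivalent to correlating the
    # residuals of rank(signal) and rank(target) after regressing on rank(baseline).
    ic = (r_st - r_sb * r_tb) / math.sqrt((1 - r_sb ** 2) * (1 - r_tb ** 2))
    n = len(signal)
    z = math.atanh(ic) * math.sqrt(n - 4)  # n - 3 - (one control variable)
    p = math.erfc(abs(z) / math.sqrt(2))
    return ic, p


# Synthetic illustration only (not ADW data).
random.seed(0)
n = 2000
baseline = [random.gauss(0, 1) for _ in range(n)]
extra = [random.gauss(0, 1) for _ in range(n)]
target = [b + 0.5 * e + random.gauss(0, 1) for b, e in zip(baseline, extra)]
redundant = [b + random.gauss(0, 0.1) for b in baseline]  # baseline in disguise

ic_extra, p_extra = incremental_ic(extra, target, baseline)
ic_red, p_red = incremental_ic(redundant, target, baseline)
print(f"independent signal: ic={ic_extra:+.3f}, p={p_extra:.2e}")
print(f"redundant signal:   ic={ic_red:+.3f}, p={p_red:.2e}")
```

In this toy setup the redundant signal has a high raw correlation with the target, but little of it should remain once the baseline is controlled for. That is the behaviour the gate is designed to expose.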

Dataset: ~10y daily prices, 8-9 tickers (SPY, QQQ, AAPL, MSFT, NVDA, AMZN, GOOGL, META + others), ~22,793 observations
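For readers who want to see how the volatility target and its naive baseline might be lined up from daily prices, here is a rough sketch. The source does not define "realized volatility", so this assumes the sample standard deviation of daily log returns, not annualised. The function names and the synthetic price series are hypothetical.

```python
import math
import random
import statistics


def log_returns(closes):
    """Daily log returns from a list of closing prices."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


def vol_target_and_baseline(closes, fwd=5, trail=20):
    """Return (t, forward_vol, trailing_vol) for each usable return index t.

    The trailing window covers returns t-trail+1 .. t and the forward window
    covers returns t+1 .. t+fwd, so the two windows never overlap.
    """
    r = log_returns(closes)
    rows = []
    for t in range(trail - 1, len(r) - fwd):
        trailing = statistics.stdev(r[t - trail + 1 : t + 1])  # naive baseline
        forward = statistics.stdev(r[t + 1 : t + 1 + fwd])     # forward target
        rows.append((t, forward, trailing))
    return rows


# Synthetic prices for illustration only (not ADW data).
random.seed(0)
closes = [100.0]
for _ in range(59):
    closes.append(closes[-1] * math.exp(random.gauss(0, 0.01)))

rows = vol_target_and_baseline(closes)
print(len(rows))  # 35 usable days from 60 closes
```

Keeping the forward and trailing windows disjoint matters: any overlap would leak target information into the baseline and distort the incremental IC.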

Gate as of 2026-06-17. Data fetched live from the ADW validation API at build time (2026-06-21).

Full Results

11 metrics on the board

Passed rows are shown first. Failed rows follow — we publish them because hiding negative results would defeat the purpose of having a gate.

Passed the gate (4)

Metric	Target	Baseline IC	Incr. IC	p-value	Result
Entropy-Weighted CUSUM Volatility Signal (EWC) ADW-101	forward 5-day realized volatility vs. trailing-20d realized volatility	0.691	0.125	< 1e-10	PASS
Tail Probability Shift (TPS) ADW-102	forward 5-day realized volatility vs. trailing-20d realized volatility	0.691	-0.095 (flip)	< 1e-10	PASS
Local Tail Variance Ratio (LTVR) ADW-105	forward 5-day realized volatility vs. trailing-20d realized volatility	0.691	0.058	7.20e-9	PASS
Tail Mean Difference (TMD) ADW-106	forward 5-day return vs. 20-day momentum	—	0.054	1.00e-7	PASS

ADW-101: Strongest keeper. EWC = CUSUM_max / (SampleEntropy + 1e-6). Entropy weighting captures change STRUCTURE beyond vol level. Incremental IC reported as +0.1254 in the backtest table; the Shortlist rounds to +0.125. p reported as 0.00e+00 (machine zero given n=13,332).
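The EWC ratio above can be illustrated with a small sketch. Only the formula EWC = CUSUM_max / (SampleEntropy + 1e-6) comes from this page. The definitions of both terms below are assumptions: CUSUM_max is taken as the largest absolute cumulative sum of standardised deviations from the window mean, and sample entropy uses the common defaults m = 2 and r = 0.2 × standard deviation. The actual ADW-101 definitions, window length, and inputs may differ.

```python
import math
import random
import statistics


def cusum_max(x):
    """Largest absolute cumulative sum of standardised deviations (assumed definition)."""
    mu, sd = statistics.fmean(x), statistics.pstdev(x)
    if sd == 0:
        return 0.0  # constant window: no change to detect
    total, peak = 0.0, 0.0
    for v in x:
        total += (v - mu) / sd
        peak = max(peak, abs(total))
    return peak


def sample_entropy(x, m=2, r_frac=0.2):
    """SampEn(m, r) with Chebyshev distance and r = r_frac * std(x)."""
    r = r_frac * statistics.pstdev(x)
    n = len(x)

    def matches(length):
        # The same n - m template starts are used for both lengths.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if all(abs(x[i + k] - x[j + k]) <= r for k in range(length)):
                    count += 1
        return count

    b, a = matches(m), matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined at this tolerance; EWC then evaluates to 0
    return -math.log(a / b)


def ewc(x):
    """EWC = CUSUM_max / (SampleEntropy + 1e-6), per the formula on this page."""
    return cusum_max(x) / (sample_entropy(x) + 1e-6)


# Synthetic illustration only: pure noise vs. a series with a mean shift halfway.
random.seed(1)
noise = [random.gauss(0, 1) for _ in range(120)]
shifted = [random.gauss(0, 1) for _ in range(60)] + [random.gauss(3, 1) for _ in range(60)]
print(f"EWC noise:   {ewc(noise):.2f}")
print(f"EWC shifted: {ewc(shifted):.2f}")
```

Under these assumptions a structured change (large CUSUM excursion, lower entropy) should score higher than unstructured noise, which matches the intuition that entropy weighting rewards change structure rather than volatility level alone.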

ADW-102: Sign is NEGATIVE (flip for use): rising tail-frequency shift precedes LOWER forward realized vol — likely burst-then-exhaust effect. Real edge; confirm sign out-of-sample. raw_ic not separately reported in sources.

ADW-105: Measures volatility ACCELERATION (second-order), not level. Passes incremental gate because it captures change in dispersion, not just mean vol. raw_ic not separately reported in sources.

ADW-106: The ONLY validated RETURN predictor found across all runs. Tail-dispersion/asymmetry spread predicting direction is unusual (possible low-vol-premium effect). baseline_ic for 20-day momentum baseline not separately reported in sources for this target.

Did not pass (7)

Why show these? Publishing rejections is a stronger trust signal than cherry-picking winners. A metric with a great raw IC can still fail because it adds no incremental information over the naive baseline.

Metric	Target	Baseline IC	Incr. IC	p-value	Result
Geometric Autocorrelation (GAC) CAND-GAC	forward 5-day realized volatility vs. trailing-20d realized volatility	0.691	0.023	0.0073	FAIL
H-KER (Hurst-Kernel Entropy Ratio) CAND-H-KER	forward 5-day return vs. 20-day momentum	—	—	0.3040	FAIL
OUV (Ornstein-Uhlenbeck Volatility) CAND-OUV	forward 5-day return vs. 20-day momentum	—	—	0.2310	FAIL
EHA (Entropy-Hurst Asymmetry) CAND-EHA	forward 5-day return vs. 20-day momentum	—	—	0.7100	FAIL
Tail Concentration Ratio CAND-R1-TCR	forward 5-day realized volatility vs. trailing-20d realized volatility	0.691	0.019	0.0610	FAIL
Median Absolute Deviation Ratio CAND-R2-MADR	forward 5-day realized volatility vs. trailing-20d realized volatility	0.691	0.013	0.2100	FAIL
Fractal Hurst CAND-R2-FH	forward 5-day realized volatility vs. trailing-20d realized volatility	0.691	0.004	0.7200	FAIL

CAND-GAC: GAC FAILED the gate: incremental IC=+0.023 < 0.05 threshold. Although statistically significant (p=0.007), the effect size is too small — the metric is largely trailing-vol in disguise. raw_ic passes naively but the incremental gate exposes it.
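The two-part decision rule can be sketched as a small predicate. The function name is hypothetical; the thresholds and the example rows are taken from the tables on this page. Where the page reports p as "< 1e-10", the sketch uses 1e-10 as an upper bound.

```python
def passes_gate(incr_ic, p_value, ic_min=0.05, alpha=0.05):
    """Both conditions must hold; the IC test uses magnitude so flipped signals qualify."""
    if incr_ic is None:  # incremental IC not computed
        return False
    return abs(incr_ic) >= ic_min and p_value < alpha


# (incremental IC, p-value) as published on the board.
rows = {
    "ADW-102 (TPS)": (-0.095, 1e-10),   # negative sign, passes on magnitude
    "ADW-105 (LTVR)": (0.058, 7.20e-9),
    "CAND-GAC": (0.023, 0.0073),        # significant, but effect too small
    "CAND-R1-TCR": (0.019, 0.0610),     # fails both conditions
    "CAND-H-KER": (None, 0.3040),       # incremental IC not computed
}
results = {name: passes_gate(ic, p) for name, (ic, p) in rows.items()}
for name, ok in results.items():
    print(f"{name}: {'PASS' if ok else 'FAIL'}")
```

GAC is the instructive case: its p-value alone would clear a significance-only filter, and it is the effect-size condition that rejects it.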

CAND-H-KER: Failed on both criteria: p=0.304 >> 0.05 (not significant) and IC magnitude near zero. Incremental IC not computed because the raw IC was not significant. baseline_ic for return target not reported in sources.

CAND-OUV: Failed: p=0.231 >> 0.05. Incremental IC not computed. baseline_ic for return target not reported in sources.

CAND-EHA: Failed: p=0.710 >> 0.05, IC near zero. Directional metrics as a class did not predict returns in this test set.

CAND-R1-TCR: Failed both criteria: IC=0.019 < 0.05 and p=0.061 > 0.05. Appeared across multiple runs (Runs 1, 2, 3, 4) with consistent failure — a persistent near-miss.

CAND-R2-MADR: Failed on both criteria across multiple runs. Representative of the median/robust-vol family that consistently underperforms the gate.

CAND-R2-FH: Failed decisively: p=0.72, IC near zero. The Hurst/long-memory family does not add incremental signal over trailing vol in this dataset.

Trust Differentiator

What this means for you

Honest results

Failures included

Any vendor can cherry-pick winners. We publish failed candidates alongside the passes, including the ones that looked good on raw IC but failed the incremental gate.

Incremental gate

Not just "significant"

We control for the naive baseline. A metric that merely tracks that baseline (trailing-20d realized volatility or 20-day momentum on this board) adds no value: it has to add incremental signal beyond what you already have.

Live scoreboard

API-backed

This page is rebuilt from the live validation API. When new candidates are tested or results are updated, the scoreboard reflects it automatically.

Next step

See the full methodology

Understand how we build, validate, and govern intelligence products — and how the validated metrics become agent-callable objects.

