AlpineDataWorks.AI
Validation Scoreboard

Every metric, tested against a baseline. Here's what survived.

We ran ~90 candidate metrics, across 3 discovery runs, through a rigorous incremental information coefficient gate. 4 passed. The failures are shown too, because that's what honest validation looks like.

~90 candidates tested (across 3 discovery runs)
4 passed the gate
7 shown failing
The Gate

What a metric must prove

|IC| ≥ 0.05: incremental information coefficient
p < 0.05: statistical significance

Method: partial Spearman Information Coefficient (incremental IC)

Controlling for the naive baseline, the metric must show |incremental_ic| >= 0.05 with p < 0.05. The threshold applies to the magnitude, so a negative (sign-flipped) signal can pass. Incremental IC = partial Spearman rank-correlation of the signal vs forward target, residualised on the naive baseline.
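The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the ADW implementation: the function name `incremental_ic` is hypothetical, and it assumes that both the signal and the target are rank-transformed and residualised on the ranked baseline (the standard partial-correlation construction), with a two-sided p-value from a t-distribution on n - 3 degrees of freedom.

```python
import numpy as np
from scipy import stats


def incremental_ic(signal, target, baseline):
    """Partial Spearman correlation of signal vs target, controlling for baseline.

    Assumption: both signal and target ranks are residualised on the
    baseline ranks; the source does not spell out this detail.
    """
    s = stats.rankdata(signal)
    t = stats.rankdata(target)
    b = stats.rankdata(baseline)

    # Linear regression on the ranked baseline (intercept + slope).
    X = np.column_stack([np.ones_like(b), b])
    s_res = s - X @ np.linalg.lstsq(X, s, rcond=None)[0]
    t_res = t - X @ np.linalg.lstsq(X, t, rcond=None)[0]

    # Pearson correlation of the residuals is the partial rank correlation.
    r = float(np.corrcoef(s_res, t_res)[0, 1])

    # Two-sided p-value; one control variable costs one extra degree of freedom.
    dof = len(s) - 3
    t_stat = r * np.sqrt(dof / max(1.0 - r * r, 1e-300))
    p = float(2.0 * stats.t.sf(abs(t_stat), dof))
    return r, p
```

A signal that is only a noisy copy of the baseline should score close to zero here, even if its raw rank correlation with the target is high. Note that pooled daily observations with overlapping forward windows are not independent, so a p-value computed this way is likely optimistic.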

Dataset: ~10y daily prices, 8-9 tickers (SPY, QQQ, AAPL, MSFT, NVDA, AMZN, GOOGL, META + others), ~22,793 observations

Gate as of 2026-06-17. Data fetched live from the ADW validation API at build time (2026-06-21).

Full Results

11 metrics on the board

Passed rows are shown first. Failed rows follow — we publish them because hiding negative results would defeat the purpose of having a gate.

Passed the gate (4)

| Metric | Target | Baseline IC | Incr. IC | p-value | Result |
|---|---|---|---|---|---|
| Entropy-Weighted CUSUM Volatility Signal (EWC), ADW-101 | forward 5-day realized volatility vs. trailing-20d realized volatility | 0.691 | 0.125 | < 1e-10 | PASS |
| Tail Probability Shift (TPS), ADW-102 | forward 5-day realized volatility vs. trailing-20d realized volatility | 0.691 | -0.095 (flip) | < 1e-10 | PASS |
| Local Tail Variance Ratio (LTVR), ADW-105 | forward 5-day realized volatility vs. trailing-20d realized volatility | 0.691 | 0.058 | 7.20e-9 | PASS |
| Tail Mean Difference (TMD), ADW-106 | forward 5-day return vs. 20-day momentum | n/a | 0.054 | 1.00e-7 | PASS |
ADW-101: Strongest keeper. EWC = CUSUM_max / (SampleEntropy + 1e-6). Entropy weighting captures change STRUCTURE beyond vol level. Incremental IC reported as +0.1254 in the backtest table; the Shortlist rounds to +0.125. p reported as 0.00e+00 (machine zero given n=13,332).
ADW-102: Sign is NEGATIVE (flip for use): rising tail-frequency shift precedes LOWER forward realized vol — likely burst-then-exhaust effect. Real edge; confirm sign out-of-sample. raw_ic not separately reported in sources.
ADW-105: Measures volatility ACCELERATION (second-order), not level. Passes incremental gate because it captures change in dispersion, not just mean vol. raw_ic not separately reported in sources.
ADW-106: The ONLY validated RETURN predictor found across all runs. Tail-dispersion/asymmetry spread predicting direction is unusual (possible low-vol-premium effect). baseline_ic for 20-day momentum baseline not separately reported in sources for this target.
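The ADW-101 note gives the formula EWC = CUSUM_max / (SampleEntropy + 1e-6) but not its inputs. The sketch below shows one way such a ratio could be computed on a window of returns. Everything beyond the formula itself is an assumption made for illustration: the function names, the CUSUM being taken over standardized deviations from the window mean, and the sample-entropy parameters (embedding length m = 2, tolerance 0.2 standard deviations).

```python
import numpy as np


def cusum_max(x):
    """Largest absolute cumulative sum of standardized deviations (assumed form)."""
    x = np.asarray(x, dtype=float)
    sd = np.std(x)
    if sd == 0:
        return 0.0
    z = (x - x.mean()) / sd
    return float(np.max(np.abs(np.cumsum(z))))


def sample_entropy(x, m=2, r_frac=0.2):
    """Sample entropy with Chebyshev distance; m and r_frac are assumed defaults."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = r_frac * np.std(x)

    def count_matches(length):
        # Use the same n - m template start points for both lengths.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        upper = np.triu_indices(len(templates), k=1)  # pairs i < j, no self-matches
        return int(np.sum(dist[upper] <= r))

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("nan")  # undefined when no template pairs match
    return float(-np.log(a / b))


def ewc(x):
    """EWC = CUSUM_max / (SampleEntropy + 1e-6), as stated in the ADW-101 note."""
    return cusum_max(x) / (sample_entropy(x) + 1e-6)
```

Under these assumptions, a window containing a level shift produces a much larger CUSUM_max than pure noise of the same length, and a more regular (lower-entropy) window inflates the ratio further.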

Did not pass (7)

Why show these? Publishing rejections is a stronger trust signal than cherry-picking winners. A metric with a great raw IC can still fail because it adds no incremental information over the naive baseline.
| Metric | Target | Baseline IC | Incr. IC | p-value | Result |
|---|---|---|---|---|---|
| Geometric Autocorrelation (GAC), CAND-GAC | forward 5-day realized volatility vs. trailing-20d realized volatility | 0.691 | 0.023 | 0.0073 | FAIL |
| H-KER (Hurst-Kernel Entropy Ratio), CAND-H-KER | forward 5-day return vs. 20-day momentum | n/a | n/a | 0.3040 | FAIL |
| OUV (Ornstein-Uhlenbeck Volatility), CAND-OUV | forward 5-day return vs. 20-day momentum | n/a | n/a | 0.2310 | FAIL |
| EHA (Entropy-Hurst Asymmetry), CAND-EHA | forward 5-day return vs. 20-day momentum | n/a | n/a | 0.7100 | FAIL |
| Tail Concentration Ratio, CAND-R1-TCR | forward 5-day realized volatility vs. trailing-20d realized volatility | 0.691 | 0.019 | 0.0610 | FAIL |
| Median Absolute Deviation Ratio, CAND-R2-MADR | forward 5-day realized volatility vs. trailing-20d realized volatility | 0.691 | 0.013 | 0.2100 | FAIL |
| Fractal Hurst, CAND-R2-FH | forward 5-day realized volatility vs. trailing-20d realized volatility | 0.691 | 0.004 | 0.7200 | FAIL |
CAND-GAC: GAC FAILED the gate: incremental IC=+0.023 < 0.05 threshold. Although statistically significant (p=0.007), the effect size is too small — the metric is largely trailing-vol in disguise. raw_ic passes naively but the incremental gate exposes it.
CAND-H-KER: Failed on both criteria: p=0.304 >> 0.05 (not significant) and IC magnitude near zero. Incremental IC not computed because the raw IC was not significant. baseline_ic for return target not reported in sources.
CAND-OUV: Failed: p=0.231 >> 0.05. Incremental IC not computed. baseline_ic for return target not reported in sources.
CAND-EHA: Failed: p=0.710 >> 0.05, IC near zero. Directional metrics as a class did not predict returns in this test set.
CAND-R1-TCR: Failed both criteria: IC=0.019 < 0.05 and p=0.061 > 0.05. Appeared across multiple runs (Runs 1, 2, 3, 4) with consistent failure — a persistent near-miss.
CAND-R2-MADR: Failed on both criteria across multiple runs. Representative of the median/robust-vol family that consistently underperforms the gate.
CAND-R2-FH: Failed decisively: p=0.72, IC near zero. The Hurst/long-memory family does not add incremental signal over trailing vol in this dataset.
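The pass/fail decisions above follow from applying both gate criteria together. A minimal sketch of that check, assuming a hypothetical helper `passes_gate` and treating a missing incremental IC (not computed) as a failure:

```python
import math


def passes_gate(incremental_ic, p_value, ic_min=0.05, p_max=0.05):
    """Return True only if |incremental IC| >= ic_min AND p < p_max."""
    if incremental_ic is None or math.isnan(incremental_ic):
        return False  # assumption: an uncomputed IC cannot pass
    return abs(incremental_ic) >= ic_min and p_value < p_max


# Values taken from the tables on this page; 1e-10 stands in for "< 1e-10".
print(passes_gate(0.125, 1e-10))    # ADW-101
print(passes_gate(-0.095, 1e-10))   # ADW-102, passes on magnitude
print(passes_gate(0.023, 0.0073))   # CAND-GAC, significant but too small
print(passes_gate(0.019, 0.061))    # CAND-R1-TCR, fails both criteria
```

CAND-GAC is the instructive case: its p-value clears the significance bar, but the effect size does not, so the conjunction rejects it.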
Trust Differentiator

What this means for you

Honest results

Failures included

Any vendor can cherry-pick winners. We publish every candidate we ran, including the ones that looked good on raw IC but failed the incremental gate.

Incremental gate

Not just "significant"

We control for the naive baseline. A metric that merely restates that baseline (trailing volatility or momentum, for example) adds no value; it has to add incremental signal beyond what you already have.

Live scoreboard

API-backed

This page is rebuilt from the live validation API. When new candidates are tested or results are updated, the scoreboard reflects it at the next build.

Next step

See the full methodology

Understand how we build, validate, and govern intelligence products — and how the validated metrics become agent-callable objects.