
Verdicts & robustness

Every validated strategy gets a verdict — the roll-up of the quality gates run against its validation evidence. It's the first column to read on the leaderboard.

The four states

Verdict	Chip	Meaning
Pass	Green	Every gate cleared — hard and soft.
Marginal	Amber	All hard gates cleared, but at least one soft check failed.
Kill	Red	At least one hard gate failed.
(blank / unvalidated)	–	Ranked beyond the validated top-K, so no validation gauntlet was run. A missing verdict does not mean the strategy is good.
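The roll-up in the table above follows a simple precedence: any hard failure wins, then any soft failure, otherwise Pass, and an unvalidated strategy has no verdict at all. Below is a minimal Python sketch. It assumes each gate result carries a hard/soft flag and a pass/fail boolean. `GateResult`, `roll_up`, and the gate names used in the usage line are hypothetical and are not the engine's actual API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GateResult:
    # Hypothetical record: one row of gate evidence for a strategy.
    name: str
    hard: bool     # True for a hard gate, False for a soft check
    passed: bool


def roll_up(gates: Optional[Sequence[GateResult]]) -> Optional[str]:
    """Collapse per-gate results into one verdict.

    None means the strategy ranked outside the validated top-K and was
    never run through the gates, so it gets no verdict (not an implicit pass).
    """
    if gates is None:
        return None
    if any(g.hard and not g.passed for g in gates):
        return "Kill"        # a single hard failure is enough
    if any(not g.hard and not g.passed for g in gates):
        return "Marginal"    # hard gates clean, but a soft check failed
    return "Pass"            # every gate cleared, hard and soft
```

For example, `roll_up([GateResult("trades", True, True), GateResult("recency", False, False)])` returns `"Marginal"`. The check for hard failures comes first, so a strategy that fails both a hard gate and a soft check is reported as a Kill rather than a Marginal.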

Hard vs. soft gates

Hard gates — any failure is a Kill:

Trades: too few trades is noise, not an edge
Profit factor: gross profit ÷ gross loss must clear a floor
Net profit: must be meaningfully positive
Max drawdown: checked against the Monte Carlo p95 drawdown when available (stricter than the realized equity curve); a prop-firm trailing limit tightens it further
Overfit (OOS/IS retention): the edge must survive unseen, out-of-sample data
Worst quarter: the weakest sub-period must stay above its floor
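Two of these hard gates come down to short formulas. The sketch below computes the profit factor (gross profit ÷ gross loss) and the realized max drawdown. It also shows one plausible way to choose which drawdown gets checked: the Monte Carlo p95 figure when it exists, otherwise the realized curve. A prop-firm trailing limit can only ever tighten the allowance. The function names, the `limit`/`prop_trailing_limit` parameters, and the assumption that drawdowns are measured in account currency are all illustrative and not taken from the engine.

```python
from typing import Optional, Sequence


def profit_factor(trade_pnls: Sequence[float]) -> float:
    """Gross profit divided by the absolute gross loss."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    if gross_loss == 0:
        return float("inf") if gross_profit > 0 else 0.0
    return gross_profit / gross_loss


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough drop of an equity curve (currency units assumed)."""
    peak = float("-inf")
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return worst


def drawdown_gate(
    equity: Sequence[float],
    mc_p95_drawdown: Optional[float],
    limit: float,                                  # hypothetical configured ceiling
    prop_trailing_limit: Optional[float] = None,   # hypothetical prop-firm limit
) -> bool:
    # Prefer the Monte Carlo p95 drawdown when available; it is usually
    # larger (stricter) than the one realized path.
    measured = mc_p95_drawdown if mc_p95_drawdown is not None else max_drawdown(equity)
    # A prop-firm trailing limit can only tighten the allowed drawdown.
    allowed = limit if prop_trailing_limit is None else min(limit, prop_trailing_limit)
    return measured <= allowed
```

Using `min` for the allowance reflects the text: the prop-firm limit "tightens it further" and never loosens it.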

Soft checks (any failure makes a Marginal):

Recovery factor
Transaction-cost stress: profit factor must survive extra adverse ticks per trade
Walk-forward consistency
Walk-forward variance
Out-of-sample trade count
Recency: profit factor and net profit on the newest data
Param-stress floor
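The transaction-cost stress check can be read as follows: charge every trade a few extra adverse ticks, recompute the profit factor, and require it to stay above a floor. This is a minimal sketch of that reading. `extra_ticks`, `tick_value`, and `pf_floor` are hypothetical parameters. Applying one flat per-trade penalty is an assumption; the engine may model costs differently.

```python
from typing import Sequence


def stressed_profit_factor(
    trade_pnls: Sequence[float], extra_ticks: int, tick_value: float
) -> float:
    """Profit factor after charging each trade `extra_ticks` adverse ticks."""
    penalty = extra_ticks * tick_value        # assumed flat cost per trade
    stressed = [p - penalty for p in trade_pnls]
    gross_profit = sum(p for p in stressed if p > 0)
    gross_loss = -sum(p for p in stressed if p < 0)
    if gross_loss == 0:
        return float("inf") if gross_profit > 0 else 0.0
    return gross_profit / gross_loss


def cost_stress_check(
    trade_pnls: Sequence[float], extra_ticks: int, tick_value: float, pf_floor: float
) -> bool:
    # Soft check: a failure here downgrades Pass to Marginal, never to Kill.
    return stressed_profit_factor(trade_pnls, extra_ticks, tick_value) >= pf_floor
```

The penalty can flip small winners into losers. That shift is why this check catches strategies whose edge is only a tick or two wide.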

A selective pass route exists for infrequent-but-excellent traders: a candidate with a high profit factor on a modest sample and meaningful net profit can clear the trades/profit-factor/net trio as a unit. Every other gate still applies.
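Here is one way to express the selective route: the trades/profit-factor/net trio passes if it clears the standard thresholds, or if it clears an alternate set that accepts fewer trades in exchange for a higher profit factor. This is only a sketch. Every threshold number is hypothetical, and so are `TrioThresholds` and `trio_passes`. The actual floors are not stated in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrioThresholds:
    # Hypothetical numbers for illustration only.
    min_trades: int = 100
    min_profit_factor: float = 1.3
    min_net_profit: float = 5_000.0
    # Selective route: a modest sample is allowed if the edge is clearly strong.
    selective_min_trades: int = 30
    selective_min_profit_factor: float = 2.0
    selective_min_net_profit: float = 5_000.0


def trio_passes(
    trades: int,
    profit_factor: float,
    net_profit: float,
    t: TrioThresholds = TrioThresholds(),
) -> bool:
    standard = (
        trades >= t.min_trades
        and profit_factor >= t.min_profit_factor
        and net_profit >= t.min_net_profit
    )
    selective = (
        trades >= t.selective_min_trades
        and profit_factor >= t.selective_min_profit_factor
        and net_profit >= t.selective_min_net_profit
    )
    # The trio clears as a unit; every other gate is still evaluated separately.
    return standard or selective
```

Treating the three checks as one unit means a strategy cannot combine a low trade count with an ordinary profit factor and still pass.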

The per-gate detail — every gate's name, pass/fail, measured value and threshold — is on the strategy detail's Gates tab, so a verdict is always auditable.
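Each Gates-tab row pairs a measured value with a threshold. The comparison direction depends on the gate: higher is better for profit factor, while lower is better for drawdown. This is a minimal sketch of such a row. The `GateRow` name, the `higher_is_better` flag, and the text format are assumptions, not the engine's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateRow:
    # Hypothetical shape of one auditable gate line.
    name: str
    value: float
    threshold: float
    higher_is_better: bool = True   # False for gates such as max drawdown

    @property
    def passed(self) -> bool:
        if self.higher_is_better:
            return self.value >= self.threshold
        return self.value <= self.threshold

    def describe(self) -> str:
        op = ">=" if self.higher_is_better else "<="
        status = "PASS" if self.passed else "FAIL"
        return f"{self.name}: {status} (value {self.value}, threshold {op} {self.threshold})"
```

For example, `GateRow("max_drawdown", 12500, 10000, higher_is_better=False).describe()` gives `"max_drawdown: FAIL (value 12500, threshold <= 10000)"`.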

How to act on each verdict

Pass: a candidate worth your attention, not a green light. Read the equity-curve phases, the Monte Carlo drawdowns, and the per-gene stress bars; then re-evaluate on fresh data and run it on a simulation account.
Marginal: read which soft check failed (Gates tab). A marginal recovery factor on a strategy you'd size small may be acceptable; marginal recency (fading on the newest data) usually isn't.
Kill: discard, but learn from it. A leaderboard of Kills with great in-sample fitness means the search is converging on overfit strategies; consider a less gameable fitness metric, wider data, or simpler blocks.
Blank: raise top-K and re-run if you want more candidates graded.

The robustness gauge

Next to the validation tabs, the Robustness dial (0–100) averages four axes: gate pass ratio, capped walk-forward efficiency, the overall Monte Carlo score, and param stress. Scores are banded Robust (≥ 65), Caution (40–64), and Fragile (< 40). The verdict and the gauge answer different questions: the verdict asks "did it clear the floors?", while the gauge asks "how much margin does it have?" A Pass at 66 and a Pass at 92 are different animals.
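The gauge arithmetic fits in a few lines. This sketch makes several assumptions that are not confirmed by the text: the four axes are weighted equally, the gate pass ratio and walk-forward efficiency arrive as fractions, the Monte Carlo and param-stress axes already arrive as 0–100 scores, and walk-forward efficiency is capped at a hypothetical `wfe_cap` of 1.0. The function names are illustrative.

```python
def robustness_score(
    gate_pass_ratio: float,   # fraction of gates passed, 0.0 to 1.0
    wf_efficiency: float,     # walk-forward efficiency as a fraction
    monte_carlo: float,       # assumed already on a 0-100 scale
    param_stress: float,      # assumed already on a 0-100 scale
    wfe_cap: float = 1.0,     # hypothetical cap on walk-forward efficiency
) -> float:
    # Cap WFE so an unusually strong out-of-sample stretch earns no extra credit,
    # and floor it at zero so a negative WFE cannot drag the axis below 0.
    capped_wfe = max(0.0, min(wf_efficiency, wfe_cap))
    axes = [
        gate_pass_ratio * 100.0,
        capped_wfe / wfe_cap * 100.0,
        monte_carlo,
        param_stress,
    ]
    return sum(axes) / len(axes)   # equal-weight average (assumed)


def robustness_band(score: float) -> str:
    if score >= 65:
        return "Robust"
    if score >= 40:
        return "Caution"
    return "Fragile"
```

With inputs `(0.9, 1.4, 70, 60)`, the axes become 90, 100, 70, and 60, so the score is 80.0 and the band is Robust. The WFE of 1.4 contributes no more than a WFE of exactly 1.0 would.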
