Verdicts & robustness
Every validated strategy gets a verdict — the roll-up of the quality gates run against its validation evidence. It's the first column to read on the leaderboard.
The four states
| Verdict | Chip | Meaning |
|---|---|---|
| Pass | Green | Every gate cleared — hard and soft. |
| Marginal | Amber | All hard gates cleared, but at least one soft check failed. |
| Kill | Red | At least one hard gate failed. |
| (blank / unvalidated) | – | Ranked beyond the validated top-K, so no gauntlet was run. A blank verdict is not a good verdict; it only means the strategy was never tested. |
Hard vs. soft gates
Hard gates — any failure is a Kill:
- Trades — too few trades is noise, not an edge
- Profit factor — gross profit ÷ gross loss floor
- Net profit — must be meaningfully positive
- Max drawdown — checked against the Monte Carlo p95 drawdown when available (stricter than the realized curve); a prop-firm trailing limit tightens it further
- Overfit (OOS/IS retention) — the edge must survive unseen data
- Worst quarter — the weakest sub-period must stay above its floor
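As a rough illustration of three of the hard-gate measurements above, the sketch below computes a profit factor, an OOS/IS retention ratio, and the drawdown figure and limit that would be compared. The function names are hypothetical, and two readings are assumptions rather than confirmed engine behavior: that the Monte Carlo p95 drawdown replaces the realized drawdown when it is available, and that a prop-firm trailing limit acts as a tighter cap on the base limit.

```python
from typing import Optional, Sequence


def profit_factor(trade_pnls: Sequence[float]) -> float:
    """Gross profit divided by gross loss (loss taken as a positive number)."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    if gross_loss == 0:
        # No losing trades: the ratio is undefined, reported here as infinity.
        return float("inf") if gross_profit > 0 else 0.0
    return gross_profit / gross_loss


def oos_retention(is_metric: float, oos_metric: float) -> float:
    """Share of the in-sample figure that survives out of sample."""
    if is_metric <= 0:
        return 0.0
    return oos_metric / is_metric


def drawdown_to_test(realized_dd: float, mc_p95_dd: Optional[float]) -> float:
    """Assumption: use the Monte Carlo p95 drawdown when available."""
    return mc_p95_dd if mc_p95_dd is not None else realized_dd


def drawdown_limit(base_limit: float, prop_trailing_limit: Optional[float]) -> float:
    """Assumption: a prop-firm trailing limit can only tighten the base limit."""
    if prop_trailing_limit is None:
        return base_limit
    return min(base_limit, prop_trailing_limit)
```

For example, trades of +100, -50, +50 and -25 give a gross profit of 150 against a gross loss of 75, so the profit factor is 2.0. Whether that clears the gate depends on the configured floor, which this page does not state.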
Soft checks (any failure downgrades an otherwise passing strategy to Marginal):
- Recovery factor
- Transaction-cost stress (profit factor must survive extra adverse ticks per trade)
- Walk-forward consistency
- Walk-forward variance
- Out-of-sample trade count
- Recency (profit factor and net on the newest data)
- Param-stress floor
A selective pass route exists for infrequent-but-excellent traders: a candidate with a high profit factor on a modest sample and meaningful net profit can clear the trades/profit-factor/net trio as a unit. Every other gate still applies.
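One way to picture the selective route is as an alternative set of floors for the trades/profit-factor/net trio, evaluated together. The sketch below is only an illustration: `TrioThresholds`, `trio_clears`, and every number in it are hypothetical placeholders, since the real floors are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrioThresholds:
    # All values are placeholders for illustration, not the engine's floors.
    min_trades: int = 100
    min_pf: float = 1.3
    min_net: float = 0.0
    selective_min_trades: int = 30     # "modest sample"
    selective_min_pf: float = 2.0      # "high profit factor"
    selective_min_net: float = 1000.0  # "meaningful net profit"


def trio_clears(trades: int, pf: float, net: float,
                t: TrioThresholds = TrioThresholds()) -> bool:
    """True if the trio clears by the standard route or the selective route."""
    standard = trades >= t.min_trades and pf >= t.min_pf and net > t.min_net
    selective = (
        trades >= t.selective_min_trades
        and pf >= t.selective_min_pf
        and net >= t.selective_min_net
    )
    return standard or selective
```

The three conditions of the selective route are joined with `and` because the trio is cleared as a unit: a high profit factor alone does not excuse a tiny sample or a negligible net. Drawdown, overfit, and worst-quarter gates are checked separately either way.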
The per-gate detail — every gate's name, pass/fail, measured value and threshold — is on the strategy detail's Gates tab, so a verdict is always auditable.
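The roll-up from per-gate results to a verdict can be sketched as follows. The `GateResult` record mirrors the fields listed for the Gates tab (name, pass/fail, measured value, threshold) plus a hard/soft flag; the class, the function name, and the sample values are assumptions for illustration, not the engine's data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GateResult:
    name: str
    hard: bool        # True for a hard gate, False for a soft check
    passed: bool
    value: float      # measured value
    threshold: float  # floor or ceiling the value was compared against


def roll_up(gates: Optional[List[GateResult]]) -> Optional[str]:
    """Return 'Pass', 'Marginal', 'Kill', or None for an unvalidated strategy."""
    if not gates:
        return None  # no gauntlet was run, so there is no verdict
    if any(g.hard and not g.passed for g in gates):
        return "Kill"
    if any(not g.hard and not g.passed for g in gates):
        return "Marginal"
    return "Pass"


# Hypothetical values: hard gates clear, one soft check fails.
example = [
    GateResult("profit_factor", hard=True, passed=True, value=1.8, threshold=1.3),
    GateResult("recovery_factor", hard=False, passed=False, value=1.1, threshold=2.0),
]
verdict = roll_up(example)  # "Marginal"
```

Hard failures are checked first, so a strategy that fails both a hard gate and a soft check is reported as Kill rather than Marginal.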
How to act on each verdict
- Pass — a candidate worth your attention, not a green light. Read the equity-curve phases, the Monte Carlo drawdowns, and the per-gene stress bars; then re-evaluate on fresh data and run it on a simulation account.
- Marginal — read which soft check failed (Gates tab). A marginal recovery factor on a strategy you'd size small may be acceptable; marginal recency (fading on the newest data) usually isn't.
- Kill — discard, but learn: a leaderboard of Kills with great in-sample fitness means the search is finding overfit — consider a less gameable fitness metric, wider data, or simpler blocks.
- Blank — raise top-K and re-run if you want more candidates graded.
The robustness gauge
Next to the validation tabs, the Robustness dial (0–100) averages four axes — gate pass ratio, capped walk-forward efficiency, Monte Carlo overall, param stress — banded Robust ≥ 65, Caution 40–64, Fragile < 40. Verdict and gauge answer different questions: the verdict is "did it clear the floors?"; the gauge is "how much margin does it have?" A Pass at 66 and a Pass at 92 are different animals.
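A minimal sketch of the gauge arithmetic, assuming each axis is already expressed on a 0 to 100 scale and that the four axes are weighted equally. The function names and the clamping of each axis are illustrative assumptions; how the engine scales each axis internally is not described here.

```python
def robustness_score(gate_pass: float, wf_efficiency: float,
                     monte_carlo: float, param_stress: float) -> float:
    """Equal-weight mean of four axes, each clamped to the 0-100 range."""
    axes = [gate_pass, wf_efficiency, monte_carlo, param_stress]
    # Clamping also caps walk-forward efficiency, which can exceed 100.
    clamped = [max(0.0, min(100.0, a)) for a in axes]
    return sum(clamped) / len(clamped)


def band(score: float) -> str:
    """Map a score to the bands: Robust >= 65, Caution 40-64, Fragile < 40."""
    if score >= 65:
        return "Robust"
    if score >= 40:
        return "Caution"
    return "Fragile"
```

With axes of 100, 140 (capped to 100), 60 and 40, the score is 75 and the band is Robust. A fractional score between 64 and 65 falls into Caution in this sketch; the dial may round before banding.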