Verdict Strategy Builder NT8

Verdicts & robustness

Every validated strategy gets a verdict — the roll-up of the quality gates run against its validation evidence. It's the first column to read on the leaderboard.

The four states

Verdict                Chip    Meaning
Pass                   Green   Every gate cleared, hard and soft.
Marginal               Amber   All hard gates cleared, but at least one soft check failed.
Kill                   Red     At least one hard gate failed.
(blank / unvalidated)  (none)  Ranked beyond the validated top-K, so no gauntlet was run. No verdict ≠ good.

Hard vs. soft gates

Hard gates — any failure is a Kill:

  • Trades — too few trades is noise, not an edge
  • Profit factor — gross profit ÷ gross loss floor
  • Net profit — must be meaningfully positive
  • Max drawdown — checked against the Monte Carlo p95 drawdown when available (stricter than the realized curve); a prop-firm trailing limit tightens it further
  • Overfit (OOS/IS retention) — the edge must survive unseen data
  • Worst quarter — the weakest sub-period must stay above its floor
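
As an illustration of the max-drawdown gate described above, here is a minimal sketch. The function name, parameter names, and the idea that drawdowns and limits are positive currency amounts are assumptions made for the example; the actual thresholds and internals are not documented here.

```python
from typing import Optional


def drawdown_gate(
    realized_dd: float,
    base_limit: float,
    mc_p95_dd: Optional[float] = None,
    prop_trailing_limit: Optional[float] = None,
) -> bool:
    """Return True when the (hypothetical) drawdown gate passes.

    All drawdowns and limits are positive amounts in account currency.
    """
    # Prefer the Monte Carlo p95 drawdown when available; otherwise fall
    # back to the drawdown of the realized equity curve.
    measured = mc_p95_dd if mc_p95_dd is not None else realized_dd

    # A prop-firm trailing limit can only tighten the threshold.
    limit = base_limit
    if prop_trailing_limit is not None:
        limit = min(base_limit, prop_trailing_limit)

    return measured <= limit
```

For example, a strategy with a realized drawdown of 3,000 against a 5,000 limit passes on the realized curve, but fails once a Monte Carlo p95 drawdown of 5,500 is supplied.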

Soft checks (a failure here, with every hard gate cleared, makes a Marginal):

  • Recovery factor
  • Transaction-cost stress (profit factor must survive extra adverse ticks per trade)
  • Walk-forward consistency
  • Walk-forward variance
  • Out-of-sample trade count
  • Recency (profit factor and net profit on the newest data)
  • Param-stress floor
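
The transaction-cost stress check can be pictured as recomputing profit factor after charging every trade a few extra adverse ticks. The sketch below assumes one contract per trade and a fixed currency value per tick; `cost_stressed_profit_factor`, `extra_ticks`, and `tick_value` are hypothetical names, not the product's API.

```python
from typing import Sequence


def profit_factor(pnls: Sequence[float]) -> float:
    """Gross profit divided by gross loss for a list of per-trade P&Ls."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    if gross_loss == 0:
        # No losing trades: infinite if anything was earned, else zero.
        return float("inf") if gross_profit > 0 else 0.0
    return gross_profit / gross_loss


def cost_stressed_profit_factor(
    pnls: Sequence[float], extra_ticks: float, tick_value: float
) -> float:
    """Profit factor after subtracting extra adverse ticks from every trade."""
    penalty = extra_ticks * tick_value  # assumed: one contract per trade
    return profit_factor([p - penalty for p in pnls])


trades = [200.0, -100.0, 150.0, -50.0]
baseline = profit_factor(trades)                           # 350 / 150
stressed = cost_stressed_profit_factor(trades, 2, 12.5)    # 300 / 200
```

Here the stressed profit factor drops from about 2.33 to 1.5; whether that survives depends on the floor configured for the check.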

A selective pass route exists for strategies that trade infrequently but well: a candidate with a high profit factor on a modest sample and meaningful net profit can clear the trades/profit-factor/net trio as a unit. Every other gate still applies.
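
One way to read the selective route is as a second, alternative set of floors for the trades/profit-factor/net trio. The sketch below is only an interpretation: `TrioThresholds`, `trio_clears`, and every number in it are placeholders chosen for illustration, not the product's real floors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrioThresholds:
    # Placeholder values for illustration only.
    min_trades: int = 100
    min_pf: float = 1.3
    min_net: float = 1000.0
    # Selective route: fewer trades allowed, but a higher bar on quality.
    selective_min_trades: int = 30
    selective_min_pf: float = 2.5
    selective_min_net: float = 3000.0


def trio_clears(
    trades: int, pf: float, net: float, t: TrioThresholds = TrioThresholds()
) -> bool:
    """True if the trio clears by the standard route or the selective route."""
    standard = trades >= t.min_trades and pf >= t.min_pf and net >= t.min_net
    selective = (
        trades >= t.selective_min_trades
        and pf >= t.selective_min_pf
        and net >= t.selective_min_net
    )
    return standard or selective
```

Under these placeholder numbers, 40 trades at a profit factor of 3.0 and a net of 5,000 clear the trio, while 40 trades at 1.5 do not. The remaining hard and soft gates are evaluated separately either way.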

The per-gate detail — every gate's name, pass/fail, measured value and threshold — is on the strategy detail's Gates tab, so a verdict is always auditable.
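
The roll-up from per-gate results to a verdict can be sketched as follows. The `GateResult` record mirrors the fields listed above (name, pass/fail, measured value, threshold) plus a hard/soft flag; the class and function names are assumptions for the example, not the product's data model.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GateResult:
    name: str
    hard: bool        # True for a hard gate, False for a soft check
    passed: bool
    value: float      # measured value
    threshold: float  # floor or ceiling the value was compared against


def roll_up(gates: Optional[Sequence[GateResult]]) -> Optional[str]:
    """Return "Pass", "Marginal", "Kill", or None for an unvalidated strategy."""
    if not gates:
        return None  # no gauntlet was run, so there is no verdict
    if any(g.hard and not g.passed for g in gates):
        return "Kill"
    if any(not g.hard and not g.passed for g in gates):
        return "Marginal"
    return "Pass"


example = [
    GateResult("Profit factor", hard=True, passed=True, value=1.8, threshold=1.3),
    GateResult("Recovery factor", hard=False, passed=False, value=1.1, threshold=2.0),
]
verdict = roll_up(example)
```

Because every hard gate cleared and one soft check failed, `verdict` is "Marginal" in this example.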

How to act on each verdict

  • Pass — a candidate worth your attention, not a green light. Read the equity-curve phases, the Monte Carlo drawdowns, and the per-gene stress bars; then re-evaluate on fresh data and run it on a simulation account.
  • Marginal — read which soft check failed (Gates tab). A marginal recovery factor on a strategy you'd size small may be acceptable; marginal recency (fading on the newest data) usually isn't.
  • Kill — discard, but learn: a leaderboard of Kills with great in-sample fitness means the search is finding overfit — consider a less gameable fitness metric, wider data, or simpler blocks.
  • Blank — raise top-K and re-run if you want more candidates graded.

The robustness gauge

Next to the validation tabs, the Robustness dial (0–100) averages four axes — gate pass ratio, capped walk-forward efficiency, Monte Carlo overall, param stress — banded Robust ≥ 65, Caution 40–64, Fragile < 40. Verdict and gauge answer different questions: the verdict is "did it clear the floors?"; the gauge is "how much margin does it have?" A Pass at 66 and a Pass at 92 are different animals.
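
A minimal sketch of the gauge arithmetic, assuming each of the four axes is already scaled to 0–100, that the average is unweighted, and that the result is rounded to a whole number before banding. The function names are hypothetical and the actual scaling of each axis is not specified here.

```python
def robustness_score(
    gate_pass_ratio: float,
    wf_efficiency: float,
    monte_carlo: float,
    param_stress: float,
) -> int:
    """Unweighted mean of four 0-100 axes, rounded to an integer."""
    axes = [gate_pass_ratio, wf_efficiency, monte_carlo, param_stress]
    # Clamp each axis to 0-100; this also caps an oversized
    # walk-forward efficiency.
    clamped = [min(100.0, max(0.0, a)) for a in axes]
    return round(sum(clamped) / len(clamped))


def band(score: int) -> str:
    """Map a 0-100 score to the bands named in the text."""
    if score >= 65:
        return "Robust"
    if score >= 40:
        return "Caution"
    return "Fragile"


score = robustness_score(100, 80, 60, 24)  # (100 + 80 + 60 + 24) / 4 = 66
label = band(score)
```

In this example every gate passed (axis value 100), yet the dial reads 66, only just inside the Robust band, which is the kind of thin margin the gauge is meant to expose.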