Understanding validation

After the optimizer finishes, the top finalists (how many is set by the top-K setting) run a validation gauntlet designed to break them. Everything on this page is evidence you can inspect per strategy on the detail screen; the pass/fail summary is the verdict.

The three phases of the equity curve

Every validated strategy's equity curve is colored by phase:

Green — in-sample (IS). Data the optimizer saw while evolving. Impressive green means little: the strategy was selected for looking good here.
Magenta — out-of-sample (OOS). Walk-forward fold windows the optimizer never touched.
Yellow — incubation. The newest slice of the dataset (roughly 10%), held out entirely — "how would this have done if you'd built it back then and then traded it?"

Discount green; trust magenta and yellow. That one habit protects you from most overfitting.

Walk-forward analysis

The dataset is tiled into rolling folds: train on an in-sample window, test on the out-of-sample window that follows, roll forward, repeat. For each fold, the report records the in-sample and out-of-sample results and their ratio; the headline walk-forward efficiency is the mean of those per-fold OOS/IS ratios.
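A minimal sketch of rolling fold tiling, assuming bars are indexed from 0 (oldest) to n-1 (newest), the newest ~10% is carved off as incubation before any folds are drawn, and each roll advances by exactly one OOS window so the OOS windows are contiguous and never overlap. The function name and the `is_len`/`oos_len` parameters are hypothetical, not the engine's actual settings.

```python
def rolling_folds(n_bars: int, is_len: int, oos_len: int, incubation_frac: float = 0.10):
    """Return a list of (in_sample, out_of_sample) index ranges, oldest first."""
    usable = n_bars - round(n_bars * incubation_frac)  # newest slice held out for incubation
    folds = []
    start = 0
    while start + is_len + oos_len <= usable:
        in_sample = range(start, start + is_len)
        out_of_sample = range(start + is_len, start + is_len + oos_len)
        folds.append((in_sample, out_of_sample))
        start += oos_len  # roll forward by one OOS window (assumed step size)
    return folds
```

With 1,000 bars, a 300-bar in-sample window and a 100-bar out-of-sample window, this yields six folds whose OOS windows cover bars 300 through 899; bars 900 to 999 are never seen by any fold.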

Rules of thumb:

WF efficiency	Reading
~1.0	Out-of-sample performs like in-sample — excellent
< 0.5	The edge more than halves on unseen data
Negative	Only worked in-sample — the classic overfit signature

Consistency — the fraction of folds profitable out-of-sample — matters as much as the mean. One monster fold can hide four losers.
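Both headline numbers can be computed from per-fold results. This sketch assumes each fold reports one in-sample and one out-of-sample result already normalized for window length (for example, an annualized return), and that in-sample results are positive; folds with a non-positive in-sample result are skipped because the ratio is meaningless there. The `Fold` type and the function names are illustrative only.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Fold:
    is_result: float   # in-sample result, normalized for window length (assumption)
    oos_result: float  # out-of-sample result, same units


def wf_efficiency(folds: list[Fold]) -> float:
    """Mean of the per-fold OOS/IS ratios; folds with IS <= 0 are skipped."""
    ratios = [f.oos_result / f.is_result for f in folds if f.is_result > 0]
    return mean(ratios) if ratios else float("nan")


def consistency(folds: list[Fold]) -> float:
    """Fraction of folds that were profitable out-of-sample."""
    return sum(f.oos_result > 0 for f in folds) / len(folds)


# One monster fold hiding four losers.
folds = [Fold(10.0, 40.0), Fold(10.0, -2.0), Fold(10.0, -1.0), Fold(10.0, -3.0), Fold(10.0, -4.0)]
```

Here the per-fold ratios are 4.0, -0.2, -0.1, -0.3 and -0.4, so the mean efficiency is about 0.6, which looks acceptable, while consistency is only 0.2: four of five folds lost money out-of-sample.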

Two modes (Quality tab): Rolling tiles many folds and is the more rigorous default; Anchored makes one proportional IS → OOS → incubation split, which is easier to read on the chart but less rigorous.
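For comparison, an Anchored split is a single cut of the timeline. A minimal sketch, assuming the newest ~10% is incubation (as in the phase description above) and a hypothetical `oos_frac` giving the share of the remaining bars used as out-of-sample; the engine's actual proportions may differ.

```python
def anchored_split(n_bars: int, oos_frac: float = 0.25, incubation_frac: float = 0.10):
    """Return (in_sample, out_of_sample, incubation) index ranges, oldest first."""
    inc_start = n_bars - round(n_bars * incubation_frac)  # newest slice held out entirely
    oos_start = inc_start - round(inc_start * oos_frac)   # tail of the remainder is OOS
    return range(0, oos_start), range(oos_start, inc_start), range(inc_start, n_bars)
```

With 1,000 bars and these defaults the three phases are bars 0 to 674 (green), 675 to 899 (magenta) and 900 to 999 (yellow).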

Monte Carlo simulation

The gauntlet re-runs each finalist's trade sequence many times (iterations knob), permuting trade order and adding 1–2 ticks of random slippage per permutation. The result is a fan chart — percentile bands of equity paths — and a drawdown distribution: median, p95, p99, and CVaR₁₀ (the mean of the worst 10% of outcomes).
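A minimal sketch of the drawdown half of this simulation (not the fan chart), assuming trades are given as per-trade P&L in currency, every trade in a shuffled path is charged a random 1 or 2 ticks (whether the engine draws slippage per trade or per permutation is an assumption here), drawdown is measured from starting equity, and percentiles use the nearest-rank method. The function names and `tick_value` are hypothetical. Shuffling alone never changes total profit; what the permutation exposes is the drawdown.

```python
import math
import random
from statistics import median


def max_drawdown(pnls):
    """Largest peak-to-trough drop of the cumulative equity curve, starting from 0."""
    equity = peak = worst = 0.0
    for pnl in pnls:
        equity += pnl
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def nearest_rank(sorted_vals, q):
    """q-th percentile (0 < q <= 100) of an ascending list, nearest-rank method."""
    k = max(0, math.ceil(q / 100 * len(sorted_vals)) - 1)
    return sorted_vals[k]


def mc_drawdowns(trade_pnls, iterations, tick_value, seed=0):
    rng = random.Random(seed)
    drawdowns = []
    for _ in range(iterations):
        path = list(trade_pnls)
        rng.shuffle(path)  # permute trade order
        # Charge 1-2 ticks of slippage to every trade (assumed slippage model).
        path = [p - rng.randint(1, 2) * tick_value for p in path]
        drawdowns.append(max_drawdown(path))
    drawdowns.sort()
    tail = drawdowns[-max(1, len(drawdowns) // 10):]  # worst 10% = largest drawdowns
    return {
        "median": median(drawdowns),
        "p95": nearest_rank(drawdowns, 95),
        "p99": nearest_rank(drawdowns, 99),
        "cvar10": sum(tail) / len(tail),
    }
```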

The point: your backtest is one path through history. If the p95 drawdown would break your account, the median curve is irrelevant. The Monte Carlo p95 drawdown is also what the max-drawdown gate checks when available — deliberately stricter than the single realized curve.

The MC robustness score (0–100) condenses this: ≥80 solid, ≥65 robust, ≥50 moderate, below that fragile.

Parameter stress

Every numeric parameter in the genome is jittered ±25% (the whole genome at once, repeated for the configured number of variations) and the strategy is re-evaluated. The param-stress score (0–100) rewards a flat plateau (performance that survives perturbation) and punishes cliffs:

Score	Rating
≥ 85	Rock-Solid
60–84	Robust
40–59	Moderate
< 40	Fragile

High fitness + low stress score = an edge sitting on a knife's edge of parameter values. Expect it to break live, because live markets are a perturbation.
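A minimal sketch of the whole-genome stress loop, assuming the genome is a flat dict of parameters, the jitter is uniform within ±25%, integer parameters are rounded after jittering, and the base fitness is positive. The engine's mapping from these results to its 0–100 score is not given on this page, so the sketch stops at mean and worst-case fitness retention; `evaluate` is a hypothetical stand-in for a full backtest.

```python
import random
from statistics import mean


def jitter(params, rng, pct=0.25):
    """Scale every numeric parameter by a random factor in [1 - pct, 1 + pct]."""
    out = {}
    for name, value in params.items():
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            out[name] = value  # leave non-numeric genes untouched
        else:
            scaled = value * (1 + rng.uniform(-pct, pct))
            out[name] = round(scaled) if isinstance(value, int) else scaled
    return out


def stress(params, evaluate, variations, seed=0):
    """Return (mean, worst) fitness retention across whole-genome jitters."""
    rng = random.Random(seed)
    base = evaluate(params)  # assumed > 0
    retention = [evaluate(jitter(params, rng)) / base for _ in range(variations)]
    return mean(retention), min(retention)
```

A strategy whose fitness is unchanged by every jitter retains 100%; one whose fitness only holds at a single exact value shows a worst-case retention far below that, which is the cliff the score punishes.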

The per-gene breakdown (strategy detail → Param stress tab) jitters each parameter on its own and lists the results most-fragile-first, each with a bar showing worst-case fitness retention: a long bar near 100% is a stable plateau; a short bar is the cliff the edge sits on.
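The per-gene breakdown can be sketched by moving one parameter at a time while the others stay at their optimized values. This version uses a fixed grid of offsets (-25%, -12.5%, +12.5%, +25%) rather than random draws, which is an assumption; it reports the worst retention per gene and sorts the shortest bars first. The function name and `evaluate` callback are hypothetical.

```python
def per_gene_breakdown(params, evaluate, pct=0.25, steps=(-1.0, -0.5, 0.5, 1.0)):
    """Worst-case fitness retention per numeric gene, most fragile first."""
    base = evaluate(params)  # assumed > 0
    rows = []
    for name, value in params.items():
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue  # only numeric genes are stressed
        retentions = []
        for step in steps:
            moved = value * (1 + step * pct)
            if isinstance(value, int):
                moved = round(moved)
            # Perturb this gene only; every other gene keeps its optimized value.
            retentions.append(evaluate({**params, name: moved}) / base)
        rows.append((name, min(retentions)))
    rows.sort(key=lambda row: row[1])  # lowest retention (shortest bar) first
    return rows
```

For example, if fitness drops steeply as a `fast` period moves away from 10 but barely changes as a `slow` period moves away from 50, `fast` is listed first with the shorter bar.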

The robustness gauge

The dial on the strategy detail screen is the mean of four 0–100 axes: gate pass ratio, capped walk-forward efficiency, Monte Carlo overall, and param stress. Bands: Robust ≥ 65, Caution 40–64, Fragile < 40. It's a summary, not a verdict: click the info icon for the formula with the live component values, and always read the parts.
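The arithmetic behind the dial is simple enough to sketch. This assumes the gate pass ratio is scaled to 0–100 and that "capped" walk-forward efficiency means clamping the ratio to [0, 1] before scaling; the info icon shows the real formula, so treat the capping rule here as a guess. The function name is hypothetical.

```python
def robustness_gauge(gates_passed, gates_total, wf_efficiency, mc_overall, param_stress):
    """Mean of four 0-100 axes, plus the dial's band label."""
    gate_axis = 100 * gates_passed / gates_total
    wf_axis = 100 * min(max(wf_efficiency, 0.0), 1.0)  # capping rule is an assumption
    score = (gate_axis + wf_axis + mc_overall + param_stress) / 4
    if score >= 65:
        band = "Robust"
    elif score >= 40:
        band = "Caution"
    else:
        band = "Fragile"
    return score, band
```

With 8 of 10 gates passed, a walk-forward efficiency of 0.7, a Monte Carlo overall of 72 and a param-stress score of 58, the dial reads 70 (Robust) even though the param-stress axis on its own rates only Moderate, which is exactly why the parts need reading.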

One more cross-check: HBT

The leaderboard has an HBT div. column and the detail screen an HBT recon tab. This is an in-house reconciliation against a NinjaTrader-parity reference engine used during product QA; on standard installs it reads "not configured," and a blank column means "not checked" — not "failed."