How the engine works
Understanding validation
After the optimizer finishes, the top finalists (the top-K setting) run a validation gauntlet designed to break them. Everything on this page is evidence you can inspect per strategy on the detail screen; the pass/fail summary is the verdict.
The three phases of the equity curve
Every validated strategy's equity curve is colored by phase:
- Green — in-sample (IS). Data the optimizer saw while evolving. Impressive green means little: the strategy was selected for looking good here.
- Magenta — out-of-sample (OOS). Walk-forward fold windows the optimizer never touched.
- Yellow — incubation. The newest slice of the dataset (roughly 10%), held out entirely — "how would this have done if you'd built it back then and then traded it?"
Discount green; trust magenta and yellow. That one habit protects you from most overfitting.
Walk-forward analysis
The dataset is tiled into rolling folds: train on an in-sample window, test on the out-of-sample window that follows, roll forward, repeat. Per fold the report records in-sample and out-of-sample results and their ratio; the headline walk-forward efficiency is the mean OOS/IS ratio.
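As a rough illustration of the tiling described above, here is a minimal sketch. The function name, the fixed window lengths, and the choice to advance by exactly one out-of-sample window per fold are all assumptions made for the example; the engine's actual fold sizing is not documented on this page.

```python
from typing import Iterator, NamedTuple


class Fold(NamedTuple):
    is_start: int   # first bar of the in-sample window
    oos_start: int  # first bar of the out-of-sample window (IS ends here)
    oos_end: int    # one past the last out-of-sample bar


def rolling_folds(n_bars: int, is_len: int, oos_len: int) -> Iterator[Fold]:
    """Yield rolling IS/OOS windows, advancing by one OOS window per fold."""
    start = 0
    while start + is_len + oos_len <= n_bars:
        yield Fold(start, start + is_len, start + is_len + oos_len)
        start += oos_len  # hypothetical step size: OOS windows do not overlap


folds = list(rolling_folds(n_bars=1000, is_len=400, oos_len=100))
for fold in folds:
    print(fold)
```

With 1,000 bars, a 400-bar in-sample window, and a 100-bar out-of-sample window, this yields six folds whose out-of-sample windows cover bars 400 through 999 without overlap.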
Rules of thumb:
| WF efficiency | Reading |
|---|---|
| ~1.0 | Out-of-sample performs like in-sample — excellent |
| < 0.5 | The edge loses more than half its strength on unseen data |
| Negative | Only worked in-sample — the classic overfit signature |
Consistency — the fraction of folds profitable out-of-sample — matters as much as the mean. One monster fold can hide four losers.
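To make the "one monster fold" warning concrete, the sketch below computes both numbers from per-fold results. The function name and the sample figures are invented for illustration, and it assumes every in-sample result is positive so that the OOS/IS ratio is meaningful; which metric the engine actually feeds into the ratio is not stated here.

```python
from statistics import mean


def walk_forward_summary(folds):
    """folds: list of (is_result, oos_result) pairs, one per fold.

    Assumes every in-sample result is positive, so the ratio is meaningful.
    """
    ratios = [oos / is_ for is_, oos in folds]
    efficiency = mean(ratios)  # headline number: mean OOS/IS ratio
    consistency = sum(1 for _, oos in folds if oos > 0) / len(folds)
    return efficiency, consistency


# One monster fold and four small losers (made-up numbers).
example = [
    (100.0, 500.0),
    (100.0, -20.0),
    (100.0, -20.0),
    (100.0, -20.0),
    (100.0, -20.0),
]
efficiency, consistency = walk_forward_summary(example)
print(f"WF efficiency: {efficiency:.2f}, consistency: {consistency:.0%}")
```

Here the mean ratio comes out at 0.84, which looks healthy on its own, while consistency is only 20%: four of the five folds lost money out-of-sample.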
Walk-forward runs in one of two modes, chosen on the Quality tab. Rolling tiles many folds and is the more rigorous default. Anchored makes a single proportional IS → OOS → incubation split, which is easier to read on the chart but less rigorous.
Monte Carlo simulation
The gauntlet re-runs each finalist's trade sequence many times (iterations knob), permuting trade order and adding 1–2 ticks of random slippage per permutation. The result is a fan chart — percentile bands of equity paths — and a drawdown distribution: median, p95, p99, and CVaR₁₀ (the mean of the worst 10% of outcomes).
The point: your backtest is one path through history. If the p95 drawdown would break your account, the median curve is irrelevant. The Monte Carlo p95 drawdown is also what the max-drawdown gate checks when available — deliberately stricter than the single realized curve.
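The procedure can be sketched in a few lines. This is not the engine's implementation: the function names, the `tick_value` parameter, the nearest-rank percentile method, and the reading of "1–2 ticks per permutation" as a single draw charged to every trade in that permutation are all assumptions made for the example.

```python
import math
import random


def max_drawdown(pnls):
    """Largest peak-to-trough drop of the cumulative P&L curve."""
    equity = peak = worst = 0.0
    for pnl in pnls:
        equity += pnl
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def percentile(ascending, q):
    """Nearest-rank percentile of an ascending list, for 0 < q <= 100."""
    rank = max(1, math.ceil(q / 100 * len(ascending)))
    return ascending[rank - 1]


def monte_carlo_drawdowns(trade_pnls, iterations=1000, tick_value=12.5, seed=0):
    rng = random.Random(seed)
    drawdowns = []
    for _ in range(iterations):
        path = list(trade_pnls)
        rng.shuffle(path)  # permute trade order
        # Assumption: one draw of 1 or 2 ticks per permutation, charged to every trade.
        slippage = rng.randint(1, 2) * tick_value
        drawdowns.append(max_drawdown([pnl - slippage for pnl in path]))
    drawdowns.sort()
    tail = drawdowns[-max(1, len(drawdowns) // 10):]  # worst 10% of outcomes
    return {
        "median": percentile(drawdowns, 50),
        "p95": percentile(drawdowns, 95),
        "p99": percentile(drawdowns, 99),
        "cvar10": sum(tail) / len(tail),
    }


trades = [250.0, -100.0, 400.0, -150.0, -75.0, 300.0, -200.0, 125.0, -50.0, 350.0]
stats = monte_carlo_drawdowns(trades)
print(stats)
```

Comparing `stats["p95"]` with the drawdown of the original trade order shows how much of the realized curve's smoothness came from the particular sequence history happened to deliver.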
The MC robustness score (0–100) condenses this: 80 and above is solid, 65–79 robust, 50–64 moderate, and below 50 fragile.
Parameter stress
Every numeric parameter in the genome is jittered ±25% (the whole genome at once, repeated for the configured number of variations) and the strategy re-evaluated. The param-stress score (0–100) rewards a flat plateau — performance that survives perturbation — and punishes cliffs:
| Score | Rating |
|---|---|
| ≥ 85 | Rock-Solid |
| 60–84 | Robust |
| 40–59 | Moderate |
| < 40 | Fragile |
High fitness + low stress score = an edge sitting on a knife's edge of parameter values. Expect it to break live, because live markets are a perturbation.
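The plateau-versus-cliff idea can be illustrated with a toy version of the test. The scoring formula below (mean fitness retention, clipped to 0–100) is a stand-in chosen for the example, since this page does not give the engine's formula; the function names, the uniform jitter distribution, and the two toy fitness functions are likewise hypothetical. Only the ±25% range and the rating bands come from the text above.

```python
import random
from statistics import mean


def jitter_genome(genome, pct, rng):
    """Scale every numeric parameter by an independent factor in [1-pct, 1+pct]."""
    return {name: value * (1 + rng.uniform(-pct, pct)) for name, value in genome.items()}


def param_stress_score(genome, fitness_fn, variations=50, pct=0.25, seed=0):
    """Hypothetical score: mean fraction of baseline fitness retained, times 100."""
    rng = random.Random(seed)
    base = fitness_fn(genome)
    if base <= 0:
        return 0.0
    retained = []
    for _ in range(variations):
        fitness = fitness_fn(jitter_genome(genome, pct, rng))
        retained.append(min(1.0, max(0.0, fitness / base)))
    return 100 * mean(retained)


def rating(score):
    """Map a 0-100 score onto the bands from the table above."""
    if score >= 85:
        return "Rock-Solid"
    if score >= 60:
        return "Robust"
    if score >= 40:
        return "Moderate"
    return "Fragile"


genome = {"period": 20.0, "threshold": 1.5}


def plateau(g):
    return 1.0  # fitness does not care about the parameters


def cliff(g):
    return 1.0 if abs(g["period"] - 20.0) < 0.5 else 0.0  # works only near 20


plateau_score = param_stress_score(genome, plateau)
cliff_score = param_stress_score(genome, cliff)
print(rating(plateau_score), rating(cliff_score))
```

Both toy strategies have the same baseline fitness of 1.0, yet the plateau rates Rock-Solid and the cliff rates Fragile, because only a small share of the jittered genomes land inside the narrow band where the cliff strategy works.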
The per-gene breakdown (strategy detail → Param stress tab) jitters each parameter alone, sorted most-fragile-first, with a bar showing worst-case fitness retention: a long bar near 100% is a stable plateau; a short bar is the cliff the edge sits on.
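A per-gene breakdown of this kind might be computed as follows. The sketch sweeps each parameter over an evenly spaced grid across ±25% while holding the others fixed; the grid (rather than random draws), the function name, and the toy fitness function are assumptions for illustration, and the baseline fitness is assumed to be positive.

```python
def per_gene_retention(genome, fitness_fn, pct=0.25, steps=11):
    """Return (parameter, worst-case fitness retention) rows, most fragile first."""
    base = fitness_fn(genome)  # assumed positive
    rows = []
    for name, value in genome.items():
        worst = 1.0
        for i in range(steps):
            factor = 1 - pct + 2 * pct * i / (steps - 1)  # from 0.75 to 1.25
            trial = dict(genome)
            trial[name] = value * factor  # jitter this parameter alone
            worst = min(worst, fitness_fn(trial) / base)
        rows.append((name, max(0.0, worst)))
    return sorted(rows, key=lambda row: row[1])


def toy_fitness(g):
    # Hypothetical strategy: sharp cliff on "period", gentle slope on "stop".
    period_part = 1.0 if abs(g["period"] - 20.0) < 2.0 else 0.1
    stop_part = 1.0 - 0.1 * abs(g["stop"] - 2.0)
    return period_part * stop_part


rows = per_gene_retention({"stop": 2.0, "period": 20.0}, toy_fitness)
for name, retention in rows:
    print(f"{name:8s} {'#' * round(retention * 20):20s} {retention:.0%}")
```

In this toy case `period` sorts first with a short bar (about 10% retention) and `stop` follows with a long one (about 95%), which is the pattern to look for on the Param stress tab: the short bar marks the parameter the edge depends on.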
The robustness gauge
The dial on the strategy detail screen is the mean of four 0–100 axes: gate pass ratio, capped walk-forward efficiency, Monte Carlo overall, and param stress. Bands: Robust ≥ 65, Caution 40–64, Fragile < 40. It's a summary, not a verdict: click the info icon for the formula with the live component values, and always read the parts.
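A small worked example shows why the parts matter. The sketch below averages four axes and applies the bands given above; how the engine caps and scales walk-forward efficiency is not specified on this page, so clamping it to the range 0 to 1 and multiplying by 100 is an assumption, as are the function name and the input values.

```python
def robustness_gauge(gate_pass_ratio, wf_efficiency, mc_score, stress_score):
    """Mean of four 0-100 axes, plus the band it falls in."""
    axes = [
        100 * gate_pass_ratio,                    # fraction of gates passed, 0..1
        100 * min(1.0, max(0.0, wf_efficiency)),  # assumption: clamp to 0..1, then scale
        mc_score,                                 # already 0..100
        stress_score,                             # already 0..100
    ]
    value = sum(axes) / len(axes)
    if value >= 65:
        band = "Robust"
    elif value >= 40:
        band = "Caution"
    else:
        band = "Fragile"
    return value, band


value, band = robustness_gauge(
    gate_pass_ratio=0.75, wf_efficiency=1.3, mc_score=70, stress_score=30
)
print(value, band)
```

With these inputs the dial reads 68.75 and lands in the Robust band, even though the param-stress axis on its own (30) would rate Fragile. The average hides exactly the weakness the individual axis exposes.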
One more cross-check: HBT
The leaderboard has an HBT div. column and the detail screen an HBT recon tab. This is an in-house reconciliation against a NinjaTrader-parity reference engine used during product QA; on standard installs it reads "not configured," and a blank column means "not checked" — not "failed."