Why Backtests Can Be Misleading
A backtest replays a strategy over historical prices and hands back a precise-looking report — profit, drawdown, win rate, all to two decimal places. That precision is deceptive: the simulation behind it makes simplifying assumptions about ticks, spreads, slippage and execution that almost always flatter the result. The sections below walk through the specific ways backtests mislead, with concrete numbers, and what out-of-sample and walk-forward testing actually fix.
Key takeaways
- An optimizer that tries thousands of parameter sets will always find one that looks great on the past — that alone says little about the future.
- MT4's every-tick mode interpolates ticks from one-minute bars, so intrabar fills can be invented; real-tick data is better but only as good as the broker's archive.
- Fixed-spread backtests skip the expensive moments: real spreads widen several times over around news releases and at rollover.
- Slippage, commission and swap routinely turn a small simulated edge into a flat or negative live result.
- Testing many symbols or ideas and keeping only the winners builds survivorship-style bias into the result before any report is read.
- Out-of-sample validation and walk-forward testing are the standard defenses against curve fitting — a backtest is evidence, not proof.
A simulation, not a recording
When the Strategy Tester runs an EA over five years of EUR/USD, it is not replaying the market you would have traded. It replays one broker’s stored price history, through a simplified execution model, with costs you configured yourself. The report at the end is exactly as honest as those three ingredients.
None of that makes backtesting useless — it is the right first filter for any rule-based idea, and how the MetaTrader tester works is covered separately. This guide is about the gap between the tester’s tidy world and a live account — a gap that is almost always in the strategy’s favor.
Curve fitting: optimizing into the noise
Historical prices contain some repeatable structure and a large amount of one-off noise. An optimizer cannot tell the difference. Give it four inputs with eight values each and it will run 8^4 = 4,096 separate backtests and hand you the best one. Some combination will look excellent on any data, including random data, because picking the maximum of thousands of tries all but guarantees an impressive winner.
The symptom of curve fitting is a near-perfect historical equity curve that collapses when a parameter moves slightly or new data arrives. The more inputs a strategy has, and the harder they were tuned, the more of its backtest profit is noise that will not repeat.
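The "best of thousands" effect is easy to reproduce. The sketch below is a toy illustration, not a model of any real optimizer: it assumes 4,096 parameter sets that each produce 100 trades of pure noise (mean 0 pips, standard deviation 10 pips, both arbitrary choices), then compares the best run with the median run.

```python
import random
import statistics


def best_of_random_strategies(n_strategies=4096, n_trades=100, seed=1):
    """Simulate zero-edge 'parameter sets' and return (best, median) total pips."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_strategies):
        # Every trade is noise: expected value 0 pips, standard deviation 10 pips.
        total = sum(rng.gauss(0.0, 10.0) for _ in range(n_trades))
        totals.append(total)
    return max(totals), statistics.median(totals)


best, median = best_of_random_strategies()
print(f"best of 4096 zero-edge runs: {best:+.0f} pips")
print(f"median zero-edge run:        {median:+.0f} pips")
```

None of these simulated strategies has any edge, yet the top run shows a sizeable profit while the median sits near zero. An optimizer report shows only the top run.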
| Signal | Overfit | Robust |
|---|---|---|
| Parameters | Many tuned inputs, each pushed to its historically best value | Few inputs; results barely change when each is nudged ±20% |
| Equity curve | Almost perfectly straight on the optimization sample | Realistic drawdowns, in-sample and out-of-sample |
| Sample | Short window, one market regime | Several years spanning trends, ranges and shocks |
| Trade count | Few trades, so a handful of lucky wins drive the stats | Hundreds of trades across sessions and conditions |
| Unseen data | Falls apart on dates the optimizer never touched | Performance degrades modestly but survives |
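The "nudged ±20%" check from the table can be automated. This is a minimal sketch under stated assumptions: `sensitivity` is a hypothetical helper, and `toy_backtest` is a made-up stand-in for a real tester run, built so that one input sits on a smooth plateau and the other pays only at one exact value.

```python
def sensitivity(backtest, params, nudge=0.20):
    """Re-run `backtest` with each parameter moved down and up by `nudge`.

    Returns {name: (low_result, base_result, high_result)}.
    """
    base = backtest(**params)
    report = {}
    for name, value in params.items():
        low = backtest(**{**params, name: value * (1 - nudge)})
        high = backtest(**{**params, name: value * (1 + nudge)})
        report[name] = (low, base, high)
    return report


def toy_backtest(stop_pips, ma_period):
    """Hypothetical stand-in for a tester run; returns profit in dollars."""
    plateau = 100 - 0.05 * (stop_pips - 30) ** 2  # smooth: nearby stops work too
    spike = 400 if ma_period == 50 else 0          # brittle: one exact value pays
    return plateau + spike


report = sensitivity(toy_backtest, {"stop_pips": 30, "ma_period": 50})
for name, (low, base, high) in report.items():
    print(f"{name}: base {base:.0f}, worst nudged result {min(low, high):.0f}")
```

In this toy case the stop distance barely matters, while moving the moving-average period removes most of the profit. The second pattern is the one the "Overfit" column describes.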
Tick data, interpolation and gaps
A tester needs every tick, but stored history rarely has them all. MT4’s “Every tick” mode interpolates ticks inside one-minute bars: the platform invents the path price took between a bar’s open, high, low and close. A strategy that lives intrabar (tight stops, scalping targets, trailing exits) can be filled on price paths that never existed. The familiar 90% “modelling quality” figure describes how ticks were generated, not how accurate they are.
The history itself can also be wrong: missing days quietly bridged, spikes from a bad feed, symbols whose digits or contract specs changed years ago. MT5’s “every tick based on real ticks” mode is materially better, but it is only as good as the tick archive the broker provides — which often thins out a few years back, exactly where long backtests need it.
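To see why the invented path matters, consider one bar and two paths that are both consistent with its open, high, low and close. The sketch below is not MT4's interpolation algorithm; it only assumes price moves monotonically between the listed points, and the bar values, stop and target are made up for illustration.

```python
def first_exit(path, stop, target):
    """Walk a price path for a long position and return the first exit reached."""
    for price in path:
        if price <= stop:
            return "stop"
        if price >= target:
            return "target"
    return "open"  # neither level was touched


# One hypothetical one-minute bar.
bar = {"open": 1.1000, "high": 1.1010, "low": 1.0990, "close": 1.1005}

# Two orderings of the same four prices: high first, or low first.
path_high_first = [bar["open"], bar["high"], bar["low"], bar["close"]]
path_low_first = [bar["open"], bar["low"], bar["high"], bar["close"]]

stop, target = 1.0992, 1.1008
print(first_exit(path_high_first, stop, target))
print(first_exit(path_low_first, stop, target))
```

The same bar yields a winning trade on one path and a losing trade on the other. Bar data alone cannot say which one happened, so the tester has to guess.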
Costs the tester gets wrong
Most default backtests assume a fixed spread, often set near the lowest value ever seen. Real spreads breathe: a EUR/USD spread that averages 1.4 pips might sit at 0.8 during the London session and jump past 6 pips for a few seconds around a news release or at the daily rollover. A fixed 1.0-pip assumption books the calm hours and skips the expensive moments. Before trusting any result, check its cost assumptions line by line:
- Spread: variable or recorded spread, not the lowest fixed value; compare against what your own account actually averages per symbol.
- Slippage: testers fill at the exact requested price by default; real fills drift, so an allowance per side (tenths of a pip on majors, more on exotics) belongs in the model.
- Requotes and rejections: invisible to the tester, real in fast markets, especially on instant-execution accounts.
- Commission: per-lot round-turn charges if the account type has them; on EUR/USD, $7 per standard lot is roughly 0.7 pips.
- Swap: overnight financing on anything held past rollover, including the tripled midweek charge.
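Two of these checks reduce to simple arithmetic. The sketch below uses the spread figures quoted above (0.8 pips in calm hours, 6 pips around news, a fixed 1.0-pip test spread); the 90/10 split of trades between calm and expensive moments is a hypothetical assumption, as is the $10 pip value per standard lot used for the commission conversion.

```python
def average_spread_paid(fills):
    """Trade-weighted average spread in pips.

    fills: iterable of (spread_pips, trade_count) pairs.
    """
    fills = list(fills)
    total_trades = sum(count for _, count in fills)
    return sum(spread * count for spread, count in fills) / total_trades


def commission_in_pips(commission_per_lot, pip_value_per_lot):
    """Convert a round-turn commission per lot into its pip equivalent."""
    return commission_per_lot / pip_value_per_lot


# Hypothetical mix: 90 trades in calm conditions, 10 around news or rollover.
live_spread = average_spread_paid([(0.8, 90), (6.0, 10)])
fixed_spread = 1.0

print(f"average live spread: {live_spread:.2f} pips")
print(f"understated by:      {live_spread - fixed_spread:.2f} pips per trade")
print(f"commission:          {commission_in_pips(7.0, 10.0):.1f} pips")
```

Under these assumptions a handful of trades in wide-spread moments is enough to push the average paid spread well above the fixed test value, even though most fills were cheaper than it.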
How 0.9 pips of missing cost rewrites a result
- Backtest: 400 EUR/USD trades at 0.10 lots, average profit 1.5 pips per trade → +$600 (pip ≈ $1).
- Live spread averages 0.4 pips wider than the fixed test spread → −0.4 pips per trade.
- Slippage of ~0.15 pips per side on entry and exit → −0.3 pips per trade.
- Commission and occasional swap ≈ −0.2 pips per trade equivalent.
- Realistic edge: 1.5 − 0.9 = 0.6 pips per trade → +$240.
- 60% of the simulated profit was cost modelling, not strategy.
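The same arithmetic as a small script, so the inputs can be swapped for your own figures. The function name and the $1 pip value at 0.10 lots are illustrative assumptions taken from the example above.

```python
def net_edge(gross_pips, spread_gap, slippage_per_side, other_costs):
    """Per-trade edge in pips after costs the backtest left out."""
    # Slippage is paid twice: once on entry and once on exit.
    return gross_pips - spread_gap - 2 * slippage_per_side - other_costs


trades = 400
pip_value = 1.0   # USD per pip at 0.10 lots on EUR/USD (approximate)
gross_pips = 1.5  # average profit per trade in the backtest

edge = net_edge(gross_pips, spread_gap=0.4, slippage_per_side=0.15, other_costs=0.2)
missing_cost = gross_pips - edge

print(f"simulated profit: ${trades * gross_pips * pip_value:.0f}")
print(f"realistic profit: ${trades * edge * pip_value:.0f}")
print(f"share of simulated profit lost to costs: {missing_cost / gross_pips:.0%}")
```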
Conditions the tester cannot see
A historical price series carries no record of what executing in it felt like. Around a high-impact release the chart shows one long candle; it does not show the seconds when the spread was 8 pips, stop orders filled far through their levels, and new orders were requoted or rejected. The tester fills everything cleanly at chart prices with whatever spread you configured.
Broker differences compound this. Every broker’s feed has its own highs and lows — a stop that survives on one feed is hit on another — plus its own minimum stop distances, execution mode and server timezone, which shifts where daily candles open and close. The same EA over the same dates on two brokers’ data produces two different reports, and a live account adds a third.
Selection bias: keeping the survivors
A subtler distortion happens before any single report is read. Run one EA across 20 symbols and keep the three profitable ones, or sketch ten strategy ideas and develop the one with the best quick test — either way, the survivors were selected after the fact. With that many tries, a few standouts are expected from chance alone, so reporting only the winners makes luck look like edge. This is the retail cousin of survivorship bias, and no tester statistic flags it.
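A rough way to size this effect is a binomial calculation. The sketch assumes the 20 symbols are independent and that each has the same chance `p` of producing a good-looking backtest with no real edge behind it; both are simplifications, and the 10% figure is a hypothetical input, not a measured one.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability of at least k successes in n independent trials with chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_symbols = 20

# If a zero-edge EA is simply profitable or not on each symbol (a coin flip):
print(f"P(3+ profitable symbols by luck): {prob_at_least(3, n_symbols, 0.5):.4f}")

# If each symbol has a 10% chance of a convincingly strong backtest by luck:
print(f"P(3+ strong-looking symbols):     {prob_at_least(3, n_symbols, 0.10):.2f}")
print(f"P(1+ strong-looking symbol):      {prob_at_least(1, n_symbols, 0.10):.2f}")
```

With the coin-flip assumption, three profitable symbols out of 20 are close to certain. Even with the stricter 10% assumption, three standouts turn up by luck roughly a third of the time.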
Out-of-sample and walk-forward testing
The standard defense is to deny the optimizer part of the data. Optimize on 2019–2023, then run the chosen settings once, unchanged, on 2024–2025. In-sample profit only proves the optimizer did its job; the out-of-sample result is the first real evidence about the strategy.
Walk-forward testing repeats this on a rolling basis: optimize on a window, validate on the next slice, step forward, repeat. Settings that hold up across slices they were never tuned on are worth more than any single spotless run. Practical tester settings (real-tick data, variable spread, slippage allowances) are covered in the guide to improving MetaTrader backtests, and the free Monte Carlo Trading Simulator stresses a backtest’s trade sequence by reshuffling it thousands of times.
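The rolling split can be sketched as a small generator. This is one common layout (fixed-size training window, stepping forward by the length of the validation slice), not the only one; the function name and the window sizes are illustrative assumptions.

```python
def walk_forward_windows(n_bars, train_size, test_size):
    """Yield (train, test) index ranges, stepping forward by test_size each time."""
    start = 0
    while start + train_size + test_size <= n_bars:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


# Seven yearly blocks (2019 to 2025): optimize on three, validate on the next one.
years = list(range(2019, 2026))
windows = list(walk_forward_windows(len(years), train_size=3, test_size=1))
for train, test in windows:
    train_years = [years[i] for i in train]
    test_years = [years[i] for i in test]
    print(f"optimize on {train_years[0]}-{train_years[-1]}, validate on {test_years[0]}")
```

Each validation slice is only ever used with settings chosen before it, so the stitched-together validation results approximate how the strategy would have behaved if re-optimized on a schedule.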
Evidence, not proof
A good backtest is necessary, not sufficient. It cheaply rules out broken ideas, maps which parameter regions matter, and sets rough expectations for drawdown and trade frequency. What it cannot do is certify the future — it can only describe one version of the past, usually the friendliest one.
Frequently asked
What should I check first when a backtest looks too good?
The usual suspects, in order: parameters curve-fitted to historical noise, costs the test ignored (variable spread, slippage, commission, swap), and tick data that doesn't match what your broker actually served. Each gap flatters the simulation; together they can erase a small edge entirely.
Does 90% modelling quality mean the backtest is 90% accurate?
No. In MT4 the modelling-quality figure describes how ticks were generated — mostly interpolated from one-minute bars — not how closely they match reality. A 90% test can still fill orders on price paths that never happened. Real-tick testing in MT5 is closer, but still depends on the broker's tick history.
How much spread and slippage should I assume in a backtest?
There is no universal number, so measure your own account. Recent trade history shows the spreads and slippage you actually received per symbol and session. As a starting scenario, majors are often tested with a variable or average spread plus a few tenths of a pip of slippage per side, with wider assumptions around news.
What is walk-forward testing?
A rolling form of out-of-sample validation: optimize parameters on one window of history, run them unchanged on the next slice, then step both windows forward and repeat. A strategy that keeps performing on data its parameters never saw is far more credible than one perfect optimization run.
Related guides
How the MetaTrader Strategy Tester Works
What a backtest actually simulates — data sources, tick modelling modes, spread and fill assumptions, optimization.
How to Make MetaTrader Backtests More Realistic
Real ticks, realistic spread and costs, deliberate slippage and out-of-sample discipline — closing the gap between tested and live results.
Monte Carlo Analysis for Trading Systems
Re-running the same trades in random order to see the range of equity paths and drawdowns one set of statistics can produce.
Sources & further reading
- MetaTrader 5 Help — Strategy Tester report — official documentation of the tester report and the statistics it shows.
- MQL5 Documentation — Statistics calculated in the tester — the statistic identifiers and definitions behind backtest report values.
This article is for educational purposes only. It does not provide trading signals, investment advice, financial recommendations, broker recommendations or trade execution. Backtest results are historical simulations and do not predict future performance.