Garbage In, Losses Out

What defines high-quality historical market data, and why it matters

First published by TabbFORUM, 17 September 2025

By Dr Elliot Banks, Chief Product Officer, BMLL.

Capital markets run on information. From ticker tape to microwave networks, the faster and cleaner the data, the sharper the edge. Yet despite the exponential growth of numbers coursing through trading systems, the scarcity is not of data, but of high-quality data - usable, reliable and easily accessible.

For firms across the trading ecosystem, this distinction is not trivial. Substandard data imposes real costs. Quants and researchers spend hours, sometimes days, scrubbing datasets rather than searching for alpha. Execution desks misinterpret market signals. Inefficiencies multiply, trading decisions suffer, and profitability erodes. In an age drowning in numbers, quality remains the rarest commodity.

In an increasingly sophisticated trading environment, high-quality historical market data is the foundation upon which sound strategies are built. Whether you’re backtesting alpha strategies, optimising execution algorithms, or performing Transaction Cost Analysis (TCA), high-quality market data is critical for success. Investing in superior data is an investment in operational efficiency, analytical accuracy, and ultimately optimal trading performance.

In this article, we explore what makes historical market data good quality and usable, and the key questions to ask when sourcing it.

The hallmarks of quality

So what, precisely, makes market data “good”?

At a high level, it is easy to define some key measures of data quality (a short sketch of how such checks might be automated follows this list):

  • Accuracy: Data must reflect reality without errors, omissions, or distortions, as inaccuracies lead to flawed analyses, poor trading decisions and sometimes regulatory non-compliance.
  • Timeliness: Data must be available when needed, with sufficient time for analysis.
  • Completeness: Data sets should contain all necessary information, as gaps hinder comprehensive analysis.
  • Consistency: Data should be uniform across different sources, formats and timeframes; data that cannot be reconciled leads to conflicting interpretations and operational inefficiency.
  • Auditability: Data lineage and transformations should be traceable to ensure transparency and verify data integrity for compliance.
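
To make these measures concrete, here is a minimal sketch of how the accuracy, completeness and consistency checks might be automated over one day of tick data. It is written in Python against pandas; the column names (seq_no, ts, price, size) are assumptions for illustration, not any particular vendor’s schema.

```python
import pandas as pd

def run_quality_checks(ticks: pd.DataFrame) -> dict:
    """Basic quality checks on one day of tick data.

    Assumed columns (hypothetical, for illustration):
      seq_no - venue sequence number (int)
      ts     - event timestamp (datetime64)
      price  - trade or quote price
      size   - trade or quote size
    """
    report = {}

    # Accuracy: no zero or negative prices or sizes.
    report["non_positive_prices"] = int((ticks["price"] <= 0).sum())
    report["non_positive_sizes"] = int((ticks["size"] <= 0).sum())

    # Completeness: venue sequence numbers should step by exactly one;
    # any other step indicates dropped (or duplicated) packets at capture.
    steps = ticks["seq_no"].sort_values().diff().dropna()
    report["sequence_gaps"] = int((steps != 1).sum())

    # Consistency: event timestamps should be monotonic in feed order.
    report["out_of_order_timestamps"] = int(
        (ticks["ts"].diff().dropna() < pd.Timedelta(0)).sum()
    )

    return report
```

In practice such checks would run per venue and per day, and their output is exactly the kind of quality information a vendor should be able to produce on demand.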

These principles sound simple, but the scale of modern historical datasets makes them anything but. A single stock or future can generate millions of order book updates in a single day. Testing strategies, optimising execution or running transaction cost analysis all demand confidence that these mountains of data are both correct and complete.

Putting it into practice

For buyers of market data, understanding and evaluating data quality can be challenging. How do you define “accurate”, “complete” or “consistent”, and what do these definitions mean in practice for historical market data? And how do you evaluate vendors, many of whom will claim to have “high quality” data?

A simple but important question to ask yourself when evaluating data quality is: how easy is this dataset to use? For example, how quickly can you start testing and researching a new project that relies on this data? How easily can you extend your analysis to include more historical market data? How quickly can your strategy be replicated across other markets? And how many edge cases, changes and data issues will you have to work around for every single venue you onboard?

Firms sourcing historical data would be wise to interrogate their vendors. Can the supplier explain every gap or packet loss in their records? Do they know precisely where and how the data was captured? What quality checks are performed, and how quickly are problems resolved? Do they preserve raw data, or merely repackage normalised feeds? And what is their model for normalisation - can it be explained without recourse to black-box mystique?

All too often, the answers are evasive. Many vendors cannot provide clear explanations, because they themselves are uncertain of what exactly they have captured. That, in itself, is revealing: if a supplier cannot account for its own data, it is unlikely to be dependable.

Sourcing historical market data: ‘DIY’ vs buy

Until recently, firms sourcing historical market data have had two options:

  1. Do it yourself - capture, store and manage a real-time source of market data, either from the exchange or from a consolidated feed. This carries a significant engineering cost and can be a major time-drain, especially as market data volumes continue to grow, doubling in size every few years. (A sketch of what even a bare-bones capture loop involves follows this list.)
  2. Buy a historical data product from a vendor that captures its own real-time feed. The historical data is often an “exhaust” product of the real-time offering - a recording of the feed, often stored without ever inspecting what was captured. This leads to data of unknown quality, where the vendor has limited knowledge of the content it holds.
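
To give a sense of the engineering behind option 1, here is a deliberately minimal capture loop: it joins a multicast feed, stamps each packet on receipt, logs sequence gaps and persists the raw bytes. The multicast group, port and header layout below are invented for illustration; every venue differs.

```python
import socket
import struct
import time

# Hypothetical multicast group, port and header layout - every venue differs.
GROUP, PORT = "233.0.0.1", 30001
SEQ_HEADER = struct.Struct("!Q")  # assume an 8-byte big-endian sequence number

def capture(outfile: str) -> None:
    """Join the feed, stamp each packet on receipt, log sequence gaps
    and persist the raw bytes. A production system would add A/B feed
    arbitration, hardware timestamps, failover, rotation and gap recovery."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

    expected = None
    with open(outfile, "ab") as out:
        while True:  # runs until interrupted
            packet = sock.recv(65535)
            recv_ns = time.time_ns()  # local receive timestamp
            (seq,) = SEQ_HEADER.unpack_from(packet)
            if expected is not None and seq != expected:
                print(f"gap: expected {expected}, got {seq}")
            expected = seq + 1
            # Length-prefixed raw packet plus capture timestamp, so the
            # original bytes remain available for later (re)normalisation.
            out.write(struct.pack("!QI", recv_ns, len(packet)) + packet)
```

Everything the sketch omits - redundant feeds, recovery, storage management, monitoring across dozens of venues - is precisely where the ongoing cost lies.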

Different users, teams or trading strategies within a single organisation may have different views of the granularity, frequency or definition of quality for market data. For example, engineering teams may favour data products that look like the real-time output, whereas researchers want a dataset that’s easy to work with in order to find alpha quickly. Understanding which choice to make, and assessing the quality of data being supplied, is therefore critical.

A third way - buying to build

Now there is a third option. By taking data from best-in-breed, historically focused vendors, you can leverage high-quality normalised data that preserves the raw exchange fields.
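
As an illustration of what “normalised data that preserves raw exchange fields” might look like, here is a minimal sketch: two hypothetical venue message formats are mapped onto a common record while the original payload is kept verbatim. All field names and layouts are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class NormalisedTrade:
    """Common fields for cross-venue analysis, plus the untouched
    venue payload for lineage and auditability."""
    venue: str
    symbol: str
    ts_ns: int    # event timestamp, nanoseconds since epoch
    price: float
    size: int
    raw: dict     # original venue fields, preserved verbatim

def from_venue_a(msg: dict) -> NormalisedTrade:
    # Hypothetical venue A: millisecond timestamps, prices in ticks.
    return NormalisedTrade(
        venue="A",
        symbol=msg["sym"],
        ts_ns=msg["ts_ms"] * 1_000_000,
        price=msg["px_ticks"] * msg["tick_size"],
        size=msg["qty"],
        raw=msg,
    )

def from_venue_b(msg: dict) -> NormalisedTrade:
    # Hypothetical venue B: nanosecond timestamps, decimal prices.
    return NormalisedTrade(
        venue="B",
        symbol=msg["instrument"],
        ts_ns=msg["exch_time_ns"],
        price=float(msg["price"]),
        size=int(msg["volume"]),
        raw=msg,
    )
```

Keeping the raw payload alongside the normalised fields is what makes the lineage auditable: any normalised value can be traced back to the venue’s original message.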

At BMLL, we build all of our datasets directly from raw data, handling all the engineering effort to process, manage and curate the data into a usable dataset. We undertake the tasks of understanding and handling data normalisation, and provide all the quality information necessary to completely understand every single day of data in our system. BMLL also “eats its own dog food”: we use our own market data both to create analytics and to conduct market structure research and analysis.

This is why so many of the largest and most sophisticated quantitative trading firms trust the BMLL normalisation process - we do the type of engineering that they would otherwise do themselves. As a result, firms can run global historical TCA in minutes, get started on analysing new markets in hours, and generate alpha signals in days, saving considerable time, effort and resources.

Why it matters

When it comes to historical market data, quality is everything. As markets grow faster and more complex, the demand for trustworthy historical data will only intensify. Strategies depend on it, regulators insist on it, and firms compete on the edge it provides.

Anyone sourcing historical market data needs to ask themselves and their vendors the right questions: Is this data going to be easy to use, both for engineers and analysts? Can you quickly use the data across other markets and regions? And does your vendor really understand their historical data? Get the right answers to these questions and you will be investing in genuinely good data, and well on your way to a trading advantage.