Coming back full circle: my unexpected path to re-joining a historical data provider
Gavin Carey, Head of Content, BMLL
Gavin Carey, Head of Content at BMLL, has over 25 years’ experience leading business development, content, technology, and product management teams in the real-time and historical market data space.
Data is everywhere … but not the right kind
The financial markets today present a fascinating paradox. We are awash in data - more venues, more quotes, more trades, more messages generated every nanosecond than ever before. Yet for a growing number of market participants, accessing truly high-quality, analysis-ready historical market data from which to draw insight, create signals, and make investment decisions remains a significant challenge.
Throughout the evolution of our industry, one constant has been the search for better ways to harness the insights hidden within the data, the rise of machine learning and AI being the most high-profile recent example. As quantitative strategies become ever more sophisticated and regulatory demands intensify, the need for a reliable, granular, and trustworthy data foundation has never been more critical.
After decades observing and participating in this space, I recently joined BMLL as Head of Content. This wasn't a decision taken lightly; it was driven by recognising a unique approach that, I believe, represents the next logical step in addressing the core challenges that persist in the historical data space - challenges rooted in how data is sourced, processed, and, ultimately, how it can be trusted.
My background in data
Early in my career, I was involved in the intricate work of designing and building the software that handled and processed real-time data feeds directly from exchanges and delivered a consolidated stream of that data to some of the first internet-era electronic trading platforms. This gave me an appreciation for the importance of, and complexities involved in, capturing every message completely and accurately, as well as for the sometimes very nuanced differences in meaning between seemingly similar data.
This learning stayed with me throughout my career, during which I have led global product lines responsible for delivering real-time and historical data to the world's most demanding institutions. That included product management of Thomson Reuters’ Elektron Real-Time product line and, more recently, several years leading and growing the Refinitiv Tick History business (effectively a recording of the Elektron Real-Time normalised output).
These experiences left me with the certainty that the quality, accessibility, and usability of market data are fundamental determinants of success in the financial services industry. They also gave me an intimate understanding of the traditional approaches to creating historical data archives and their inherent complexities, and set the stage for recognising the significance of a different path.
The demands placed on historical market data have shifted dramatically
There are three critical drivers of value needed to have real trust in your data and to move at the speed a rapidly changing environment demands:
Quality: As quantitative strategies become ever more complex and margins tighten, the importance of truly accurate, highest-quality data only grows.
Completeness: This isn't just about having more data; it's about accessing the granular insights necessary to understand how markets truly function. Cutting-edge signal-seeking analysis, advanced Transaction Cost Analysis (TCA), compliance, and market surveillance all increasingly rely on greater volumes of data, often including the full depth and structure of the order book (Level 3 data).
Consistency: The final critical factor when selecting an historical dataset is normalisation. It must be thoughtful, informed, consistent, lossless, and well documented. Historical datasets created from a recording of a real-time output can suffer from normalisation rules that have changed over time. Often the priority is to bring a new real-time venue to market, but the field-mapping rigour required to deliver a consistent historical product from that real-time data is overlooked. Significant and costly effort is then required to correct the resulting incorrect or missing data fields (a simplified sketch of the field-mapping problem follows this list).
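To make the consistency point concrete, here is a minimal, purely hypothetical sketch of the field-mapping problem: two venues reporting the same trade with different field names, units, and timestamp conventions, harmonised into one documented record. The venue names and fields are illustrative only, not any vendor's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class HarmonisedTrade:
    """One documented, venue-neutral trade record."""
    venue: str
    symbol: str
    price: float         # always in major currency units
    size: int
    timestamp: datetime  # always UTC

def normalise_venue_a(msg: dict) -> HarmonisedTrade:
    # Hypothetical Venue A: prices in major units, epoch-nanosecond timestamps.
    return HarmonisedTrade(
        venue="VENUE_A",
        symbol=msg["sym"],
        price=msg["px"],
        size=msg["qty"],
        timestamp=datetime.fromtimestamp(msg["ts_ns"] / 1e9, tz=timezone.utc),
    )

def normalise_venue_b(msg: dict) -> HarmonisedTrade:
    # Hypothetical Venue B: prices in minor units (e.g. pence),
    # ISO-8601 timestamp strings with an explicit offset.
    return HarmonisedTrade(
        venue="VENUE_B",
        symbol=msg["instrument_id"],
        price=msg["price_minor"] / 100.0,
        size=msg["volume"],
        timestamp=datetime.fromisoformat(msg["exec_time"]).astimezone(timezone.utc),
    )
```

If either venue later changes its conventions, the mapping for that venue (and the dates it applies to) has to be captured explicitly; otherwise the historical record silently becomes inconsistent.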
Suboptimal historical market data is no longer an option
The sheer volume and complexity of historical market data exacerbate long-standing industry pain points, and there are some common signs of an organisation struggling with less-than-ideal datasets:
The "80% Problem": It's a widely recognised phenomenon that highly skilled, expensive quantitative analysts and data scientists often spend up to 80% of their time not on analysis or strategy development, but on the laborious tasks of cleaning, normalising, and preparing historical data for use. This represents a massive drain on resources and a significant bottleneck to innovation, and a direct hit to the bottom line.
Normalisation Nightmares: Creating a consistent view of data across dozens or hundreds of trading venues, each with its own unique message formats, conventions, and idiosyncrasies, is a monumental task. Many vendor normalisation schemes suffer from inconsistencies across regions or time, are poorly documented, or appear designed more for the convenience of internal engineering teams than for end-users. This forces quants to spend valuable time "re-normalising" vendor data or writing complex, venue- and time-specific logic.
Data Quality Issues: Historical datasets, especially at the tick level, can be plagued by subtle but critical quality issues – gaps, spikes, packet losses, missing fields, timestamp inaccuracies. These issues undermine trust in the data and can lead to flawed analysis, failed strategies, lost alpha, and even regulatory fines (a sketch of the kind of defensive checks users end up writing follows this list). As reliance on granular data increases, the tolerance for such imperfections decreases dramatically.
Infrastructure Burden: Storing, managing, and processing the petabyte-scale datasets associated with global, full-depth historical data requires significant investment in infrastructure and specialised expertise, representing a substantial total cost of ownership (TCO).
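As an illustration of the checks end users are often forced to write themselves, here is a minimal, hypothetical sketch that scans a stream of ticks for sequence gaps and backwards timestamps. The field names (seq, ts_ns) are illustrative, not any particular feed's schema.

```python
def find_quality_issues(ticks):
    """Yield (index, description) for simple gap and ordering anomalies."""
    prev_seq = None
    prev_ts = None
    for i, tick in enumerate(ticks):
        if prev_seq is not None and tick["seq"] != prev_seq + 1:
            yield i, f"sequence gap: {prev_seq} -> {tick['seq']}"
        if prev_ts is not None and tick["ts_ns"] < prev_ts:
            yield i, f"timestamp went backwards at seq {tick['seq']}"
        prev_seq, prev_ts = tick["seq"], tick["ts_ns"]

# Example: a missing message (seq 3-4) and one out-of-order timestamp.
issues = list(find_quality_issues([
    {"seq": 1, "ts_ns": 100}, {"seq": 2, "ts_ns": 99}, {"seq": 5, "ts_ns": 101},
]))
# [(1, 'timestamp went backwards at seq 2'), (2, 'sequence gap: 2 -> 5')]
```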
Poor quality or inconsistent data doesn't just create friction; it fundamentally hinders the ability to perform accurate microstructure analysis, realistic backtesting, or meaningful TCA. This context underscores the critical need for solutions that address these issues at their root.
Here is why I think BMLL has the answer to these industry-wide problems
My decision to join BMLL in April 2025 was the result of a deliberate search for a company tackling the historical data challenges from first principles. I was looking for an organisation that wasn't just layering solutions onto existing paradigms but was fundamentally rethinking how granular historical data should be captured, curated, and delivered to meet the evolving needs of the market.
In BMLL, I recognised a distinct philosophy, encapsulated in its mantra, "Historical Data Done Properly". Its focus wasn't just on providing data, but on providing the highest-quality harmonised Level 3, Level 2, and Level 1 data and analytics available. What truly set BMLL apart, in my view, was its foundational approach.
Central to this is BMLL's starting point: raw packet capture (PCAP) data, captured directly by the exchange or in colocation facilities. This is significant because PCAP represents the most complete, unadulterated record of exchange activity. It's a "lossless" format, meaning nothing is discarded or potentially misinterpreted through intermediate processing or normalisation steps before the historical record is created.
Crucially, BMLL further differentiates itself from many other providers with its commitment to retain this raw PCAP data indefinitely. This might seem like a subtle technical detail, but its implications are profound. It means that if an exchange changes a data specification, if a subtle normalisation error is discovered years later, or if a superior processing methodology is developed, BMLL can go back to the original source material and re-process its entire historical dataset using the corrected logic. This ensures long-term consistency and adaptability, allowing the full depth of history to be corrected and improved over time.
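To make the reprocessing idea concrete, here is a purely illustrative sketch (not BMLL's actual pipeline) of what replaying retained capture files through corrected decoding logic can look like. It assumes the open-source dpkt library, UDP-framed market data over Ethernet/IP, and a caller-supplied decoder, all of which are stand-ins.

```python
import dpkt  # third-party pcap/packet parsing library

def replay_capture(path, decode_message):
    """Iterate raw packets from a capture file and yield decoded messages."""
    with open(path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            # Assumes Ethernet -> IP -> UDP framing; real feeds vary.
            payload = bytes(eth.data.data.data)
            yield ts, decode_message(payload)

# Rebuilding history with corrected logic is then just another pass over the
# unchanged source files, e.g.:
# for ts, msg in replay_capture("venue_capture.pcap", corrected_decoder):
#     ...
```

The key point is that because the raw capture never changes, a decoding or normalisation fix can be applied to the whole of history, not only to data recorded after the fix.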
This stands in contrast to traditional approaches where irreversible normalisation decisions made in the past might persist because the underlying raw data is no longer available in its original form.
This capability to maintain consistency across the entire historical record directly addresses the frustrations many users experience with evolving or inconsistent vendor normalisation schemas. It's this combination – starting with the purest source and preserving it indefinitely – that creates a powerful foundation for continuous quality improvement and historical integrity, representing a core differentiator in how BMLL approaches the data challenge.
BMLL delivers capacity for innovation to the capital markets
The potential that BMLL’s approach unlocks is truly transformative. When quantitative analysts, traders, and data scientists are freed from the shackles of data preparation and armed with reliable, granular insights and powerful analytical tools, their capacity for innovation increases exponentially. Better backtesting leads to more robust strategies. Deeper TCA leads to more efficient execution. Richer microstructure analysis leads to a more profound understanding of market dynamics.
The future of quantitative finance and data-driven trading relies upon trust in the data. I am delighted that BMLL has entrusted me with this critical feature of its unique product.
About the Author:
Gavin Carey joined BMLL as Head of Content in April 2025. He brings more than 25 years of experience leading business development, content, technology, and product management teams in the real-time and historical market data space. Prior to BMLL, he delivered a near doubling of Refinitiv’s Tick History revenues and oversaw the delivery of Thomson Reuters’ Elektron Real-Time content into the public cloud.