Mind the Gap: Should you buy historical data from a real-time data vendor?
Ben Collins, Head of Sales (EMEA & APAC), BMLL
Ben Collins, Head of Sales (EMEA & APAC) at BMLL, has over 20 years' sales management experience working with major enterprises to support their market data, analytics and execution needs. Prior to BMLL, he ran ACTIV Financial's EMEA sales and business development activities and was Global Sales & Marketing Director of Morningstar's Real Time Data Business; he has also worked at Tenfore Systems Limited, Thomson Financial, Bloomberg and Informa.
As trading firms become increasingly data-driven in their strategies, the question of where to source high-quality historical market data to feed those strategies is often overlooked. Yet it is precisely this historical data that matters more as industry needs change: demand is growing for premium historical market data to build better analytics capabilities, to understand and predict market behaviour, and to incorporate more advanced technologies into daily operations. Specifically, trading firms are looking for superior data sources that let them go beyond the top of book (Level 1 trading data) and assess the order book below the surface (Level 2 trading data).
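To make the distinction concrete, here is a minimal sketch of the two views, using illustrative field names and toy numbers rather than any vendor's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Level1Quote:
    """Top of book: only the best bid/ask and the last trade."""
    symbol: str
    bid_price: float
    bid_size: int
    ask_price: float
    ask_size: int
    last_trade_price: float

@dataclass
class Level2Book:
    """Depth of book: the full ladder of price levels below the surface."""
    symbol: str
    bids: list[tuple[float, int]] = field(default_factory=list)  # (price, size)
    asks: list[tuple[float, int]] = field(default_factory=list)  # (price, size)

book = Level2Book(
    symbol="VOD.L",
    bids=[(72.50, 1200), (72.48, 5400), (72.46, 3100)],
    asks=[(72.54, 900), (72.56, 2700), (72.58, 4800)],
)

# Level 1 is just the top slice of the Level 2 book.
top = Level1Quote("VOD.L", *book.bids[0], *book.asks[0], last_trade_price=72.52)
```

The point is simply that Level 1 is the top slice of a much richer structure; most of the book, and most of the liquidity story, sits in the levels below.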
For years, firms have struggled to work with historical market data, whether because of the difficulty of managing multiple data feed formats from different exchanges, a lack of data-engineering resources, or simply incomplete and inconsistent datasets. To keep up with their AI, analytics and other data-driven ambitions, firms must now reassess the quality of the information their historical data providers supply. In the past, my sales pitches centred on the cost of compute and storage. The advent of global cloud providers, the changing profile of talent hired by financial institutions and the development of data science analytics libraries have changed all that. Now, data quality is the primary reason clients are picking up the phone.
A history of historical data
Sourcing historical order-book-level data is challenging. Some exchanges don't even possess historical data from their own feeds! Until recently, real-time providers of exchange data would capture the raw feeds to support their real-time products and then throw the data away after a few days. This cast-off byproduct was often the only way firms could secure any historical data at all, however short-lived, unless they captured it themselves.
As financial institutions became much more data-driven, real-time vendors began to realise the value of historical data and started to capture and store these tick datasets. These providers, however, are real-time data specialists, and their historical data services are, by default, secondary products. So when it came to buying historical market data, the main question for buyers was whether the data was available at all, not whether it was complete and well documented.
Real-time data providers are, rightly, primarily concerned with delivering feeds in real time, for real-time use cases. As I know from my previous experience working for real-time specialists, the key differentiators in real-time data delivery are latency, bandwidth and reliability. To reduce latency and bandwidth, real-time data providers do not process all the information they receive from an exchange - fields such as reference data, various timestamps and MMT flags are dropped. A real-time approach to data processing and engineering inevitably clashes with a historical data approach: the historical record ends up incomplete, and much of its value is lost.
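As a simplified illustration of that trade-off, and emphatically not any specific vendor's feed handler, a latency-optimised normaliser might keep only the fields needed to update a screen and discard the rest; all field names below are hypothetical:

```python
# Illustrative only: hypothetical message layout, not any specific vendor's feed.
RAW_EXCHANGE_MESSAGE = {
    "symbol": "VOD.L",
    "price": 72.52,
    "size": 500,
    "exchange_timestamp_ns": 1_700_000_000_123_456_789,  # venue matching-engine time
    "receipt_timestamp_ns": 1_700_000_000_123_600_000,   # capture time at the gateway
    "mmt_flags": "12P1----P---",                         # trade-condition flags
    "order_id": 884_213_377,                             # needed to rebuild the book
}

# A latency-optimised handler keeps the minimum a ticker display needs.
REALTIME_FIELDS = {"symbol", "price", "size"}

def normalise_for_realtime(msg: dict) -> dict:
    """Strip a raw exchange message down to its real-time essentials."""
    return {k: v for k, v in msg.items() if k in REALTIME_FIELDS}

slim = normalise_for_realtime(RAW_EXCHANGE_MESSAGE)
# 'slim' has no timestamps, no MMT flags and no order ID: exactly the fields
# historical research depends on are gone before anything is archived.
```

If the slimmed-down record is all that gets archived, the dropped fields can never be recovered.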
“Respect the data!”
In today's marketplace, firms wanting to extract more value from historical datasets face a dilemma. Real-time data vendors, typically the main providers of standard historical exchange data, often commercially bundle the historical data element into a contract for a real-time feed. The problem with this approach is that real-time contractual SLAs are managed quite differently from historical data SLAs - meaning providers can get away with not supplying an exchange's complete set of available historical data fields. Commercial bundling is one thing, but product bundling is quite another, and it is no longer fit for purpose.
And if firms attempt to manage a historical database in-house, they soon realise that a non-trivial amount of data-engineering resource is required to maintain ever-growing databases that demand constant engineering upgrades and ever-increasing storage and processing capacity. For the majority of financial institutions, outsourcing this function to a firm like BMLL should be an easy decision.
BMLL’s harmonised historical data
The advancement of cloud technology, however, means that historical data can now be processed at scale and made available in flexible ways. BMLL, a data vendor specialising solely in historical data, now provides a truly complete set of harmonised historical data on a T+1 service. Data from 75+ exchanges is curated into a BMLL version with complete documentation, consistency across critical fields and full transparency of the curation process.
BMLL does not face the constraints of real-time data providers. Each dataset is based on the most granular Level 3 data available, which captures every single order and trade ID, allowing the computation of metrics such as order fill probability, resting time and queue dynamics. BMLL has used this granularity to build unique products, metrics and analytics, so users can easily and reliably find and aggregate individual fields to better understand liquidity dynamics and market behaviour. Users can map their own trading datasets to the historical data, allocate trades to particular classification types, and ultimately accelerate time to insight and decision-making.
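As a rough sketch of the kind of computation this granularity unlocks, consider resting time and fill probability derived from a toy Level 3 event log; the column names and event model here are hypothetical, not BMLL's actual schema or API:

```python
import pandas as pd

# Toy Level 3 event log: every add/fill/cancel carries a persistent order_id,
# so each order's lifecycle can be reconstructed end to end.
events = pd.DataFrame({
    "order_id": [101, 102, 101, 103, 102, 103],
    "event":    ["add", "add", "fill", "add", "cancel", "fill"],
    "ts_ns":    [1_000, 1_500, 9_000, 2_000, 4_000, 12_000],
})

adds  = events[events["event"] == "add"].set_index("order_id")["ts_ns"]
exits = events[events["event"].isin(["fill", "cancel"])].set_index("order_id")["ts_ns"]

# Resting time: how long each order sat on the book before filling or cancelling.
resting_time_ns = (exits - adds).rename("resting_time_ns")
print(resting_time_ns)

# Fill probability: the share of completed order lifecycles that end in a fill.
outcomes = events[events["event"].isin(["fill", "cancel"])]
print(f"fill probability: {(outcomes['event'] == 'fill').mean():.2f}")
```

None of this is possible from a Level 1 or Level 2 archive alone, because without persistent order IDs the lifecycle of an individual order cannot be reconstructed.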
Because BMLL leverages the efficient storage capacity of the cloud, it can store and access three different versions of the underlying data: the raw data direct from the exchanges; a curated Level 3 version of the same data; and a harmonised version of data from a wide number of exchanges that allows cross-venue application. This achieves two things: forensic-level quality control, by referring back to the original raw data, and a normalised view that removes the burden of data engineering from BMLL's clients.
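As a loose illustration of what harmonisation buys, and again with entirely hypothetical field names rather than BMLL's actual mappings, the idea is that venue-specific quirks are translated once, centrally, into a single canonical layout:

```python
# Illustrative only: hypothetical per-venue field names, not BMLL's schema.
# Each venue's raw field names are mapped into one canonical cross-venue layout.
VENUE_FIELD_MAPS = {
    "LSE":   {"instr": "symbol", "px": "price", "qty": "size"},
    "XETRA": {"isin_code": "symbol", "last_px": "price", "volume": "size"},
}

def harmonise(venue: str, record: dict) -> dict:
    """Rename venue-specific fields into the canonical cross-venue schema."""
    mapping = VENUE_FIELD_MAPS[venue]
    return {canonical: record[raw] for raw, canonical in mapping.items()}

print(harmonise("LSE",   {"instr": "VOD.L", "px": 72.52, "qty": 500}))
print(harmonise("XETRA", {"isin_code": "DE0007100000", "last_px": 58.10, "volume": 300}))
```

Without that central translation, every client team ends up writing and maintaining the same per-exchange parsing logic themselves.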
No compromise: Level 3 engineering is the key to complete Level 1 and Level 2 historical datasets
In today's data-driven world, the industry's approach to historical data is changing. Firms increasingly realise that they no longer need to compromise on lower-quality historical data when a high-quality alternative is available.
BMLL's unrivalled focus on historical data ensures that accurate and timely information can be easily accessed, delivered and consumed. The integrity of the data is never lost, because the Level 1 and Level 2 datasets are derived through complete Level 3 engineering.
With BMLL, the ordeal of working with dropped data is over. Firms can now discover the power of problem-free historical data - and many already do! This shift is not only a technological advancement; it is a leap towards a future in which organisations realise the full potential of complete, ready-to-use historical datasets, giving them the confidence and speed to differentiate in a highly competitive marketplace.
Throughout my years in the industry, I have seen many issues with data quality, but now there is an alternative, and market participants no longer need to accept second best. It seems simple when you say it out loud - if you want high-quality historical data, go to a historical data specialist, not a real-time provider.