As first published in Tabb Forum, 12 March 2020.

Most firms in financial markets are not getting the full benefit from their predictive analytics because they are not using the most granular pricing data available from electronic trading venues.

To date, genuine barriers to adopting and using such data have existed, but this is no longer the case. Relatively easy solutions now exist that allow firms to extract maximum value from data and thus gain benefits for themselves, their clients and their shareholders. These benefits include improved understanding of the composition of limit order books (LOBs) and of the addressable liquidity across multiple venues, as well as improved alpha generation and trading efficiency (eg broker and strategy selection). The transformation will provide competitive advantage through improved returns over firms slower to exploit the opportunity, while also improving risk management and regulatory compliance.

The data challenge today

There are four levels of public pricing data made available by electronic trading venues:

Trades data: aggregated executions

Level 1 (L1) data: the inside (best) price levels of the LOB, namely bid and ask

Level 2 (L2) data: the depth (up to 10 price levels) of the LOB, with the size aggregated at each level

Level 3 (L3) data: all price levels of the LOB, with individual orders visible

A further level of data (L4) can be generated by combining L3 data with private data.
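To make the distinction concrete, the sketch below shows one way the same instant in a book might be represented at each level. The field names and values are purely illustrative and do not correspond to any venue's actual message schema.

```python
# Illustrative only: one possible representation of the same instant in a
# limit order book at increasing levels of granularity. Field names and
# values are hypothetical, not any venue's actual message format.

l1_snapshot = {
    "bid": {"price": 3092.25, "size": 180},   # best bid only
    "ask": {"price": 3092.50, "size": 140},   # best ask only
}

l2_snapshot = {
    # Depth of book, with size aggregated per price level.
    "bids": [(3092.25, 180), (3092.00, 310), (3091.75, 255)],
    "asks": [(3092.50, 140), (3092.75, 290), (3093.00, 400)],
}

l3_events = [
    # Individual orders carry unique identifiers, so each order's full
    # lifecycle (add, modify, cancel, execute) can be reconstructed.
    {"type": "add",     "order_id": "A17", "side": "bid", "price": 3092.25, "size": 100},
    {"type": "add",     "order_id": "B42", "side": "bid", "price": 3092.25, "size": 80},
    {"type": "modify",  "order_id": "A17", "size": 60},
    {"type": "execute", "order_id": "B42", "size": 80},
]
```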

L3 data are the most granular public data available from the market data feeds of electronic trading venues. A simple use of the data is to track the full history of each order (including modifications) using its unique order identifier.
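As a minimal sketch of that idea, assuming a harmonised event stream shaped like the illustrative records above, the lifecycle of every order can be rebuilt simply by grouping events on that identifier:

```python
from collections import defaultdict

def order_histories(events):
    """Group harmonised L3 events by their unique order identifier,
    yielding the full lifecycle (add, modify, cancel, execute) per order."""
    histories = defaultdict(list)
    for event in events:
        histories[event["order_id"]].append(event)
    return dict(histories)

# With the illustrative events above:
# order_histories(l3_events)["A17"]
# -> the add and subsequent modify for order A17, in sequence
```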

Finding alpha by analysing the full depth of the data

Far deeper insights, including the identification of patterns and behaviours, come from applying machine learning algorithms to the data. These approaches can reveal the future intentions of traders and the real addressable liquidity across venues. They can also predict behaviour at the level of individual orders, for instance order fill probability and order queue dynamics, as well as the periodicity of alpha during the day. Such information can be vital in optimising trading decisions. Meanwhile, using L4 data allows models to understand and predict the behaviour of individual participants (eg you or your clients). For regulatory compliance, L3 and L4 data can expose prohibited trading practices to a degree that cannot be achieved with less granular data.
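One common pattern, sketched below under simplifying assumptions, is to derive per-order features from the L3 history (eg queue position at entry and the visible size ahead of the order) and fit a standard classifier to the observed outcome. The features, the synthetic data and the model choice are illustrative only, not a description of any particular production system.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Illustrative, synthetic per-order features of the kind that can be
# derived from L3 data: queue position at entry, visible size ahead in
# the queue, and the spread (in ticks) when the order was placed.
n = 5_000
queue_position = rng.integers(1, 50, size=n)
size_ahead = rng.integers(0, 5_000, size=n)
spread_ticks = rng.integers(1, 5, size=n)
X = np.column_stack([queue_position, size_ahead, spread_ticks])

# Synthetic label: whether the order was eventually filled. In practice this
# label comes from replaying each order's lifecycle in the harmonised L3 history.
logit = 1.5 - 0.05 * queue_position - 0.0004 * size_ahead - 0.2 * spread_ticks
filled = rng.random(n) < 1 / (1 + np.exp(-logit))

model = LogisticRegression(max_iter=1000).fit(X, filled)

# Estimated fill probability for a new order: queue position 5,
# 400 lots ahead in the queue, 1-tick spread.
print(model.predict_proba([[5, 400, 1]])[0, 1])
```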

Data integration remains a challenge

Many firms have felt constrained from using L3 data by the challenge of integrating them within legacy systems. Because L3 content can vary between venues, it needs to be harmonised (made usable across venues and providers) before being used for modelling, prediction and order management. Using L3 data also requires predictive models and algorithms to be rewritten or replaced to accommodate the increased granularity of information, and legacy systems themselves may need changes to work with more granular data. The systems integration work can be non-trivial, as many databases and applications may be involved across multiple departments and teams, and such integrations may not make sense strategically in an organisation committed to rationalising or decommissioning legacy systems. Finally, the size of the data set increases by orders of magnitude as you move to the most granular data: the E-mini S&P 500 future generates around 112 million data points per annum at the trades level, but 24 billion at L3.
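The harmonisation step itself can be pictured as a thin mapping layer that translates each venue's native fields onto a common schema; the venue formats below are invented purely for illustration.

```python
# Hypothetical example of harmonising L3 messages from two venues whose
# native field names differ, onto a single common schema. Both venue
# formats here are invented for illustration.

COMMON_FIELDS = ("venue", "order_id", "type", "side", "price", "size")

def harmonise_venue_a(msg):
    return {"venue": "A", "order_id": msg["oid"], "type": msg["action"],
            "side": msg["side"], "price": msg["px"], "size": msg["qty"]}

def harmonise_venue_b(msg):
    return {"venue": "B", "order_id": msg["orderRef"], "type": msg["msgType"],
            "side": "bid" if msg["isBuy"] else "ask",
            "price": msg["price"], "size": msg["quantity"]}

# Once every feed is mapped onto COMMON_FIELDS, models and order-tracking
# code can be written once and reused across venues and providers.
```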

Cloud and Big Data technologies are the future

One solution to these problems is to take pricing data from Data-as-a-Service (DaaS) providers who use Cloud and Big Data technologies: data that have been harmonised and aggregated from multiple venues by specialist market data and analytics firms. These firms store and manage all levels of public pricing data, as well as curating and harmonising them. Typically, they also provide Big Data tools, open-source machine learning analytics and trading algorithms, and their own dashboards to unlock the predictive power of the data. They offer a range of data access solutions and levels of integration, from zero integration at one end of the spectrum (ie Cloud access to the platform, so that you can use the data and tools to do your own micro-structure research and build your own solutions) to full integration at the other (eg integration with third-party dashboards and trading screens). Between those two extremes sit options such as derived data supplied via an FTP server and API, or the vendor's own dashboards with the ability to combine L3 data with your own data to produce, visualise and model L4 data.

Cloud services also give you subscription-based usage and the ability to scale on demand as standard. This is hugely transformative. Ultimately, the enhanced information and toolsets allow alphas to be generated, market behaviours to be predicted and orders to be scheduled (optimised for venue and timing). Improvements in returns for clients and shareholders will follow.
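As a hypothetical illustration of a middle point on that spectrum, a day of harmonised L3 data delivered by a provider could be analysed locally with standard open-source tools; the file and column names below are placeholders rather than any provider's actual interface.

```python
import pandas as pd

# Hypothetical example: a day of harmonised L3 data, delivered by a DaaS
# provider (eg via FTP or an API) as a columnar file, analysed locally.
# The file name and column names are placeholders, not any provider's
# actual interface.
events = pd.read_parquet("esz0_l3_2020-03-12.parquet")

# A first sanity check: count order additions per venue in the harmonised feed.
print(events[events["type"] == "add"].groupby("venue").size())

# From here the same events can feed the order-tracking and fill-probability
# sketches above, or your own micro-structure research.
```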

The race for data-driven dominance is on

We are at the start of a scientific revolution in trading, asset management and risk management. The race for speed-based supremacy has all but been run; the race for data-driven dominance will surely follow. Firms will increasingly need to extract the maximum value from data to survive and thrive, so using anything less than the most granular public pricing data available will not make sense.

At BMLL, we are delighted to be part of this evolving industry landscape.