The forex markets differ from other asset classes in at least two respects where market data is concerned. Firstly, equity and futures markets are fragmented, but the cost and regulatory burden of launching new exchanges and dark pools means that the number of these venues is inherently more static than in the OTC forex markets. In contrast, sources of liquidity in the forex markets span global and regional banks, ECNs, trading technology platforms and buy-side firms, so sources of forex liquidity are more dynamic.
Secondly, forex quotes from a given bank differ between buy-side firms according to individual customer characteristics such as credit quality, assets, trading volume and trading style. This, together with the lack of an exchange model in the forex market, means that historical forex market data is typically limited to quotes and does not include actual trades, which limits the options for back-testing and simulation.
As such, procuring historical forex market data is much more difficult than for equities and futures, where it is straightforward to purchase historical market data directly from exchanges or from several reputable suppliers. Several forex ECNs and technology platforms make historical market data available but, as noted above, the quotes from banks will not be those that would have been offered directly to a specific buy-side firm.
A common, cross-asset problem with using purchased historical market data (even when it can be procured) is that of timestamp synchronization. This is a specific case of what might be termed “infrastructure idiosyncrasies”, in which market data originating from the same source is recorded slightly differently by different recipients owing to differences in distance, connectivity and the hardware and software used for recording. In the case of forex market data, the large number and variety of liquidity sources exacerbates this problem.
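To make the timestamp problem concrete, the following is a minimal sketch of how the offset between two recordings of the same quote stream might be estimated. It assumes, purely for illustration, that each quote carries an identifier that lets the two recordings be matched; the function name, identifiers and timings are hypothetical and do not come from any particular data vendor.

```python
from statistics import median


def estimate_clock_offset(recording_a, recording_b):
    """Estimate the timestamp offset (b minus a) in milliseconds.

    Each recording maps a quote identifier to the local receive time in ms.
    The median is used so that an occasional delayed quote does not
    distort the estimate.
    """
    common = recording_a.keys() & recording_b.keys()
    if not common:
        raise ValueError("the recordings share no quotes")
    return median(recording_b[q] - recording_a[q] for q in common)


# Hypothetical receive times (ms) for the same four quotes at two sites.
site_a = {"q1": 1000, "q2": 1250, "q3": 1400, "q4": 1900}
site_b = {"q1": 1003, "q2": 1254, "q3": 1403, "q4": 1950}  # q4 arrived late

offset_ms = estimate_clock_offset(site_a, site_b)
print(offset_ms)
```

Even with an estimate like this, purchased data can only be shifted by a constant; the quote-by-quote jitter seen by the original recorder is not recoverable.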
These characteristics of historical forex market data are less troublesome when just using daily forex rates, but the problem becomes more important as the required granularity of market data increases (meaning more frequent quote data is required). For back-testing intra-day trading strategies and algorithmic execution strategies, purchased forex data is of little value. Similarly, transaction cost analysis (TCA) becomes much more relevant when it is assessed against the quotes that were actually available to a specific buy-side firm across its multiple liquidity providers.
There is little point in performing TCA on market data with quotes unattainable to the firm in question! Traditional, ex post TCA is much more valuable when used alongside pre-trade and real-time TCA. The ability to change algorithmic execution dynamically “in flight” is a powerful capability which, by its very nature, dictates that the historical market data be recorded data from the same feed as is used for live trading.
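As an illustration of TCA measured against attainable quotes, the sketch below computes the slippage of a buy fill relative to the best offer available across the firm's own liquidity providers at the time of the fill. The provider names, prices and the tuple layout are assumptions made for the example, not a description of any specific TCA product.

```python
def best_attainable_offer(quotes, fill_time):
    """Return the lowest offer across providers at fill_time.

    quotes is a list of (timestamp, provider, offer) tuples recorded from
    the firm's own feeds. For each provider, only the latest quote at or
    before fill_time counts.
    """
    latest = {}
    for ts, provider, offer in sorted(quotes):
        if ts <= fill_time:
            latest[provider] = offer
    if not latest:
        raise ValueError("no quotes recorded at or before the fill time")
    return min(latest.values())


def buy_slippage_pips(fill_price, benchmark, pip=0.0001):
    """Positive values mean the buy was filled above the benchmark."""
    return (fill_price - benchmark) / pip


# Hypothetical EUR/USD offers recorded from two of the firm's providers.
recorded = [
    (0, "bank_a", 1.06110),
    (5, "platform_b", 1.06105),
    (12, "bank_a", 1.06100),  # arrives after the fill, so it is ignored
]

benchmark = best_attainable_offer(recorded, fill_time=10)
slippage = buy_slippage_pips(1.06120, benchmark)
print(round(slippage, 2))
```

The same calculation run against a vendor's generic quotes would use a benchmark the firm might never have been shown, which is the weakness described above.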
The screenshots below are taken from the TimeBase time-series database. They depict historical EUR/USD quotes for the same time period (one hour on March 11, 2015) in a consolidated order book comprising several banks, technology platforms and ECNs (Figure 1) and in individual order books for a constituent bank and technology platform (Figures 2 & 3). The red and green strips are different levels of bid and offer quotes; the length of each strip indicates how long the quote existed at that level. Even a cursory visual inspection reveals significant differences that the intra-day and algorithmic trader would want to know about.
Figure 1 – EUR/USD Aggregated Order Book for one hour
Figure 2 – EUR/USD Order Book for the same one hour from a global bank
Figure 3 – EUR/USD Order Book for the same one hour from a trading platform
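The difference between a consolidated book and its constituents can be sketched in a few lines. The example below takes one top-of-book quote per source and derives the consolidated best bid and offer; the source names and prices are invented for illustration and are not read from the figures.

```python
def consolidated_top_of_book(books):
    """Combine per-source quotes into a consolidated top of book.

    books maps a source name to a (bid, offer) pair. Returns
    (best_bid, bid_source, best_offer, offer_source).
    """
    bid_source = max(books, key=lambda s: books[s][0])
    offer_source = min(books, key=lambda s: books[s][1])
    return books[bid_source][0], bid_source, books[offer_source][1], offer_source


# Hypothetical EUR/USD quotes from three sources at one instant.
books = {
    "global_bank": (1.06100, 1.06112),
    "platform": (1.06103, 1.06115),
    "ecn": (1.06101, 1.06110),
}

top = consolidated_top_of_book(books)
print(top)
```

In this made-up snapshot the consolidated spread is tighter than that of any single source, because the best bid and best offer come from different venues.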
The traditional manner of recording live data is to stream it to data files for subsequent retrieval. This works, but the use of historical data recorded in this way usually suffers from three deficiencies:
1) The strategy which needs to be back-tested on the historical data typically uses different technology than the same strategy running live.
2) As such, there is no seamless transition from historical to real-time data. Further, the very concept of commingling historical and real-time data is anathema in many set-ups. This is unfortunate, as the benefits of dynamic algorithmic execution require just such simultaneous access to both real-time and historical data.
3) There is a difference between back-testing and simulation. Simulation usually implies replaying recent (hours, days) market data through a trading strategy or algorithm. Different in scale, back-testing typically means observing (and subsequently altering) the operation of a strategy or algorithm over months or years of data.
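One way to avoid the first two deficiencies is to give the strategy a single entry point that is fed by both the live recorder and the replay. The sketch below is a toy illustration of that idea using JSON lines and an in-memory buffer standing in for a data file; the class and function names are hypothetical and this is not the API of TimeBase or any other product.

```python
import io
import json


class SpreadTracker:
    """Toy strategy that tracks the widest spread seen.

    The same class is used unchanged for live and replayed quotes.
    """

    def __init__(self):
        self.widest = 0.0

    def on_quote(self, quote):
        self.widest = max(self.widest, quote["offer"] - quote["bid"])


def record(quotes, sink, strategy=None):
    """Write each incoming quote to sink as a JSON line and, if a
    strategy is supplied, pass the quote to it as well."""
    for quote in quotes:
        sink.write(json.dumps(quote) + "\n")
        if strategy is not None:
            strategy.on_quote(quote)


def replay(source, strategy):
    """Feed previously recorded quotes to the strategy in recorded order."""
    for line in source:
        strategy.on_quote(json.loads(line))


# Hypothetical quotes standing in for a live feed.
live_quotes = [
    {"ts": 1, "bid": 1.06100, "offer": 1.06110},
    {"ts": 2, "bid": 1.06095, "offer": 1.06112},
    {"ts": 3, "bid": 1.06102, "offer": 1.06109},
]

store = io.StringIO()  # stands in for a file on disk
live = SpreadTracker()
record(live_quotes, store, live)

store.seek(0)
backtest = SpreadTracker()
replay(store, backtest)
print(live.widest == backtest.widest)
```

Because the strategy only ever sees `on_quote`, the replay can cover a few hours for simulation or years for back-testing without any change to the strategy code.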
The good news for forex trading participants on both the buy-side and sell-side is that, whilst there are no formal exchanges, three data centres in New York, London and Tokyo have become the de facto physical locations for accessing global forex markets. Major providers of liquidity and technology have established “points of presence” in these data centres, meaning that a firm typically needs to be present in just one of them and at worst in all three. Also, there is now software commercially available whose sole mission is to connect to, and make available, multiple sources of liquidity across the various types of venue. When coupled with built-for-purpose time-series data management software, forex trading participants can now readily deploy the most appropriate method of accessing historical market data: recording it!
By Stuart Farr, President, Deltix, Inc.