With trading in the world's main foreign exchange markets estimated by a recent Bank for International Settlements Triennial survey (December 2007) to be over US$3.2 trillion (c.€4.5trn) daily, some 30 times the volume of the NYSE and NASDAQ combined, demand for historical pricing data would seem assured. This is especially true given that a tad over US$1trn is accounted for by spot FX transactions alone - largely driven by big institutions but also fuelled by a rapidly growing online retail FX trading market. Developing a mechanical FX trading system requires back testing. And back testing for systematic and algorithmic traders requires plenty of high quality FX historical price data from which to build models and evaluate new trading strategies before they are unleashed into a live environment. Historical price data, including tick data in both raw and cleansed form, is therefore absolutely critical for many FX trading end users. Generally speaking, having local copies of data can lead to faster and improved back testing. Such data can be used to test MetaTrader 3.0 and MetaTrader 4.0 Expert Advisors, as well as other proprietary and bespoke platforms for mechanical FX trading systems.
Antoine Kohler, Managing Director, ICAP Information Services in London, commenting on the drivers fuelling increased demand for accessing historical FX pricing data, says: "Market participants are becoming increasingly sophisticated. There's an element of algo sophistication - in terms of low latency and accessing pools of liquidity as fast as possible. That all fuels the demand."
Whereas participation in the FX markets used to be limited to banks and other major institutions, the Internet has extended the range of traders right the way down to retail investors. Retail FX, or the so-called 'off-exchange market' segment, is now estimated to be 2% of the total FX market with daily trading volumes of US$60bn-US$80bn (€85bn-€112bn).
A number of FX historical data providers are available to the institutional and online FX retail trading communities. Thomson Reuters, ICAP and Bloomberg between them have a large share of the market. Reuters Trading for Foreign Exchange (RTFX) offers a single point of access to global FX liquidity and is available on Reuters 3000 Xtra and Reuters Dealing 3000. RTFX, which provides access to multiple market makers with a single login, is based on Reuters Electronic Trading technology, used by over 100 leading FX banks.
Following its acquisition by ICAP, the EBS platform was combined with its IDB broker parent's electronic broking business to create a single global multi-product platform. Thousands of users trade FX over Bloomberg FX <go>, which is supported by liquidity from 160 major and regional banks. FXall is also an important player in the FX data space and has more than 800 of the world's largest financial institutions trading billions on its platform and a network of 70 liquidity providers.
A number of providers offer historical FX CSV-formatted data including tick data, which in some cases can even be downloaded free of charge. ICAP, the world's premier inter-dealer broker, which lays claim to one of the "richest" historical databases due to the liquidity that EBS has managed to capture in the FX market over the years, is evolving its service in the historical data space. For example, back in November 2008 the firm announced that ICAP's historical tick data would be available for the first time through Thomson Reuters (with the latter reselling ICAP EBS historical data).
The expanded distribution agreement between the two firms was in response to market demand as algorithmic and program trading continued to grow. Thomson Reuters' customers have for the past year or so been able to access ICAP's historical tick data in packages that include FX, FX options, as well as fixed-income and interest rate derivatives. The agreement spanned ICAP's EBS electronic platform for spot FX. And, ICAP became the first broker to add its extensive tick data to Thomson Reuters Tick History (TRTH), a comprehensive, global historical tick database covering all data distributed over the Thomson Reuters real-time networks since 1996. TRTH is a key element within the Thomson Reuters Quant and Event-Driven Trading product suite, which offers comprehensive regulatory compliance content and backtesting investment tools.
Emmanuel Doe, Global Business Manager, Tick Solutions at Thomson Reuters, says: "There's a substantial amount of money to be made in the FX market and that is what is fundamentally driving the demand for these services at the end of the day."
"Increasingly market players are interested in FX since it is a marketplace where there is uncaptured alpha, depending on the currency pair and what type of strategy traders are looking at. As such, it continues to present a lot of untapped opportunities," notes New York-based Doe.
From an historical data perspective, traders came to Thomson Reuters for data to initiate back testing, as the firm has had a fairly dominant position in the FX market. He adds: "Tick data is absolutely essential for back testing and going live [with an algorithmic strategy], simply because of the high frequency nature of FX trading."
Kohler says in relation to recent ICAP enhancements to the quality and breadth of data being offered: "We're undertaking this on an ongoing basis and continually investing in the area. Compared to the historical data that we were releasing just a few years ago, today it's much deeper and richer. This is specifically because we're able to store and tag [instrument identifiers] to a far more granular level."
ICAP also lays claim to offering probably the "most granular" real-time data output through its 'Live' feed. A year ago, 'Live' was providing four updates per second, but since before summer 2009 the frequency has been 10 updates per second. (The move reflects developments in the market data feeds for the EBS FX platform.)
Historical tick data is used primarily for program and/or automated trading, portfolio management and valuations. It also plays a vital role in supporting the compliance, accounting and audit functions of financial institutions.
Philip Brittan, global business manager for FX at Bloomberg, confirms that the need for algorithmic FX trading strategies, FX Quant research and regulatory compliance/trade validation requirements are all "valid reasons" driving more usage of historical FX data. "High-frequency algorithmic trading is the only driver that truly requires tick-by-tick data, so that undoubtedly accounts for the largest volume of FX data being consumed these days," he says.
Jim Foster, Global Head of Product Strategy at FXall, says that in addition to algo traders wanting historical tick data to build and back-test models, it can also be required for benchmarking purposes. "Occasionally high-frequency algo traders want our historical market data to benchmark the system speed to monitor how fast we receive and distribute movements in the market in comparison to other data they are monitoring," he says.
Furthermore, institutional investors can need the ability to benchmark pricing for control and compliance purposes. Foster indicates that more and more clients are asking for "sophisticated reporting" to understand exactly where the market was at the time of execution to ensure they were benchmarked properly.
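One way to picture this kind of benchmarking is an "as-of" lookup: find the last quote at or before the execution timestamp and compare the fill against it. The sketch below is illustrative only. The function names, the millisecond timestamps and the EUR/USD figures are all assumptions for the example, not a description of FXall's reporting.

```python
from bisect import bisect_right

def quote_as_of(quote_times_ms, quote_mids, exec_time_ms):
    """Return the last mid quoted at or before exec_time_ms, or None if there is none.

    quote_times_ms must be sorted in ascending order.
    """
    idx = bisect_right(quote_times_ms, exec_time_ms) - 1
    return quote_mids[idx] if idx >= 0 else None

def slippage_pips(exec_price, benchmark_mid, side, pip=0.0001):
    """Positive result means the trade executed worse than the benchmark mid."""
    if side == "buy":
        diff = exec_price - benchmark_mid
    else:
        diff = benchmark_mid - exec_price
    return diff / pip

# Hypothetical EUR/USD mid quotes, timestamps in milliseconds since midnight.
times = [1000, 1250, 1900, 2400]
mids = [1.4800, 1.4802, 1.4801, 1.4805]

benchmark = quote_as_of(times, mids, exec_time_ms=2000)
cost = slippage_pips(1.4803, benchmark, side="buy")
print(benchmark, round(cost, 2))
```

The lookup deliberately ignores quotes that arrive after the execution time, since using them would flatter or penalise the trade with information that was not available when it was placed.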
For institutional asset managers who need to analyse execution quality, FXall timestamps trades to the millisecond: when it receives the requirement, at the time it is placed into the market, and on each quote provided by the selected banks.
"On execution," Foster explains, "we allow a firm to analyse the execution rate versus both external benchmarks, and FXall prices. And, the availability of historical data includes similarly rigorous timestamps on each quote."
On the issues that FX trading firms need to consider when sourcing and storing historical FX data, ICAP's Kohler notes that, in terms of infrastructure, hardware and architecture, many participants handling significant volumes of such data are "quite savvy" in being able to manipulate databases. "You couldn't exactly give this to a total neophyte [a beginner] as they would not know what to do with it. Algo traders and those who have super sophisticated strategies typically know what to do with this type of data. And, these are the drivers for the offering."
Minor Huffman, CTO at FXall, explains that there are a number of items that need to be considered in this regard, including: (1) data quality; (2) data coverage; (3) analytical tools; and (4) hierarchical or other customised data models.
Data quality comes into play in respect of the number of contributing banks or other rate sources, tradable versus indicative data, as well as error correction techniques used to cleanse erroneous or off-market data from the set. Data coverage revolves around currency pairs, spot prices and forward tenors, bid/offer rates versus mid rates, time intervals and historical coverage.
Huffman says that the analytical tools required for modelling will create specific requirements for data availability (e.g. operating system, data format). "Many clients will want access in order to verify and back test algorithms, which may drive the choice of technology used to maintain the data sets. Traditional relational database models don't handle time-series data efficiently," he adds.
In terms of hierarchical or other customised data models, the key issues are the tradability of the data and time stamps. "If a system has a high miss rate for trading, then the value of its data is reduced, both for back testing of models and measuring a system," notes Huffman.
And, in order for the data to be meaningful it has to represent tradable data. For example, is the data time stamped at receipt by the platform, in the matching engine, or at distribution? "Models will require improvements in data access speed and increases in data storage," he says.
Thomson Reuters' Doe says that when firms store the data themselves they can often encounter "gaps" in it. The problem is not isolated to a few firms; largely it is down to the firms' own collection mechanisms and where they are storing the data, he explains. That is why firms come to a source such as Thomson Reuters to obtain it.
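A firm that collects its own ticks can check for such gaps with a simple scan over consecutive timestamps. The following is a minimal sketch under stated assumptions: the `find_gaps` helper, the 30-second threshold and the sample timestamps are hypothetical, and a sensible threshold would depend on the currency pair and the time of day.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, max_gap):
    """Return (start, end) pairs where consecutive ticks are more than max_gap apart.

    timestamps must be sorted in ascending order.
    """
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev > max_gap:
            gaps.append((prev, curr))
    return gaps

# Hypothetical tick times: a 45-second hole between the third and fourth tick.
base = datetime(2009, 10, 1, 9, 0, 0)
ticks = [base + timedelta(seconds=s) for s in (0, 1, 2, 47, 48)]

gaps = find_gaps(ticks, max_gap=timedelta(seconds=30))
for start, end in gaps:
    print("gap from", start, "to", end)
```

A scan like this will also flag weekends and market holidays, so in practice the output would need to be compared against the trading calendar before a gap is treated as a collection failure.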
Bloomberg's Brittan asserts that data cleanliness is the "most important factor" in this regard and is pertinent to all types of users. The questions here are:
Next is whether the data truly represents executable bids and offers available at the time.
"This is critical for algo modellers and for users looking to do trade validation," Brittan points out. "For users storing tick-by-tick data, managing the sheer volume of data can be a challenge and requires special consideration in itself. That said, the cost of mass storage has plummeted in recent years."
There are also many issues in terms of data rights, licences to actually store the data and use the data in an historical environment or even a real-time environment.
In terms of handling, managing and processing huge volumes of data, Doe points out: "Simple queries in that scenario can take an extremely long time as you are dealing with a certain amount of computing power. And, when trying to calculate terabytes of data, high performance engines are required to calculate this type of analysis if you are looking at a year of data, let alone a decade or more of data."
In this regard, Thomson Reuters' acquisition of Vhayu in August 2009 will help in fulfilling client needs. Vhayu provides a high-performance engine for clients to store, analyse and process this data, as well as undertake real-time analysis of Thomson Reuters' feed and send real-time messages to trade upon. Doe confirms that the firm is "investing heavily" in its historical Quant business and the FX space in particular. It views this as a growth area and also wants to stay ahead of the curve.
On efforts that providers are making to streamline the process of accessing historical FX price data, ICAP's Kohler says: "As with all our offerings we're trying to make it as painless as possible for our customers to use our [FX] data as well as offering them the best possible service." The shortening of the time-slicing interval on the EBS platform, which now gives 10 updates per second, is illustrative of the improvement in the service. He adds: "People know how to write to it. And, every single market participant who is present in the FX space is reading a market data element from ICAP EBS, whether real time or historical." Further enhancements - as yet undisclosed - are in the pipeline to improve data quality for ICAP EBS feeds as well as the real-time element, Kohler reveals.
FXall's Huffman says: "We make an effort to give our clients access to the data they are looking for. For example, our customer savings reports have market data built directly into it and included with the analysis, such as the rates at the time of trades."
At Bloomberg, Brittan explains that they are making it easier for users to find the vendor's data through functions like 'FXTF'. They also offer an ever-growing range of tools to undertake data analysis right within the terminal, including functions like 'VOLC', 'XCRV', 'XDSH', 'FXFM', and 'CIX,' as well as a whole suite of charting functions. "This all means that users don't have to deal with exporting, storing, and providing their own analysis tools, for a wide range of analytical applications," Brittan says.
Turning to the kinds of data distribution formats that are available and what factors may influence how firms choose to integrate data into their research and FX trading infrastructures, FXall's Foster says: "For live market data, FXall offers a proprietary format (Accelorate), using very compact messages to keep the transmission and processing time on the client side extremely low." For historical data, the firm offers a format which, it claims, "can easily be imported into client systems."
Bloomberg offers several enterprise-level data products, but also makes it easy for individual users to carry out 'desktop analysis' via its Excel API.
ICAP's Kohler notes here: "I've found in this particular space, that even if it is extremely complicated to integrate, if you have the granularity of data, market participants will go miles to read your data. As such, we make every possible effort to facilitate the integration and reading/usage of our data - presenting it in a way that is at least user friendly and industry standard."
The firm has had cases where clients required a format that it would not accommodate in the normal course of business, and one that is not usually presented to people consuming data. Kohler says: "They've still been happy to take the data in any format that we could provide."
Speaking on formats, Thomson Reuters' Doe reveals that the firm does come across a lot of different infrastructures. "This accounts for why we provide it in pretty much a common CSV format, which is the standard format loaded into any database infrastructure," he says. "However, we do try to be as open as possible and always accommodate client needs."
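To give a sense of why CSV is so easy to integrate, the sketch below reads a small tick file using only Python's standard library. The column layout (`timestamp,pair,bid,ask`), the timestamp format and the sample rows are assumptions made for illustration; actual vendor files will have their own layouts and should be read against the vendor's documentation.

```python
import csv
import io
from datetime import datetime

# Hypothetical layout: timestamp,pair,bid,ask. Real vendor files will differ.
SAMPLE = """timestamp,pair,bid,ask
2009-10-01 09:00:00.100,EUR/USD,1.4801,1.4803
2009-10-01 09:00:00.200,EUR/USD,1.4802,1.4804
"""

def load_ticks(handle):
    """Parse CSV rows into dictionaries with typed time, bid and ask fields."""
    ticks = []
    for row in csv.DictReader(handle):
        ticks.append({
            "time": datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S.%f"),
            "pair": row["pair"],
            "bid": float(row["bid"]),
            "ask": float(row["ask"]),
        })
    return ticks

# io.StringIO stands in for an open file so the example is self-contained.
ticks = load_ticks(io.StringIO(SAMPLE))
for tick in ticks:
    print(tick["time"], tick["pair"], tick["bid"], tick["ask"])
```

For a real file, the same function could be called with `open(path, newline="")` in place of the in-memory sample.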
In relation to FX data services and solutions available to help sort, clean and customise FX data, Bloomberg, ICAP (EBS) and Reuters Dealing are among the leaders in the expertise they can offer clients. With the FX market very well split, each of these players has their respective segments.
ICAP's solutions are usually pre-packaged depending on what sort of package the client orders, with different levels of service available to be purchased. Kohler adds: "If you buy real-time services from us, these are graded under three levels, while the historical data package is graded to five levels."
According to Brittan, one of the advantages that Bloomberg data has is that since the organisation spends a great deal of effort on quality control and cleaning of the data, clients can avoid this task.
By contrast, Doe says that many of Thomson Reuters' clients - particularly FX proprietary trading clients - are often obsessive and "very particular" about how and what they want to clean in terms of data.
He adds: "If we clean the data for them it may be wiping out a potential source of alpha for them, as a result of the way in which we cleanse. And, many of our clients are keen to only clean it [data] themselves so that they can ensure what has been cleansed." For these players, receiving raw, unadjusted or unsorted data is critical to avoid harming their ability to make serious money.
Bloomberg publishes many derivative time series, including realised volatility, skewness, and kurtosis [from the Greek kyrtos, 'curved'; a measure of the 'peakedness' of the probability distribution of a real-valued random variable], GARCH volatility, fixings, analyst forecasts, currency strength indices, PPP levels, and total/carry return indices, alongside raw market data. Adds Brittan: "This provides our users a tremendous depth of available tools for their analysis. Bloomberg's CIX function is an additional very powerful tool for creating custom indices."
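For readers unfamiliar with these statistics, the sketch below shows one textbook way to derive three of them from a price series. It assumes log returns, an unannualised realised volatility and simple population moments for skewness and excess kurtosis. These are illustrative choices with hypothetical function names and prices, and they are not necessarily the definitions Bloomberg uses.

```python
import math

def log_returns(prices):
    """Log return between each pair of consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def realised_volatility(returns):
    """Square root of the sum of squared returns over the sample (not annualised)."""
    return math.sqrt(sum(r * r for r in returns))

def skewness_and_excess_kurtosis(returns):
    """Population skewness and excess kurtosis (zero for a normal distribution)."""
    n = len(returns)
    mean = sum(returns) / n
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

# Hypothetical EUR/USD prices sampled at regular intervals.
prices = [1.4800, 1.4805, 1.4798, 1.4810, 1.4807, 1.4815]
rets = log_returns(prices)

print("realised volatility:", realised_volatility(rets))
print("skewness, excess kurtosis:", skewness_and_excess_kurtosis(rets))
```

With only a handful of observations these estimates are very noisy, which is one reason such measures are usually computed from long runs of tick or intraday data.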
FXall's Foster says: "Internally we have our own proprietary algorithms for cleaning data in real time. We believe it's critical to have a robust methodology to determine unbiased prices that clients find reliable and use as many rate sources as possible, removing outliers to produce a consistently usable data set."
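FXall's algorithms are proprietary and are not described here, but the general idea of combining several rate sources while discarding outliers can be sketched in a few lines. In the example below, the `composite_rate` function, the 10-pip tolerance and the sample quotes are all assumptions chosen for illustration.

```python
from statistics import median

def composite_rate(quotes, max_deviation=0.0010):
    """Return (rate, kept): the median of the quotes within max_deviation of the raw median.

    Quotes further than max_deviation from the raw median are treated as off-market.
    """
    centre = median(quotes)
    kept = [q for q in quotes if abs(q - centre) <= max_deviation]
    return median(kept), kept

# Hypothetical EUR/USD mids from five sources; the fourth is clearly off-market.
quotes = [1.4801, 1.4802, 1.4803, 1.4950, 1.4802]

rate, kept = composite_rate(quotes)
print("composite:", rate, "from", len(kept), "of", len(quotes), "sources")
```

The median is used rather than the mean because a single bad quote can drag an average well away from the market, whereas it barely moves the median. This is also why having more rate sources makes the result more robust.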
Guessing may be fatal in FX trading. Sophisticated traders and trading desks cannot disregard fundamental and technical data analysis if they are to secure their investments and retain a stable level of profit. Equally, there is no sense trusting money to a trading system that has not been thoroughly tested against a wide and full range of historical data. This is why reliable and error-free historical data is essential for adequate technical analysis and for successfully verifying trading strategies. The data cannot and should not be full of spikes or gaps.