Terminology around Machine Learning (ML) and especially Artificial Intelligence (AI) is typically quite loose, which leads to confusion. It is therefore important to provide some definitions for the purposes of this article, to frame the AI/ML examples that follow. As the name suggests, AI is understood as human-like intelligence observed in machines. While the term was coined by John McCarthy, the organizer of the now famous Dartmouth Summer Research Project, it is Marvin Minsky’s definition which is most frequently used for AI: the science of making machines do things that would require intelligence if done by men.
Machine learning (ML), on the other hand, is defined as the study of algorithms which learn from data. Trained machine learning models can therefore easily be utilised as AI. It is often implied that AI is more general than ML. An example of AI which is not ML is an expert system with hard-coded human knowledge: the results look intelligent even though nothing is learnt. Equally, by that definition, every ML system which does something “which would require intelligence if done by men” is AI.
A more colloquial way to describe AI is software which can learn and solve problems “by itself” and keeps learning as it goes along. On a trading floor, a trainee becomes a trader after “learning the environment” and is then expected to make decisions independently. By that definition linear regression is not AI, as it is not self-updating. So when people say AI they normally picture something which makes decisions, adapts to changes in the environment and does not get confused by regime shifts. The pressure to apply AI to business problems is largely driven by competition to make use of increasing volumes of data. The challenge is to select the right AI, which can help to separate signal from noise by processing this ever-increasing data flow.
Figure 1: Market Regimes Identified via Hidden Markov Model
Direct application of AI and ML to finance, trading and asset management is complex, as explained by de Prado (2018). Simply borrowing what works in, say, image recognition and applying it to financial time series will not work, for a number of reasons. A low signal-to-noise ratio is one. A successful trader is correct (up or down on a trade) around 50 percent of the time; this hit rate would be considered low for most other activities and would be considered random in statistics. There is also a lot of overlap in financial observations. De Prado compares financial ML to standard ML in which all observations have been mixed together in unknown proportions (the blood tubes example on page 60 of de Prado, 2018).
Also, when it comes to finance, the definition of intelligence “required if done by men” is different. The proportion of successful traders within any intelligent population is much smaller than the proportion of people able to recognize their friends and relatives (or tell cats and dogs apart) in a set of images. Kahneman (2013) provides an excellent review of how human “heuristics” (the same ones which allow us to recognize images successfully) fail to deliver successful results when it comes to more analytical decision making, to which trading and financial management belong.
On the flip side, many of the concerns in Kahneman (2013) can be addressed by careful application of consistent Bayesian updating. Therefore, AI data processing and even humble statistics can be more successful than “human intelligence”. This has been known for a long time in the systematic trading literature.
Despite all the challenges, the increased flow of data means AI is as welcome in finance as in any other field. Humans simply cannot cope with market speed, and simple rule-based models, while fast, tend to break down. The ability to process a lot of data and extract information from it as an intelligent person would, but in a much more scalable way, is extremely valuable in trading as it is elsewhere. Therefore, the main criterion for our examples of AI below is their ability to replace human work. We take the perspective of trade execution, which is short-term decision making.
Figure 2: Expected EURUSD Intraday dynamics in different market regimes
Creating trading context: Market Regimes
The first task of an execution trader is to identify the “market regime”. It is a mental model which maps what is going on in the market onto the most likely price action today. A trader normally aggregates different volatility and risk-aversion indicators, and a judgement therefore has to be made on which ones are relevant.
An alternative to human decision making is to apply a machine learning procedure to identify the market regime. The procedure clusters different return realizations over time into homogenous groups called “regimes”. For example, a market regime can be risk-off, with equity markets down, volatility up and USD up or down depending on the prevailing macro theme. Machine learning models such as the Hidden Markov Model (HMM) can be applied to identify market regimes. HMMs were introduced for speech recognition (an inherently AI problem) and date all the way back to the 1980s (see Rabiner, 1989). Figure 1 follows the exposition in Jiltsov (2020a) and presents the evolution of different asset classes in different market regimes.
The regimes are identified by HMM estimation based on observable market variables such as equity returns, volatility, credit spreads and liquidity indicators. An HMM with a fixed number of states maps naturally onto our human mental model. On the one hand, it presents a small number of regimes, so classification is easy (the human brain cannot manage a large number of states; Miller’s law, see Miller, 1956). At the same time the model is “fuzzy” in the sense that it produces a probability estimate for each state rather than an exact forecast, so this confidence (probability) information can be utilized by the eventual trading model.
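The “fuzzy” probability output can be made concrete with a short sketch. The code below implements the forward (filtering) pass of a two-regime Gaussian HMM, which produces a probability for each regime given the returns seen so far. All parameters here (transition matrix, per-regime means and volatilities) are illustrative assumptions, not estimates from market data.

```python
import numpy as np

def hmm_filter(returns, trans, means, stds, init):
    """Forward (filtering) pass of a Gaussian HMM: returns the
    probability of each hidden regime given observations so far."""
    n_states = trans.shape[0]
    probs = np.zeros((len(returns), n_states))
    belief = init
    for t, r in enumerate(returns):
        # Predict step: propagate the belief through the transition matrix.
        belief = belief @ trans
        # Update step: weight by the (unnormalized) Gaussian emission likelihood.
        lik = np.exp(-0.5 * ((r - means) / stds) ** 2) / stds
        belief = belief * lik
        belief = belief / belief.sum()
        probs[t] = belief
    return probs

# Illustrative two-regime setup: "calm" (low vol) vs "stressed" (high vol).
trans = np.array([[0.95, 0.05],
                  [0.10, 0.90]])        # sticky regimes
means = np.array([0.0005, -0.001])      # daily mean return per regime
stds  = np.array([0.005, 0.02])         # daily volatility per regime
init  = np.array([0.5, 0.5])

rng = np.random.default_rng(0)
calm = rng.normal(means[0], stds[0], 50)       # 50 calm days...
stress = rng.normal(means[1], stds[1], 50)     # ...then 50 stressed days
probs = hmm_filter(np.concatenate([calm, stress]), trans, means, stds, init)
```

Each row of `probs` is the filtered regime distribution for that day; it is exactly this per-state probability, rather than a hard label, that the eventual trading model can consume.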
This market regime classification can feed into trading desk decision making; this is an application of AI. For example, Figure 2 presents the expected intraday dynamics of EURUSD in different market regimes. These dynamics suggest different trading behaviour depending on which regime we are in, where in the day the execution is taking place, and how long we expect the execution to take. For example, a trader selling EURUSD in a green regime after 1pm may prefer to go slow as the expected trend is upward, while a trader selling EURUSD in a red regime should expect to trade faster. But even this can be an oversimplification, as liquidity may be worse in a red regime (also part of the regime), so the market impact of going faster in a red regime may be prohibitively expensive. This is the “last mile” of decision making.
This “last mile” automation may be based on some AI as well, or it can be a simple rule-based model. The second option may be less flexible in terms of modelling but will certainly be more explainable, with machine learning doing the heavy lifting of identifying market dynamics. The first option is described in the next section.
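To illustrate the rule-based option, a minimal sketch of the “last mile” logic discussed above might look as follows. The drift sign convention, the liquidity score and its threshold are hypothetical, chosen for illustration rather than taken from any actual desk.

```python
def execution_speed(side, regime_drift, liquidity):
    """Rule-based 'last mile': choose execution urgency from the
    identified regime's characteristics.
    side:         "buy" or "sell"
    regime_drift: expected intraday drift in the regime (+ up, - down)
    liquidity:    score in [0, 1]; lower means higher market impact"""
    # If the expected drift works in our favour (e.g. selling into an
    # upward-drifting market), trade slowly and let the market come to us;
    # if it works against us, speed up.
    drift_helps = (side == "sell" and regime_drift > 0) or \
                  (side == "buy" and regime_drift < 0)
    speed = "slow" if drift_helps else "fast"
    # Poor liquidity (also part of the regime) makes going fast
    # prohibitively expensive, so cap urgency in illiquid conditions.
    if speed == "fast" and liquidity < 0.3:
        speed = "medium"
    return speed

# Seller in an upward-drifting ("green") regime: go slow.
print(execution_speed("sell", regime_drift=0.001, liquidity=0.8))  # prints "slow"
```

The appeal of such a rule is precisely its explainability: every decision can be traced back to a named regime characteristic.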
Figure 3: Bayesian Bandit simulation. Panel A: Assumed Algo Performance (invisible to AI decision maker); Panel B: Learning Process; Panel C: Trading Selection Results, Average Reward Rate ($/M)
Reinforcement learning and algo selection
FX algo selection is very similar to the classical multi-armed bandit problem, which is a special case of reinforcement learning (a multi-armed bandit is reinforcement learning with no state). In fact, the original motivation for the multi-armed bandit was a gambler playing with a row of slot machines. We just return to basics: a trader playing with a set of algos.
A basic illustration of how reinforcement learning could be applied to algo selection is presented in Jiltsov (2020b). The study calibrates the typical volatility and performance of FX algo runs and runs a simulation study. The summary of the study and its conclusions are shown in Figure 3. The top panel shows the selection of algos, the middle panel shows the Bayesian Bandit learning process, and the bottom panel shows the cumulative reward accumulated by different bandit strategies. Unlike theoretical studies focusing on long-run performance, this study focuses on cumulative performance over up to 100 algo runs, assuming that this is the maximum number of algo runs which can be done in a reasonably stationary environment (before algo characteristics change).
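In the same spirit as that simulation, the sketch below runs a Bayesian Bandit (Thompson sampling with Gaussian rewards) over five hypothetical algos for a 100-run budget. The reward rates, noise level and prior are invented for illustration and do not reproduce the calibration in Jiltsov (2020b).

```python
import numpy as np

rng = np.random.default_rng(42)

# True average reward rate ($/M) of each algo -- hidden from the bandit.
true_reward = np.array([0.5, 1.0, 1.5, 2.0, 3.0])
noise_sd = 2.0          # run-to-run performance noise
n_runs = 100            # budget of algo runs in a stationary environment

# Conjugate Gaussian posterior per algo (known noise variance, diffuse prior).
post_mean = np.zeros(5)
post_var = np.full(5, 10.0)

total_reward = 0.0
for _ in range(n_runs):
    # Thompson sampling: draw one sample from each algo's posterior
    # and run the algo whose sample is highest.
    samples = rng.normal(post_mean, np.sqrt(post_var))
    arm = int(np.argmax(samples))
    reward = rng.normal(true_reward[arm], noise_sd)
    total_reward += reward
    # Bayesian update of the chosen algo's posterior.
    precision = 1.0 / post_var[arm] + 1.0 / noise_sd**2
    post_mean[arm] = (post_mean[arm] / post_var[arm]
                      + reward / noise_sd**2) / precision
    post_var[arm] = 1.0 / precision
```

Because arms are drawn in proportion to their posterior probability of being best, exploration tapers off naturally within the 100-run budget without any hand-tuned schedule.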
However, the above study assumes that algo performance is the same in all market conditions. Ideally we would like to have a framework to learn about algo performance in the relevant trading context. Trading context could range from basic variables such as time of day, to market conditions and even the regime variable identified in the previous section. This problem maps very well onto the contextual bandit problem (see, for example, Lu, Pal and Pal, 2018 for theoretical results). In this framework it is assumed that before pulling an arm of a multi-armed bandit (selecting an algo in our case) the user gets to see some “context” (market colour, in trading language).
The algorithm then builds a model which maps the distribution of an algo’s expected return to the observed context. Figure 4 presents a simple illustration for the five algos considered above. On the x-axis we see the realization of a market variable: this is the context. Assume for the sake of argument that it is a measure of market momentum. If market momentum is slow (left-hand side), we know (having learnt over time) that Algo 5 has the best performance in those conditions. If market momentum is strong, Algo 5’s performance deteriorates while other algos become more appealing. Note that the uncertainty (band) around algo performance is inversely related to that performance: the learning algorithm selects algos with higher expected reward more often, and thus learns proportionally more about their performance.
It should be noted that the mapping between observed context and the state is essentially an unknown function. This mapping can either be done by Bayesian regression, as in the example in Figure 4, or via a neural network, in which case it is called a Deep Contextual Bandit (see Collier and Urdiales, 2018).
Figure 4: Contextual Bandit Example – Context Mapping – Learning Algo Performance in Different Regimes
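A minimal sketch of the Bayesian-regression variant might look as follows: each algo’s expected reward is an unknown linear function of a single context variable (momentum, say), and Thompson sampling draws a reward model from each algo’s posterior before selecting. The three-algo setup and the ground-truth coefficients are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical ground truth, unknown to the learner: reward is linear
# in momentum. Algo 3 is best in slow markets and worst in fast ones.
true_w = np.array([[0.5, 2.0],    # algo 1: intercept, slope
                   [1.0, 1.0],    # algo 2
                   [3.0, -2.0]])  # algo 3
noise_sd = 1.0
n_arms, dim = true_w.shape

# Bayesian linear regression posterior per arm, stored as a
# precision matrix A and vector b (posterior mean = inv(A) @ b).
A = [np.eye(dim) * 0.1 for _ in range(n_arms)]   # weak prior precision
b = [np.zeros(dim) for _ in range(n_arms)]

for _ in range(500):
    momentum = rng.uniform(0.0, 1.0)       # observed context
    x = np.array([1.0, momentum])          # features: intercept + momentum
    # Thompson sampling: draw weights from each arm's posterior and
    # pick the arm whose sampled model predicts the highest reward.
    scores = []
    for k in range(n_arms):
        cov = np.linalg.inv(A[k])
        w = rng.multivariate_normal(cov @ b[k], cov)
        scores.append(x @ w)
    arm = int(np.argmax(scores))
    reward = x @ true_w[arm] + rng.normal(0.0, noise_sd)
    # Conjugate update of the chosen arm's posterior.
    A[arm] += np.outer(x, x)
    b[arm] += reward * x

# Posterior mean of each arm's learnt reward-vs-context model.
learnt = [np.linalg.inv(A[k]) @ b[k] for k in range(n_arms)]
```

Swapping the per-arm linear model for a neural network turns this sketch into the Deep Contextual Bandit of Collier and Urdiales (2018).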
One thing the financial industry is naturally good at is collecting data. While 10 years ago the availability of big datasets of labelled photos was a challenge (which, together with computing power, hindered the progress of deep learning for image recognition), financial data was in better shape. So it is extremely attractive to simply apply state-of-the-art machine learning algorithms to financial problems. However, it is not always simple. De Prado (2018) provides a good example of how a simple transfer-application of machine learning to finance fails.
However, some financial problems map very well onto well-researched AI applications. This article argues for two such scenarios. The first is the so-called market regime application: a large number of market variables can be mapped onto a lower dimension (the regime), and market behaviour within an identified regime is reasonably homogenous. The second is a reinforcement learning application for selecting the best FX algo to execute a given transaction. The two problems can be linked: market regime identification can serve as the context for the algo selection problem, transforming it into a contextual bandit problem.
So far we have focused on the algo user’s perspective. An algo provider (electronic trader) has access to much richer datasets (order book events) and has more room for experimentation (high-frequency order placement). However, the combination of techniques is very similar: a deep neural network to learn the Q-values, and reinforcement learning to formalize reward optimization for each action (see Bacoyannis et al., 2018 for details).
References

Bilokon, P., M. Dixon, I. Halperin, 2020, Machine Learning in Finance: From Theory to Practice, Springer
Jiltsov, A., 2020a, Market regimes: how to spot them and how to trade them, FX Markets
Jiltsov, A., 2020b, Leveraging Machine Learning to determine FX Algo selection, FX Algo News
De Prado, Marcos, 2018, Advances in Financial Machine Learning, Wiley
Minsky, Marvin (ed.), 1968, Semantic Information Processing, MIT Press
Miller, George A., 1956, “The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information”, The Psychological Review, vol. 63, pp. 81–97, available at http://www.musanim.com/miller1956/
Kahneman, Daniel, 2013, Thinking, Fast and Slow, Farrar, Straus and Giroux
Sutton, Richard S., and Andrew G. Barto, 2018, Reinforcement Learning: An Introduction, Second Edition, MIT Press
Davidson-Pilon, Cameron, 2016, Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference, Addison-Wesley Data & Analytics
Collier, Mark, Llorens Hector Urdiales, 2018, Deep Contextual Multi-armed Bandits, https://arxiv.org/abs/1807.09809
Lu, Tyler, David Pal, Martin Pal, 2018, Contextual Multi-Armed Bandits, Google Research
Rabiner, L.R., 1989, “A tutorial on Hidden Markov Models and selected applications in speech recognition”, Proceedings of the IEEE 77: 257–286
Bacoyannis, Vangelis, Vacslav Glukhov, Tom Jin, Jonathan Kochems, Doo Re Song, 2018, Idiosyncrasies and challenges of data driven learning in electronic trading, https://arxiv.org/abs/1811.09549