Machine learning applications for financial markets

What is machine learning?

The Data Deluge
© luckey_sun

In the field of Artificial Intelligence (AI), which aims at using computers to carry out tasks that normally require human intelligence, machine learning is currently the dominant trend.

In machine learning, the computer carries out tasks for which it has not been explicitly programmed: it builds its own models from data and can sometimes even develop them further as new data arrives.

Machine learning algorithms use large quantities of data and are similar in this sense to data mining or business intelligence. However, data mining is limited to data insight through analysis and synthesis.

Machine learning goes further: it can produce rules and models capable of explaining the data, potentially predict new data (predictive analytics), and perhaps even make data-driven decisions based on the new data and the established model.

In general, machine learning can be divided into supervised learning and unsupervised learning.

In supervised learning the algorithm receives “labeled” data that it can learn from. For example, in anti-spam filters, the user designates the undesirable emails. The algorithm progressively builds on its model by generalizing the situations observed in the data.

The variable to predict can be discrete (e.g. spam/not spam), in which case the aim is to establish a classification model of the input data. Or it can be continuous and take on any value (e.g. the price of a bond), in which case we are looking at a regression model. In practice this is considerably more complex than a simple linear regression, as multivariate analysis is required.
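As a minimal sketch of the regression case, the snippet below fits a straight line by ordinary least squares on a single input variable. The data (bond prices as a function of yield) is made up for illustration, and a real model would use many input variables rather than one.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up labeled data: yield in percent -> bond price
yields = [1.0, 2.0, 3.0, 4.0]
prices = [99.0, 98.0, 97.0, 96.0]

intercept, slope = fit_line(yields, prices)

# Use the fitted model to predict the price for a yield it has never seen
predicted_price = intercept + slope * 2.5
print(intercept, slope, predicted_price)
```

The labeled examples play the role of the "teacher": the algorithm only generalizes the relationship it observes in them.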

Scatter plot illustrating supervised learning (classification)
Supervised learning (classification)
The model tries to predict the value of a discrete variable (either blue, red or yellow)
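To make the classification case concrete, here is a small sketch of a k-nearest-neighbors classifier applied to the spam example. The two features (number of links, number of exclamation marks) and the labeled emails are hypothetical, and k-nearest-neighbors is only one of many possible classification techniques.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled points."""
    # `train` is a list of (features, label) pairs; distance is Euclidean
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled emails: (number of links, number of exclamation marks)
labeled_emails = [
    ((0, 0), "not spam"),
    ((1, 0), "not spam"),
    ((0, 1), "not spam"),
    ((7, 5), "spam"),
    ((9, 4), "spam"),
    ((8, 6), "spam"),
]

print(knn_predict(labeled_emails, (6, 5)))
print(knn_predict(labeled_emails, (1, 1)))
```

Each email the user flags adds one more labeled point, which is how the model "progressively builds" on what it has seen.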

In unsupervised learning, the algorithm receives raw data with no external aid and no instructions concerning the desired result. It will try to group the data (clustering) in order to detect patterns and correlations and to draw inferences from them.

Scatter plot illustrating unsupervised learning
Unsupervised learning
The model builds clusters of similar observations (according to a user-defined distance criterion).
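Clustering can be sketched with the k-means algorithm, shown below in a bare-bones form. The choice of k-means, the Euclidean distance, the sample points and the starting centroids are all assumptions made for illustration; the distance criterion is precisely the part a user would normally define.

```python
import math


def assign(points, centroids):
    """Put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids to cluster means."""
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # New centroid = mean of its cluster; keep the old one if the cluster is empty
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, assign(points, centroids)


# Unlabeled, made-up observations
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(clusters)
```

No labels are supplied at any point: the two groups emerge only from the distances between observations.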

Deep learning tackles particularly vast amounts of mostly unstructured data (big data) that cannot be modeled with conventional database systems. It is particularly well suited to image identification and classification.

Machine learning also aims at producing systems that are capable of evolving and adapting to new situations. For example, evolutionary computation proceeds iteratively to select increasingly efficient predictors: at each iteration it recombines them and keeps the ones that produce the best results.
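The select-and-recombine loop of evolutionary computation can be sketched as follows. Everything specific here is an assumption chosen for illustration: the "predictor" is a single number w, recombination is an average of two parents plus a small random mutation, and the task is to recover the slope of y = 3x from three made-up points.

```python
import random


def evolve(fitness, population, generations=50, keep=5, rng=None):
    """Keep the `keep` best candidates each generation and refill by recombining them."""
    rng = rng or random.Random(0)
    size = len(population)
    for _ in range(generations):
        # Selection: a lower fitness value is better (it is an error to minimize)
        survivors = sorted(population, key=fitness)[:keep]
        children = []
        while len(survivors) + len(children) < size:
            a, b = rng.sample(survivors, 2)
            # Recombination: average two parents, then add a small random mutation
            children.append((a + b) / 2 + rng.gauss(0, 0.1))
        population = survivors + children
    return min(population, key=fitness)


# Hypothetical task: find the slope w that best fits y = 3x on a few points
data = [(1, 3), (2, 6), (3, 9)]


def squared_error(w):
    return sum((w * x - y) ** 2 for x, y in data)


rng = random.Random(42)
initial = [rng.uniform(-10, 10) for _ in range(20)]
best = evolve(squared_error, initial, rng=rng)
print(best, squared_error(best))
```

Because the best candidates always survive into the next generation, the best error can never get worse from one iteration to the next.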

Is this new?

Most of the techniques used in machine learning are based on mathematical theories (advanced statistics, decision trees, Bayesian networks, neural networks etc.) that have been around for 50 years if not more. They are being used more and more frequently today due to the combination of a certain number of factors:

  • A steady decrease in data storage costs
  • A steady increase in computing power
  • An explosion of the quantity of information available in digital form
  • The fact that most of this information consists of largely unstructured data and requires processing techniques other than conventional methods based on databases and sequential programming

Machine learning is present almost everywhere today. For example, banks can use it to assess the creditworthiness of a borrower, and search engines use it to produce results and ads adapted to the needs of the internet user. The most impressive applications concern text and image recognition, strategy games like chess or Go, and robotics, as seen in the Google self-driving car.

What are the applications of machine learning to market finance?

Trading and investment

In the trading business, techniques for predicting short-term trends, no matter how sophisticated they may be, rapidly run up against the principle of market efficiency (the efficient market hypothesis). A decision based on such a prediction affects the markets, and the prediction is therefore immediately incorporated in the price. As a result, it is very difficult to generate consistent short-term profits without risk over time.

Machine learning shows the most promise in the field of portfolio management, which involves longer-term investments. In order to identify the best investment opportunities, portfolio managers use machine learning to analyze all available data concerning businesses (not only financial reports but also press releases, news and even sound or video recordings). The idea is to highlight any relevant relationship between the operating and financial history of a company and its performance on the stock market.

In portfolio management, as is the case with all knowledge-intensive fields, the expertise acquired through experience relies mainly on the immediate recognition of patterns. Thanks to machine learning we are starting to see true expert systems.

An increasing number of asset managers are using machine learning either to make investment decisions or at least to support them, in the hope that the resulting algorithms will be capable of adapting faster to a changing environment than the traditional solutions designed by quants.

A hedge fund whose investment decisions are based solely on a trading system relying on artificial intelligence has even been launched recently. However, the performance results of the fund have not yet been disclosed.

Risk management

Machine learning solutions that take into account the history of trading activity, as well as instant messaging content, can be used to analyze trading behavior on an ongoing basis. These solutions are a lot more efficient than traditional after-the-fact checks, as they can rapidly identify fraudulent behavior or unauthorized risk taking.
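The article does not describe a specific detection method. As a deliberately simple statistical stand-in for a learned model of "normal" behavior, the sketch below flags a trader's daily position when it lies far from that trader's own history (a z-score check). The function name, the threshold and the figures are hypothetical.

```python
import statistics


def is_unusual(history, value, threshold=3.0):
    """Flag `value` if it is more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        # A perfectly constant history: anything different is unusual
        return value != mean
    return abs(value - mean) / spread > threshold


# Made-up daily position sizes (in millions) for one trader
history = [10, 12, 11, 9, 10, 11, 10, 12]

print(is_unusual(history, 11))   # a day in line with past behavior
print(is_unusual(history, 30))   # a position far above anything seen before
```

A production system would combine many more signals, such as the messaging content mentioned above, and would run continuously rather than after the fact.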

What are the risks?

As far as the algorithms themselves are concerned, the most common pitfall is "overfitting": the model becomes overly complicated and ends up unable to distinguish between true correlations and noise in the data.
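Overfitting can be illustrated with a small made-up experiment. The training data follows y = 2x plus noise; one model simply memorizes every training point, while the other fits a straight line. Both models and all the numbers are assumptions chosen to keep the sketch short.

```python
def memorizer(xs, ys):
    """Overly flexible model: answer with the y of the closest training x."""
    def predict(x):
        closest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[closest]
    return predict


def line_model(xs, ys):
    """Simple model: least-squares straight line through the training data."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mse(model, xs, ys):
    """Mean squared error of `model` on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Training data: y = 2x with alternating noise of +1 / -1
train_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_y = [3.0, 3.0, 7.0, 7.0, 11.0, 11.0]
# Fresh data drawn from the true relationship y = 2x
test_x = [1.2, 2.2, 3.2, 4.2, 5.2]
test_y = [2.4, 4.4, 6.4, 8.4, 10.4]

for name, model in [("memorizer", memorizer(train_x, train_y)),
                    ("line", line_model(train_x, train_y))]:
    print(name, mse(model, train_x, train_y), mse(model, test_x, test_y))
```

The memorizer has zero error on the training data because it has also learned the noise, yet it does worse than the plain line on the fresh data. That gap between training and out-of-sample error is the usual symptom of overfitting.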

However, it is compliance managers and regulators who are most likely to find fault with these emerging trends. The fact is that a “traditional” program, however complex, can be described in a comprehensible manner and its behavior remains predictable, whereas this is not necessarily the case for systems based on machine learning, as they can behave like black boxes. Ironically, that is when human intervention will prove essential in order to explain and control machine behavior.