This article is a continuation of “Statistical Arbitrage with Pairs Trading and Backtesting”.
Let us recap what we have discussed:
a. Analyzed historical data of NSE stocks in the “Financial Services” sector to identify cointegrated pairs of stocks, i.e., stocks whose prices move together over time.
b. Based on statistical tests, identified two stocks, Bank of Baroda (BANKBARODA) and State Bank of India (SBIN), as the right candidates for the analysis.
c. Generated trading signals based on the mean-reversion principle of Pairs Trading and calculated profit and loss.
The focus of the earlier article was to explain the Pairs Trading strategy and show how it works. One important step in that analysis, though not obvious at the time, was calculating the ratio between the numbers of SBI and Bank of Baroda shares bought or sold. In financial jargon, this ratio is called the hedge ratio. It was calculated using a simple, well-established statistical technique: ordinary least squares (OLS) linear regression. The slope of the regression line is the hedge ratio, and it was assumed to be constant over time, so the ratio of the two stocks bought or sold stayed the same whenever a trading signal was generated. In other words, we assumed that the strength of the correlation does not change over time.
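To make the role of the hedge ratio concrete, here is a minimal sketch of the constant (static) hedge ratio calculation using NumPy's least-squares solver. The price series are synthetic stand-ins, not the actual SBIN and BANKBARODA data, and the function name `ols_hedge_ratio` and the 100-share position size are hypothetical choices for illustration.

```python
import numpy as np


def ols_hedge_ratio(y, x):
    """Fit y = beta * x + alpha by OLS and return (beta, alpha)."""
    design = np.column_stack([x, np.ones_like(x)])        # columns: [x, 1]
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[0]), float(coef[1])


# Synthetic stand-ins for the two price series (NOT real NSE data).
rng = np.random.default_rng(0)
x = 200 + np.cumsum(rng.normal(0, 2, 500))               # plays the role of BoB
y = 2.5 * x + 10 + rng.normal(0, 2, 500)                  # plays the role of SBI

beta, alpha = ols_hedge_ratio(y, x)
spread = y - (beta * x + alpha)      # residual series used for mean-reversion signals

# With a constant hedge ratio, every trade uses the same share ratio:
# trade units_y shares of Y against roughly beta * units_y shares of X.
units_y = 100                        # hypothetical position size
units_x = round(beta * units_y)
print(f"hedge ratio = {beta:.3f}, intercept = {alpha:.3f}")
print(f"long {units_y} Y / short {units_x} X (when the spread is below its mean)")
```

The spread `y - (beta * x + alpha)` is the series a mean-reversion rule would monitor; because `beta` is estimated once, the share ratio never changes from one trade to the next.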
This article challenges this assumption and presents an improved way of calculating the hedge ratio. The price correlation chart below for the two stocks shows that the degree of correlation is not the same over time. Hence, the constant hedge ratio assumption does not hold.
Our objective, then, is to recalculate the hedge ratio whenever a trading signal is generated, which gives us the appropriate hedge ratio as a function of time.
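The simplest way to get a time-varying hedge ratio is a frequentist rolling-window regression: refit OLS on only the most recent days each time a signal is needed. This is a different, simpler technique than the Bayesian model developed below, shown here only as a baseline; the 60-day window, the function name `rolling_hedge_ratio` and the synthetic data (a hedge ratio that jumps from 1.5 to 3.0 halfway through) are assumptions.

```python
import numpy as np


def rolling_hedge_ratio(y, x, window=60):
    """Re-estimate the OLS slope on a trailing window ending at each day.

    Days without a full window of history are left as NaN.
    """
    betas = np.full(len(y), np.nan)
    for t in range(window - 1, len(y)):
        xs = x[t - window + 1 : t + 1]
        ys = y[t - window + 1 : t + 1]
        # OLS slope with intercept = sample Cov(x, y) / sample Var(x)
        betas[t] = np.cov(xs, ys)[0, 1] / np.var(xs, ddof=1)
    return betas


# Synthetic data whose true hedge ratio jumps from 1.5 to 3.0 halfway through.
rng = np.random.default_rng(1)
n = 600
x = 100 + np.cumsum(rng.normal(0, 2, n))
true_beta = np.where(np.arange(n) < n // 2, 1.5, 3.0)
y = true_beta * x + 5 + rng.normal(0, 1, n)

betas = rolling_hedge_ratio(y, x, window=60)
print(f"before the jump: {betas[n // 2 - 1]:.2f}, after: {betas[-1]:.2f}")
```

The window length trades responsiveness against noise: a short window reacts quickly to a change in the relationship but gives jumpier estimates, and the window size itself is an arbitrary choice.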
Before doing this calculation, let us revisit the OLS linear regression model to see where and how it can be improved. The linear regression model has two parameters, the slope (β) and the intercept (α), as defined below:
$$Y = \beta X + \alpha$$
where Y and X are the daily price series of SBI and Bank of Baroda (BoB), respectively.
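For reference, OLS chooses the single pair $(\hat\alpha, \hat\beta)$ that minimizes $\sum_{t} (Y_t - \beta X_t - \alpha)^2$ over all $T$ observed days, which gives the standard closed-form estimates:

$$\hat\beta = \frac{\sum_{t=1}^{T} (X_t - \bar X)(Y_t - \bar Y)}{\sum_{t=1}^{T} (X_t - \bar X)^2}, \qquad \hat\alpha = \bar Y - \hat\beta \bar X$$

where $\bar X$ and $\bar Y$ are the sample means. Both estimates are computed once from the whole sample, which is why the resulting hedge ratio is a single constant.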
In this method, the slope and intercept are estimated with a frequentist approach and assumed to be constant over time. In the improved model, we want to let the slope and intercept change over time as new prices are observed.
This idea is central to Bayesian linear regression, and in particular to the Bayesian form of rolling regression used here. The aim of Bayesian linear regression is not to find a single “best” value for each model parameter, but to determine a probability distribution over the parameters given the observed data. Our beliefs about the parameters before seeing the data are expressed as prior distributions; combining these priors with the observed prices (through the likelihood) yields the posterior distribution of the parameters. The approach used in this article follows the rolling regression example by Stefan Jansen and Thomas Wiecki.
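To illustrate the difference between a single best estimate and a posterior distribution, here is a minimal sketch of the simplest Bayesian linear regression: a conjugate model with a Gaussian prior on (β, α) and a known noise standard deviation, for which the posterior is also Gaussian and available in closed form. This static model is a simplification chosen for illustration, not the time-varying model used in this article; the data, the prior and the noise level are assumptions.

```python
import numpy as np


def bayes_linreg_posterior(x, y, prior_mean, prior_cov, noise_sd):
    """Gaussian posterior over (beta, alpha) for y = beta*x + alpha + noise.

    Assumes a Gaussian prior N(prior_mean, prior_cov) and a KNOWN noise
    standard deviation, so the posterior is Gaussian in closed form.
    """
    phi = np.column_stack([x, np.ones_like(x)])         # design matrix [x, 1]
    prior_prec = np.linalg.inv(prior_cov)
    post_prec = prior_prec + phi.T @ phi / noise_sd**2  # precisions add
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (prior_prec @ prior_mean + phi.T @ y / noise_sd**2)
    return post_mean, post_cov


rng = np.random.default_rng(2)
x = rng.normal(0, 1, 200)                    # standardized synthetic "prices"
y = 0.8 * x + 0.1 + rng.normal(0, 0.5, 200)

prior_mean = np.zeros(2)
prior_cov = np.eye(2) * 10.0                 # vague prior (assumption)
results = {}
for n in (10, 200):
    m, S = bayes_linreg_posterior(x[:n], y[:n], prior_mean, prior_cov, 0.5)
    results[n] = (m, np.sqrt(np.diag(S)))    # posterior mean and sd of (beta, alpha)
    print(f"n={n:3d}  beta ~ {m[0]:.2f} +/- {results[n][1][0]:.2f}")
```

As more observations arrive, the posterior standard deviation of β shrinks: the data turn a vague prior into a narrower posterior rather than a single point estimate.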
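Let us assume that the slope and intercept follow a random walk through time, i.e., each day's value is the previous day's value plus Gaussian noise:
$$\alpha_t \sim \mathcal{N}(\alpha_{t-1}, \sigma_\alpha^2), \qquad \beta_t \sim \mathcal{N}(\beta_{t-1}, \sigma_\beta^2)$$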
We also assume that the standard deviations of these random walks, $\sigma_\alpha$ and $\sigma_\beta$, follow exponential distributions.
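To see what these assumptions imply, the sketch below simulates data from the model in the forward direction: draw σα and σβ from exponential priors, let αt and βt wander as random walks, then generate Y from the time-varying regression line. The prior rate of 50, the starting slope of 1.0, the observation noise and the standardized X series are all hypothetical values chosen for illustration, not the article's settings.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 500
rate = 50.0   # hypothetical rate of the exponential priors (mean step size 1/rate)

# 1. Draw the random-walk step sizes from their exponential priors.
sigma_alpha = rng.exponential(scale=1 / rate)
sigma_beta = rng.exponential(scale=1 / rate)

# 2. Random walks: alpha_t = alpha_{t-1} + N(0, sigma_alpha^2), likewise for beta.
alpha = np.cumsum(rng.normal(0.0, sigma_alpha, T))
beta = 1.0 + np.cumsum(rng.normal(0.0, sigma_beta, T))  # hypothetical start near 1.0

# 3. Generate Y from the time-varying regression line plus observation noise.
x = rng.normal(0.0, 1.0, T)        # hypothetical standardized X prices
noise_sd = 0.1                     # hypothetical observation noise
y = alpha + beta * x + rng.normal(0.0, noise_sd, T)

print(f"sigma_alpha={sigma_alpha:.4f}, sigma_beta={sigma_beta:.4f}")
print(f"slope drifted from {beta[0]:.3f} to {beta[-1]:.3f}")
```

Fitting the model runs this process in reverse: given the observed X and Y, inference asks which paths of αt and βt, and which values of σα and σβ, are plausible.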
With the probabilistic model for $\alpha$ and $\beta$ specified, we can define the regression line as before, except that the intercept and slope now vary over time: $Y_t = \beta_t X_t + \alpha_t$, plus observation noise. Once the model is defined, Bayesian inference requires a sampler that draws from the posterior distribution of the model parameters. Here we use the No-U-Turn Sampler (NUTS), an adaptive variant of Hamiltonian Monte Carlo, which is a Markov chain Monte Carlo (MCMC) method.
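Running NUTS requires a probabilistic programming library (the referenced rolling regression example uses PyMC). As a lightweight cross-check that needs only NumPy, here is a different technique: if σα, σβ and the observation noise are fixed to known values instead of being given exponential priors, the random-walk regression becomes a linear-Gaussian state-space model, and a Kalman filter computes the posterior of (αt, βt) given the prices observed up to day t exactly. The noise values, the vague starting state and the function name `kalman_hedge_ratio` are assumptions; this is not the sampler used in the article.

```python
import numpy as np


def kalman_hedge_ratio(x, y, sigma_beta, sigma_alpha, noise_sd, p0=10.0):
    """Filtered mean and sd of (beta_t, alpha_t) for the random-walk regression.

    Assumes sigma_beta, sigma_alpha and noise_sd are KNOWN constants; the
    initial state is N(0, p0 * I), a vague starting guess.
    """
    m = np.zeros(2)                                  # state mean [beta, alpha]
    P = np.eye(2) * p0                               # state covariance
    Q = np.diag([sigma_beta**2, sigma_alpha**2])     # random-walk step covariance
    R = noise_sd**2                                  # observation noise variance
    means, sds = np.empty((len(y), 2)), np.empty((len(y), 2))
    for t in range(len(y)):
        P = P + Q                                    # predict: one random-walk step
        H = np.array([x[t], 1.0])                    # observation row [x_t, 1]
        S = H @ P @ H + R                            # predicted variance of y_t
        K = P @ H / S                                # Kalman gain
        m = m + K * (y[t] - H @ m)                   # correct mean with the residual
        P = P - np.outer(K, H @ P)                   # (I - K H) P
        means[t], sds[t] = m, np.sqrt(np.diag(P))
    return means, sds


# Synthetic data generated from the same random-walk model (illustrative only).
rng = np.random.default_rng(4)
T = 1000
true_beta = 1.0 + np.cumsum(rng.normal(0, 0.02, T))
true_alpha = np.cumsum(rng.normal(0, 0.01, T))
x = rng.normal(0, 1, T)
y = true_alpha + true_beta * x + rng.normal(0, 0.1, T)

means, sds = kalman_hedge_ratio(x, y, sigma_beta=0.02, sigma_alpha=0.01, noise_sd=0.1)
print(f"last day: true beta {true_beta[-1]:.3f}, "
      f"filtered {means[-1, 0]:.3f} +/- {sds[-1, 0]:.3f}")
```

Because the filter conditions only on data up to day t, its estimates are what a live strategy could have used on that day; the NUTS posterior for the full model, by contrast, conditions each αt and βt on the whole sample.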
The schematic below summarizes the model parameters and assumptions:
The figure below shows how the intercept and slope have changed over the years, underlining the evolving correlation between the two stocks:
The following plot combines the price series and the regression lines, with color (hue) indicating time.
Details of the Python code and analysis process can be found at the GitHub link.
References:
1. Stefan Jansen, Machine Learning for Algorithmic Trading: Predictive models to extract signals from market and alternative data for systematic trading strategies with Python, 2nd Edition.
2. Thomas Wiecki, Rolling Regression example.
3. Introduction to Bayesian Linear Regression.