

Stationarity, Random walk, White noise, TS models and evaluation of models
In this article, we are going to examine what time series analysis is, its future scope, and the key concepts of time series analysis.
- Introduction: What is time series analysis and its importance?
- What is stationarity in time series and what are its types?
- White Noise & Random Walk
- ACF & PACF
- Models: AR, MA, ARMA, ARMAX and ARIMA
- Identifying the Order of models
- Model diagnostics
- Box-Jenkins method
Time series analysis is a statistical technique that deals with time series data, also called trend analysis. Time series data means that the data points are recorded over a series of particular time periods or intervals.
A time series consists of the following components:
- Trend: The trend shows the general tendency of the data to increase or decrease during a long period of time. A trend is a smooth, general, long-term, average tendency. It is not always necessary that the increase or decrease is in the same direction throughout the given period of time.
- Seasonality: Patterns that frequently repeat at regular intervals. For example: high sales every weekend.
- Cyclicality: Cyclicality is where there is a repeating pattern but no fixed period.
Scope of the Time Series Analysis:
- Stock Market Analysis
- Economic Forecasting
- Inventory studies
- Demand Forecasting
- Sales Forecasting and more
Stationarity means that the distribution of the data does not change with time.
- No Trend: It is not growing or shrinking.
- Mean & variance constant: the average value, and the average distance of the data points from that average, are not changing.
- Autocorrelation constant: how each value in the time series is related to its neighbours stays the same.
◦ Types of stationarity:
- Strong stationarity: the entire distribution of the data is time-invariant.
- Weak stationarity: the mean, variance and autocorrelation are time-invariant (i.e., for autocorrelation, corr[X(t), X(t−τ)] is only a function of τ).
◦ Making time series stationary:
- Stationarity Through Differencing time series
- Taking log of time series
- Taking Square root of time series
- Taking the proportional change (df.shift(1)/df)
◦ Test for stationarity: the Augmented Dickey-Fuller test
- The null hypothesis is that the time series is non-stationary.
- The Dickey-Fuller test only tests for trend stationarity, not other kinds of non-stationarity.
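As a minimal sketch of running the test with statsmodels (assuming a pandas DataFrame df with a single column named 'value'; both names are illustrative):

from statsmodels.tsa.stattools import adfuller

# Run the Augmented Dickey-Fuller test on the series
result = adfuller(df['value'])
print('ADF statistic:', result[0])
print('p-value:', result[1])
# A p-value below 0.05 lets us reject the null hypothesis of
# non-stationarity and treat the series as stationary.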
Identifying whether a time series is stationary is very important. If it is stationary, we can use models that assume stationarity to predict the next values of the series from historical data. If it is non-stationary, we must first make it stationary by applying transformations, or use a model that handles non-stationarity (such as ARIMA, discussed below).
White Noise: A time series is white noise if it is a sequence of uncorrelated random variables that are identically distributed. Stock returns are often modeled as white noise. Unfortunately, white noise cannot be forecast from its past: autocorrelations at all lags are zero.
White Noise is a series with:
- Constant mean
- Constant variance
- Zero autocorrelations at all lags
Special case: if the data follow a normal distribution, then we have Gaussian white noise.
Random Walk: A random walk is another time series model, in which the current observation is equal to the previous observation plus a random noise term.
- In a random walk, today’s price is equal to yesterday’s price plus some noise.
- A random walk can’t be forecast: the best guess for the next value is the current value.
- Incidentally, if prices are in logs, the difference in log prices is one way to measure returns.
- To test whether a time series is a random walk, you can regress the current values on the lagged values. If the slope coefficient (beta) is not significantly different from one, then we cannot reject the null hypothesis that the series is a random walk. However, if the slope is significantly less than one, we can reject the null hypothesis.
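In symbols, a random walk is P(t) = P(t−1) + ε(t), where ε(t) is white noise. As a small illustrative sketch (the seed and sample size are arbitrary choices), we can simulate one and check it with the ADF test:

import numpy as np
from statsmodels.tsa.stattools import adfuller

np.random.seed(0)
steps = np.random.normal(loc=0, scale=1, size=500)
random_walk = np.cumsum(steps)  # P(t) = P(t-1) + noise

# The p-value is typically large, so we cannot reject the
# null hypothesis that the series is non-stationary.
print('p-value:', adfuller(random_walk)[1])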
Autocorrelation: Autocorrelation is the correlation of a single time series with a lagged copy of itself. It is also called serial correlation.
ACF is the complete auto-correlation function, which gives us the values of the autocorrelation of a series with its lagged values. We plot these values along with a confidence interval. In simple terms, the ACF describes how well the present value of the series is related to its past values. A time series can have components such as trend, seasonality, cyclicality and residuals; the ACF considers all of these components while finding correlations, hence it is a ‘complete’ auto-correlation plot.
The ACF shows not just the autocorrelation at one lag, but the entire autocorrelation function across different lags.
PACF is the partial auto-correlation function. The PACF is a conditional correlation: it gives the partial correlation of a stationary time series with its own lagged values, after regressing out the values of the time series at all shorter lags. It contrasts with the autocorrelation function, which does not control for other lags.
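As a sketch of how both plots are drawn with statsmodels (series stands for any stationary pandas Series or 1-D array; the name and the number of lags are illustrative choices):

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, (ax1, ax2) = plt.subplots(2, 1)
plot_acf(series, lags=20, ax=ax1)    # full autocorrelation function
plot_pacf(series, lags=20, ax=ax2)   # partial autocorrelation function
plt.show()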
5.1) Autoregressive model:
In an autoregressive model, we regress the values of the time series against previous values of the same time series.
An autoregressive (AR) model predicts future behavior based on past behavior. It’s used for forecasting when there is some correlation between values in a time series and the values that precede and succeed them.
The order of the model is the number of time lags (p) used. For stationarity, the coefficients must satisfy −1 < a1, a2, …, ap < 1 (strictly, the roots of the AR lag polynomial must lie outside the unit circle). If the AR order p is 0, then the process is white noise.
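In symbols, an AR(p) model can be sketched as (with ε(t) denoting white noise):

y(t) = a1·y(t−1) + a2·y(t−2) + … + ap·y(t−p) + ε(t)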
5.2) Moving Average Model (MA model):
In the MA model, we regress the values of the time series against the previous shocks/residual values of time series.
The order of the model is the number of time lags (q) used. For invertibility, the coefficients must satisfy −1 < m1, m2, …, mq < 1 (an MA process is itself always stationary). If the MA order q is 0, then the process is white noise.
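In symbols, an MA(q) model can be sketched as (with ε(t) denoting white noise):

y(t) = ε(t) + m1·ε(t−1) + m2·ε(t−2) + … + mq·ε(t−q)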
5.3) Autoregressive Moving Average (ARMA) Model:
An ARMA model is a combination of AR and MA models. The time series is regressed on the previous values and the previous shock term.
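A short sketch that simulates an ARMA(1,1) process and fits it back with statsmodels (the coefficients 0.6 and 0.3 are illustrative choices, not values from this article):

import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA

np.random.seed(1)
ar = np.array([1, -0.6])  # AR lag polynomial (1 - 0.6L); note the sign convention
ma = np.array([1, 0.3])   # MA lag polynomial (1 + 0.3L)
y = ArmaProcess(ar, ma).generate_sample(nsample=500)

results = ARIMA(y, order=(1, 0, 1)).fit()  # ARMA(1,1) = ARIMA with d = 0
print(results.summary())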
5.4) Autoregressive moving-average model with exogenous inputs (ARMAX):
An ARMAX model contains lagged values of the dependent variable and independent (exogenous) variable(s). One possible extension of the ARMA model is to use exogenous inputs. This means that we model the time series using other independent variables in addition to the time series itself.
This is like a combination between an ARMA model and a normal linear regression model.
- Exogenous ARMA
- Use external variables as well as time series
- ARMAX = ARMA + linear regression
In principle, an ARMAX model is a linear regression model that uses an ARMA-type process [i.e. w(t)] to model the residuals: y(t) = β·x(t) + w(t), where x(t) is the exogenous input and w(t) follows an ARMA(p, q) process.
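A minimal ARMAX sketch using statsmodels' SARIMAX class, which accepts exogenous regressors (y is the target series and X the exogenous variable(s); both names and the (1, 0, 1) order are illustrative):

from statsmodels.tsa.statespace.sarimax import SARIMAX

model = SARIMAX(y, exog=X, order=(1, 0, 1))  # ARMAX(1,1) with regressors X
results = model.fit()
print(results.params)  # includes the regression betas for the columns of X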
5.5) ARIMA model:
ARIMA, short for ‘AutoRegressive Integrated Moving Average’, is actually a class of models that ‘explains’ a given time series based on its own past values, that is, its own lags and the lagged forecast errors, so that the equation can be used to forecast future values.
We cannot apply the ARMA model to non-stationary times series. We need to take the difference of the time series to make it stationary. Only then can we model it.
However, when we do this, we have a model which is trained to predict the value of the differenced time series. What we really want to predict is not the difference, but the actual value of the time series. The ARIMA model solves this: the ‘integrated’ part performs the differencing internally and returns forecasts on the original scale.
An ARIMA model is characterized by 3 terms: p, d, q
where,
- p is the order of the AR term
- q is the order of the MA term
- d is the number of differencing required to make the time series stationary
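As a sketch (the column name and the (1, 1, 1) order are arbitrary examples), statsmodels handles the differencing internally and forecasts on the original scale:

from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(df['value'], order=(1, 1, 1))  # p=1, d=1, q=1
results = model.fit()
forecast = results.forecast(steps=10)  # next 10 values, back on the original scale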
6.1) Using ACF and PACF to choose model order: By looking at the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots of the differenced series, you can tentatively identify the number of AR and/or MA terms (i.e. p and q) that are needed.
The ACF is used to identify the order of the MA term, and the PACF the order of the AR term. The rule of thumb is that for MA, the lag at which the ACF cuts off suddenly is the order q, and similarly the lag at which the PACF cuts off gives the AR order p.
If the amplitude of the ACF tails off with increasing lag and the PACF cuts off after some lag p, then we have an AR(p) model.
If the amplitude of the ACF cuts off after some lag q and the amplitude of the PACF tails off, then we have an MA(q) model.
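These rules of thumb can be summarized as:

Model       ACF                    PACF
AR(p)       Tails off              Cuts off after lag p
MA(q)       Cuts off after lag q   Tails off
ARMA(p,q)   Tails off              Tails off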
6.2) Information criteria: Two popular adjusted goodness-of-fit measures
6.2.1) AIC (Akaike Information Criterion) :
- The AIC is a metric which tells us how good a model is. A model which makes better predictions is given a lower AIC score.
- AIC also penalizes models which have lots of parameters. This means that if we set the order too high relative to the data, we will get a high AIC value. This helps stop us overfitting to the training data.
6.2.2) BIC (Bayesian Information Criterion) :
- BIC is similar to the AIC: models which fit the data better have lower BICs, and the BIC penalizes overly complex models.
- The BIC penalizes additional model orders more than AIC and so the BIC will sometimes suggest a simpler model. (BIC favors simpler models)
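A sketch of searching over candidate orders and comparing AIC/BIC (series is an assumed stationary input, and the search range is an arbitrary choice):

import itertools
from statsmodels.tsa.arima.model import ARIMA

scores = []
for p, q in itertools.product(range(3), repeat=2):
    try:
        results = ARIMA(series, order=(p, 0, q)).fit()
        scores.append((p, q, results.aic, results.bic))
    except Exception:
        continue  # some orders may fail to converge

for row in sorted(scores, key=lambda r: r[2]):  # lowest AIC first
    print(row)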
We run model diagnostics to confirm our model is behaving well. To diagnose the model, we focus on the residuals of the training data.
The residuals are the difference between our model’s one-step-ahead predictions and the real values of the time series.
7.1) Mean Absolute Error:
The residuals tell us how large the errors are, and so how far our predictions are from the true values. To summarize them, we calculate the MAE (mean absolute error) of the residuals.
If the model fits well, the residuals will be white Gaussian noise centered on zero.
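A one-line sketch, given a fitted statsmodels results object (the name results is an assumption carried over from the earlier sketches):

import numpy as np

mae = np.mean(np.abs(results.resid))  # mean absolute error of the residuals
print('MAE of residuals:', mae)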
7.2) Plot Diagnostics:
7.2.1) Residuals plot: One of the four plots shows the one-step-ahead standardized residuals. If our model is working correctly, there should be no obvious structure in the residuals.
7.2.2) Histogram plus estimated density: This shows the distribution of the residuals. The histogram shows the measured distribution, the green line shows a smoothed (kernel density) estimate, and the orange line shows a standard normal distribution for comparison.
7.2.3) Normal Q-Q: The normal Q-Q plot is another way to show how the distribution of the model residuals compares to the normal distribution.
If our residuals are normally distributed then all the points should lie along the red line, except perhaps some values at either end.
7.2.4) Correlogram: The last plot is the correlogram, which is simply an ACF plot of the residuals rather than the data. 95% of the correlations for lags greater than zero should not be significant. If there is significant correlation in the residuals, it means there is information in the data that our model hasn't captured; the residuals are correlated, and you should increase p or q.
7.2.5) Summary statistics:
- Prob(Q) is the p-value associated with the null hypothesis that the residuals have no correlation structure.
- Prob(JB) is the p-value associated with the null hypothesis that the residuals are normally distributed (Gaussian).
- If either p-value is less than 0.05, we reject that null hypothesis.
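All four diagnostic plots and the summary table come from the fitted results object in statsmodels (a sketch, again assuming results holds a fitted model):

import matplotlib.pyplot as plt

results.plot_diagnostics(figsize=(10, 8))  # the four plots described above
plt.show()
print(results.summary())  # check the Prob(Q) and Prob(JB) rows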
The Box-Jenkins method is a kind of checklist for you to go from raw data to a model ready for production. The three main steps that stand between you and a production ready model are identification, estimation and model diagnostics.
- Identification
In this step, we explore and characterize the data to find some form of it which is appropriate to modeling.
• Is the time series stationary?
• What differencing will make it stationary?
• What transforms will make it stationary?
• What values of p and q are most promising?
- Identification tools
• Plot the time series df.plot()
• Use the augmented Dickey-Fuller test adfuller()
• Use transforms and/or differencing df.diff() , np.log() , np.sqrt()
• Plot ACF/PACF plot_acf() , plot_pacf()
- Estimation
This involves using numerical methods to estimate the AR and MA coefficients of the data.
This is automatically done for us when we call the models.fit() method.
• Use the data to train the model coefficients
• Done for us using model.fit()
• Choose between models using AIC and BIC: results.aic , results.bic
- Model Diagnostics: In this step, we evaluate the quality of the best fitting model.
• Are the residuals uncorrelated?
• Are the residuals normally distributed?
- Decision
Using the information gathered from the statistical tests and plots during the diagnostic step, we need to make a decision: is the model good enough, or do we need to go back and rework it?
If the residuals are not as they should be we will go back and rethink our choices in the earlier steps. If the residuals are okay then we can go ahead and make forecasts.
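Putting the checklist together, a compact sketch of one pass through the loop (df and the (1, d, 1) order are illustrative; in practice you would iterate on the order using AIC/BIC and the diagnostics):

from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA

series = df['value']
# Identification: does the ADF test suggest we need differencing?
d = 1 if adfuller(series)[1] > 0.05 else 0

results = ARIMA(series, order=(1, d, 1)).fit()  # estimation (example p and q)
print(results.aic, results.bic)                 # compare candidate models
results.plot_diagnostics()                      # model diagnostics
forecast = results.forecast(steps=5)            # decision: forecast if residuals look good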
In this article, we learnt what time series analysis is, what stationarity is, how to make a time series stationary, what white noise and random walks are, different time series models, how to find the order of a model, how to evaluate models, and more.