A Complete Introduction To Time Series Analysis (with R):: Gaussian Time Series

January 23, 2021

Hair Parra
The likelihood for a Gaussian Time Series

In the last two articles, we saw a number of methods to independently estimate AR(p) and MA(q) coefficients, namely the Yule-Walker method, Burg’s Algorithm, and the Innovations Algorithm, as well as the Hannan-Rissanen Algorithm, which jointly estimates ARMA(p,q) coefficients by using AR(p) and MA(q) coefficients produced by the previous algorithms as initial values. We also mentioned that these methods, as sophisticated as they are, tend to perform quite poorly on real datasets, since it is easy to misspecify the true model. Therefore, we now introduce one additional assumption: normality of the observations. Time series satisfying this assumption are called Gaussian Time Series.

Gaussian Time Series

Suppose that X_1, ..., X_n are observations from a zero-mean Gaussian time series with autocovariance function K(i, j) = E[X_i X_j], and write X_n = (X_1, ..., X_n)'. Then its likelihood follows a multivariate normal density given by

L(\Gamma_n) = (2\pi)^{-n/2} (\det \Gamma_n)^{-1/2} \exp\left\{ -\tfrac{1}{2} X_n' \Gamma_n^{-1} X_n \right\}

where

\Gamma_n = E[X_n X_n'] = \big[ K(i, j) \big]_{i,j=1}^{n}

is the covariance matrix of the observations.

Note that Gamma_{n} is indeed a function of the parameters that we are trying to estimate (that is, the phi’s for the AR(p) part, the thetas for the MA(q) part, and sigma squared for the common noise variance). This follows because its entries are given by the autocovariance function K(i, j), which in turn depends on these parameters through the particular model. Therefore, we would like to maximize this likelihood to obtain the most suitable set of parameters for the model. However, the presence of the Gamma_{n} matrix poses two huge problems: computing its inverse and determinant is extremely costly, and neither provides a “friendly” gradient either.
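To make the computational issue concrete, here is a minimal R sketch (my own illustration, not part of the original article) that evaluates this multivariate normal likelihood directly for the AR(1) special case, where the autocovariance is known in closed form as gamma(h) = sigma2 * phi^|h| / (1 - phi^2). The function name loglik_ar1_direct is made up for this example; note the O(n^3) calls to solve() and determinant(), which are exactly the bottleneck described above.

loglik_ar1_direct <- function(x, phi, sigma2) {
  # Autocovariance of an AR(1): gamma(h) = sigma2 * phi^|h| / (1 - phi^2)
  n <- length(x)
  h <- abs(outer(1:n, 1:n, "-"))            # matrix of lags |i - j|
  Gamma_n <- sigma2 * phi^h / (1 - phi^2)   # covariance matrix Gamma_n
  Q <- sum(x * solve(Gamma_n, x))           # quadratic form x' Gamma_n^{-1} x
  logdet <- as.numeric(determinant(Gamma_n, logarithm = TRUE)$modulus)
  -0.5 * (n * log(2 * pi) + logdet + Q)     # Gaussian log-likelihood
}

set.seed(42)
x <- arima.sim(model = list(ar = 0.6), n = 200)   # simulated AR(1) sample
loglik_ar1_direct(x, phi = 0.6, sigma2 = 1)

For general ARMA(p,q) models the autocovariances have no such simple closed form, and building, inverting, and taking the determinant of Gamma_{n} at every step of an optimizer quickly becomes impractical.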

Innovations Algorithm for Gaussian Time Series

So, how can we actually solve this issue? Once again, the Innovations Algorithm comes to the rescue. By working with the innovations instead of the observations themselves, we can express the previous likelihood as

L(\phi, \theta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \left( r_0 r_1 \cdots r_{n-1} \right)^{-1/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{j=1}^{n} \frac{(X_j - \hat{X}_j)^2}{r_{j-1}} \right\}

where

r_{j-1} = \frac{E(X_j - \hat{X}_j)^2}{\sigma^2}

are functions of the phi and theta parameters, but not of sigma squared.

Proof

Let \theta_{nj}, j = 1, \ldots, n; n = 1, 2, \ldots,
be the coefficients from the Innovations Algorithm, and recall that

U_j = X_j - \hat{X}_j, \qquad \hat{X}_1 = 0,

is the prediction error for the j-th observation, where \hat{X}_j denotes the best linear predictor of X_j given X_1, \ldots, X_{j-1}.

  • Recall the matrix C_{n}, defined as the lower triangular matrix

C_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ \theta_{11} & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ \theta_{n-1,n-1} & \theta_{n-1,n-2} & \cdots & 1 \end{bmatrix},

which satisfies X_n = C_n (X_n - \hat{X}_n).
  • Recall as well that from the Innovations Algorithm, we also have that

E\big[ (X_i - \hat{X}_i)(X_j - \hat{X}_j) \big] = 0, \qquad i \neq j

, that is, innovations are uncorrelated, which implies that their covariance matrix is diagonal.

  • This further implies that the vector of innovations

U_n = X_n - \hat{X}_n = (X_1 - \hat{X}_1, \ldots, X_n - \hat{X}_n)'

has a diagonal covariance matrix, given by

D_n = \operatorname{diag}(v_0, v_1, \ldots, v_{n-1}),

where the diagonal entries come directly from the Innovations Algorithm, since

v_{j-1} = E(X_j - \hat{X}_j)^2.

That is, these are given by the recursive formula

v_n = K(n+1, n+1) - \sum_{j=0}^{n-1} \theta_{n, n-j}^2 \, v_j.

Using these facts, we have that

X_n = C_n U_n \quad \Longrightarrow \quad \Gamma_n = E[X_n X_n'] = C_n D_n C_n'

, from which it follows that

X_n' \Gamma_n^{-1} X_n = U_n' D_n^{-1} U_n = \sum_{j=1}^{n} \frac{(X_j - \hat{X}_j)^2}{v_{j-1}}.

That is, we can replace in the likelihood:

L(\Gamma_n) = (2\pi)^{-n/2} (\det \Gamma_n)^{-1/2} \exp\left\{ -\frac{1}{2} \sum_{j=1}^{n} \frac{(X_j - \hat{X}_j)^2}{v_{j-1}} \right\}.

For the determinant, we have that

\det \Gamma_n = (\det C_n)^2 \det D_n = v_0 v_1 \cdots v_{n-1},

since C_n is lower triangular with ones on the diagonal. So we can then write down the likelihood as

L(\Gamma_n) = (2\pi)^{-n/2} \left( v_0 v_1 \cdots v_{n-1} \right)^{-1/2} \exp\left\{ -\frac{1}{2} \sum_{j=1}^{n} \frac{(X_j - \hat{X}_j)^2}{v_{j-1}} \right\}.

But wait! Although we have indeed managed to express the likelihood in a rather nice form, we still cannot optimize for the sigma squared parameter independently. Recall that for the ARMA(p,q) model, the Innovations Algorithm is applied to the rescaled process W_{t}, defined as

W_t = \sigma^{-1} X_t, \quad t = 1, \ldots, m, \qquad W_t = \sigma^{-1} \left( X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} \right), \quad t > m,

with m = \max(p, q), from which it follows that

X_j - \hat{X}_j = \sigma \left( W_j - \hat{W}_j \right) \quad \Longrightarrow \quad E(X_j - \hat{X}_j)^2 = \sigma^2 \, E(W_j - \hat{W}_j)^2 = \sigma^2 r_{j-1}

, so that

v_{j-1} = \sigma^2 r_{j-1}, \qquad j = 1, \ldots, n,

where the r_{j-1} depend only on the phi’s and theta’s. Therefore, we can conclude that

L(\phi, \theta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \left( r_0 r_1 \cdots r_{n-1} \right)^{-1/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{j=1}^{n} \frac{(X_j - \hat{X}_j)^2}{r_{j-1}} \right\}.

Note: Recall that for the ARMA(p,q) model, K(i, j) is given by an explicit formula involving the phi’s, the theta’s, and the autocovariance function of the process, from which you can more clearly see the dependence on the parameters. See my article on the Innovations Algorithm for ARMA(p,q) models.
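To make the machinery above tangible, here is a small, self-contained R sketch (my own illustration; the names innovations and kappa_ma1 are made up for this example) that runs the general Innovations Algorithm on a supplied autocovariance function K(i, j), returns the one-step prediction errors and the mean squared errors v_0, ..., v_{n-1}, and evaluates the Gaussian likelihood in the innovations form derived above. For simplicity it is applied directly to the autocovariance of an MA(1) process rather than to the rescaled process W_t.

innovations <- function(x, kappa) {
  # General Innovations Algorithm for a zero-mean series with autocovariance kappa(i, j):
  # computes theta_{m,j}, v_m, and the one-step predictors recursively.
  n <- length(x)
  v     <- numeric(n)          # v[j] stores v_{j-1} = E(X_j - Xhat_j)^2
  theta <- matrix(0, n, n)     # theta[m, j] stores theta_{m, j}
  xhat  <- numeric(n)          # one-step predictors, xhat[1] = 0
  v[1] <- kappa(1, 1)
  for (m in 1:(n - 1)) {
    for (k in 0:(m - 1)) {
      s <- 0
      if (k >= 1) {
        for (j in 0:(k - 1)) s <- s + theta[k, k - j] * theta[m, m - j] * v[j + 1]
      }
      theta[m, m - k] <- (kappa(m + 1, k + 1) - s) / v[k + 1]
    }
    v[m + 1]    <- kappa(m + 1, m + 1) - sum(theta[m, m:1]^2 * v[1:m])
    xhat[m + 1] <- sum(theta[m, 1:m] * (x[m:1] - xhat[m:1]))
  }
  list(innov = x - xhat, v = v)
}

# Autocovariance K(i, j) of an MA(1) process X_t = Z_t + theta1 * Z_{t-1}
kappa_ma1 <- function(theta1, sigma2) {
  function(i, j) {
    d <- abs(i - j)
    if (d == 0) sigma2 * (1 + theta1^2) else if (d == 1) sigma2 * theta1 else 0
  }
}

set.seed(1)
x   <- arima.sim(model = list(ma = 0.5), n = 200)
out <- innovations(as.numeric(x), kappa_ma1(theta1 = 0.5, sigma2 = 1))

# Gaussian log-likelihood in innovations form
n  <- length(x)
ll <- -0.5 * (n * log(2 * pi) + sum(log(out$v)) + sum(out$innov^2 / out$v))
ll

Here out$v plays the role of v_{j-1} = sigma^2 r_{j-1}; dividing by sigma squared would give the r’s that appear in the ARMA form of the likelihood.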

Gaussian MLE estimates of ARMA(p,q) model

Now the question is: how do we optimize such a likelihood? It turns out that the MLE estimates take the form

\hat{\sigma}^2 = \frac{S(\hat{\phi}, \hat{\theta})}{n}, \qquad S(\phi, \theta) = \sum_{j=1}^{n} \frac{(X_j - \hat{X}_j)^2}{r_{j-1}},

and

(\hat{\phi}, \hat{\theta}) = \arg\min_{\phi, \theta} \left\{ \ln\left( \frac{S(\phi, \theta)}{n} \right) + \frac{1}{n} \sum_{j=1}^{n} \ln r_{j-1} \right\}.

Proof

We can perform some simple algebraic manipulation as follows. First, we apply the natural logarithm to the likelihood, obtaining

\ln L(\phi, \theta, \sigma^2) = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2} \sum_{j=1}^{n} \ln r_{j-1} - \frac{S(\phi, \theta)}{2\sigma^2}.

Differentiating w.r.t. sigma squared and setting the derivative to zero yields the respective estimate, \hat{\sigma}^2 = S(\phi, \theta)/n. Now, plugging this back into the log-likelihood yields:

\ln L(\phi, \theta, \hat{\sigma}^2) = -\frac{n}{2} \ln\left( \frac{2\pi S(\phi, \theta)}{n} \right) - \frac{1}{2} \sum_{j=1}^{n} \ln r_{j-1} - \frac{n}{2}.

Ignoring the constants, this implies that the MLE estimates of the phi and theta parameters are given by the minimizers of

\ell(\phi, \theta) = \ln\left( \frac{S(\phi, \theta)}{n} \right) + \frac{1}{n} \sum_{j=1}^{n} \ln r_{j-1}.

Note

  • Consider the SS_{Res} (Sum of Squared Residuals) term

S(\phi, \theta) = \sum_{j=1}^{n} \frac{(X_j - \hat{X}_j)^2}{r_{j-1}}.

Inside the summation, the numerator is the squared prediction error, given by the squared difference between each observation and its estimate, the best linear predictor (BLP) under a given set of phi’s and theta’s. This is then normalized by the (scaled) MSE of that prediction, r_{j-1}.

  • What makes an innovation “unlikely”? Note that the mean squared prediction errors satisfy

E(X_j - \hat{X}_j)^2 = \sigma^2 r_{j-1},

and the r_{j-1} are largest for the first few observations, which are predicted from very little history, before settling down as j grows. Therefore, we see that this normalization is necessary because simply considering the raw innovations would give too much weight to early observations; by normalizing, they are placed on an “equal footing”. In addition, the second term of the objective, n^{-1} \sum_{j=1}^{n} \ln r_{j-1}, can be seen as a kind of geometric-mean regularization, adding an extra penalty to the likelihood for large values of r_{j}.

  • Note that the optimization problem above has no closed-form solution. However, we can use numerical optimization algorithms such as gradient descent or Newton-Raphson. These algorithms require initial values for the parameters in question, so choosing good starting values can yield better estimates and ensure faster convergence. Good options are estimates produced by other algorithms, such as Burg’s algorithm or Hannan-Rissanen’s (see the R sketch after this list).
  • Before modern optimization methods, people would instead perform a “lazy” optimization, in which the objective was simplified to minimizing only the weighted sum of squares

S(\phi, \theta) = \sum_{j=1}^{n} \frac{(X_j - \hat{X}_j)^2}{r_{j-1}},

dropping the n^{-1} \sum_{j} \ln r_{j-1} term. This simplified problem is much easier to solve (analytically so in the pure autoregressive case), producing the least-squares estimates

(\tilde{\phi}, \tilde{\theta}) = \arg\min_{\phi, \theta} S(\phi, \theta), \qquad \tilde{\sigma}^2 = \frac{S(\tilde{\phi}, \tilde{\theta})}{n - p - q}.

However, without additional constraints these estimates often produce non-causal and/or non-invertible solutions, and adding such constraints yields once again an objective similar to the one we originally presented.
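As a concrete illustration of these remarks (not the article’s own code): in R, the built-in arima() function fits ARMA(p,q) models by exact Gaussian maximum likelihood (method = "ML", where the same Gaussian likelihood is evaluated through a state-space representation) or by conditional sum of squares (method = "CSS"), which is close in spirit to the simplified least-squares objective above; its init argument accepts starting values such as Burg or Hannan-Rissanen estimates.

set.seed(123)
x <- arima.sim(model = list(ar = 0.7, ma = -0.4), n = 500)

# Exact Gaussian ML estimates of phi, theta and sigma^2
fit_ml <- arima(x, order = c(1, 0, 1), include.mean = FALSE, method = "ML")
coef(fit_ml)      # phi and theta estimates
fit_ml$sigma2     # Gaussian MLE of sigma^2

# Conditional sum of squares only (roughly the "lazy" objective)
fit_css <- arima(x, order = c(1, 0, 1), include.mean = FALSE, method = "CSS")
coef(fit_css)

# Supplying starting values (e.g. from Burg or Hannan-Rissanen) via `init`
fit_init <- arima(x, order = c(1, 0, 1), include.mean = FALSE,
                  method = "ML", init = c(0.5, -0.2))
coef(fit_init)

With a well-specified model the ML and CSS fits are usually close; supplying sensible starting values mainly speeds up convergence and helps avoid poor local optima.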

Asymptotic Normality of Gaussian MLE Estimates

Now let’s see a couple of normality properties of the resulting coefficients. These are useful for obtaining things like confidence intervals for the parameter values. Let

\beta = (\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q)'

be the true parameter vector estimated by the MLE \hat{\beta} from the previous section. Then, for large n,

\hat{\beta} \approx N\!\left( \beta, \; n^{-1} V(\beta) \right),

where V(\beta) is an asymptotic covariance matrix that depends on the model; in particular,

V(\phi) = \sigma^2 \, \Gamma_p^{-1}

for a pure AR(p) process, and

V(\theta) = \sigma^2 \, \Gamma_q^{*\,-1}

for a pure MA(q) process, where \Gamma_q^{*} denotes the covariance matrix of the autoregressive process defined by \theta(B) Y_t = Z_t.

In practice, we can approximate the variance matrix by the inverse Hessian of the negative log-likelihood evaluated at the estimates,

n^{-1} V(\hat{\beta}) \approx H^{-1}(\hat{\beta}),

where

H(\beta) = \left[ -\frac{\partial^2 \ln L(\beta)}{\partial \beta_i \, \partial \beta_j} \right]_{i,j}

is the observed information matrix. This is convenient because the Hessian is usually computed as a by-product of the optimization algorithms mentioned above.
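As an illustrative sketch (not part of the original article): R’s arima() already returns a Hessian-based estimate of this covariance matrix in its var.coef component, from which approximate standard errors and normal-theory confidence intervals follow directly.

set.seed(321)
x   <- arima.sim(model = list(ar = 0.7, ma = -0.4), n = 500)
fit <- arima(x, order = c(1, 0, 1), include.mean = FALSE, method = "ML")

fit$var.coef                       # Hessian-based estimate of n^{-1} V(beta)
se <- sqrt(diag(fit$var.coef))     # approximate standard errors

# Approximate 95% confidence intervals from asymptotic normality
cbind(estimate = coef(fit),
      lower    = coef(fit) - 1.96 * se,
      upper    = coef(fit) + 1.96 * se)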

Last time

Estimation of ARMA(p,q) Coefficients (Part II)

Next Time

That’s it! Next time, we will learn about Model Selection for ARMA(p,q) models, and finally see some comprehensive examples of applied time series analysis with ARMA(p,q) models in R. Stay tuned, and happy learning!

Main page

Follow me at

  1. https://blog.jairparraml.com/
  2. https://www.linkedin.com/in/hair-parra-526ba19b/
  3. https://github.com/JairParra
  4. https://medium.com/@hair.parra
