## Introduction

By definition:

*A parsimonious model is a model that accomplishes the desired level of explanation or prediction with as few predictor variables as possible.*

*The goodness of fit of a statistical model describes how well it fits a set of observations.*

Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question.

The idea behind parsimonious models stems from *Occam’s razor*, or “*the law of briefness*” (sometimes called *lex parsimoniae* in Latin). The law states that you should use no more “things” than necessary; in the case of parsimonious models, those “things” are parameters. Parsimonious models have optimal parsimony, or just the right number of predictors needed to explain the data well.

There are generally two ways of evaluating a model: **Based on predictions and based on goodness of fit on the current data**. In the first case, we want to know if our model adequately predicts new data, in the second we want to know whether our model adequately describes the relations in our current data. These are two different things.

## Comparing the Models

There is generally a trade-off between goodness of fit and parsimony: low parsimony models (i.e. models with many parameters) tend to have a better fit than high parsimony models. This is not usually a good thing, as adding more parameters usually results in a good model fit for the data at hand, but that same model will likely be useless for predicting other data sets. Finding the right balance between parsimony and goodness of fit can be challenging.
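This trade-off is easy to see in a small simulation. The sketch below (hypothetical data, NumPy only) fits a parsimonious straight line and a flexible 9th-degree polynomial to the same noisy linear data: the low-parsimony model always matches the training data at least as closely, but that advantage does not usually carry over to new data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a noisy linear relationship y = 2x + noise
x_train = np.linspace(0, 1, 15)
y_train = 2.0 * x_train + rng.normal(0, 0.2, x_train.size)
x_test = np.linspace(0, 1, 50)
y_test = 2.0 * x_test + rng.normal(0, 0.2, x_test.size)

def fit_errors(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

simple_train, simple_test = fit_errors(1)    # high-parsimony model
complex_train, complex_test = fit_errors(9)  # low-parsimony model

# The 9th-degree fit is guaranteed to achieve a training error at least
# as low as the straight line, because the line is nested within it.
```

The training error of the flexible model can only go down as parameters are added; its error on the held-out test data is what reveals the overfitting.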

## Model Selection Approaches

Model selection can follow two approaches:

**Evaluating based on predictions:**

The best way to evaluate models used for prediction is cross-validation. Very briefly, we cut our dataset into, say, 10 different pieces, use 9 of them to build the model, and predict the outcomes for the 10th piece. The mean squared difference between the observed and predicted values gives us a measure of prediction accuracy. We repeat this 10 times, letting each piece serve once as the hold-out set, and average the mean squared differences over all 10 iterations to obtain an overall value with a standard deviation. This allows us to compare two models on their prediction accuracy using standard statistical techniques (a t-test or ANOVA).
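The procedure above can be sketched in a few lines of Python. The dataset, the fold-splitting scheme, and the choice of a simple linear model are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical dataset: 100 observations of a noisy linear relationship
x = rng.uniform(0, 10, 100)
y = 3.0 + 1.5 * x + rng.normal(0, 1.0, 100)

def cv_mse(x, y, degree, k=10):
    """k-fold cross-validation: return the per-fold mean squared errors."""
    idx = np.arange(len(x))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coeffs, x[test])
        errors.append(np.mean((pred - y[test]) ** 2))
    return np.array(errors)

scores = cv_mse(x, y, degree=1)
print(f"CV MSE: {scores.mean():.3f} \u00b1 {scores.std():.3f}")
```

Running `cv_mse` for two competing models yields two sets of 10 fold-level errors, which can then be compared with a paired t-test. With real data the observations should be shuffled before splitting, in case they are ordered.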

**Evaluating based on goodness of fit:**

This approach differs depending on the model framework we use. For example, a likelihood-ratio test works for Generalized Additive Mixed Models when using the classic Gaussian distribution for the errors, but is meaningless for the binomial variant.
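For the simplest case of two nested linear models with Gaussian errors, the likelihood-ratio statistic has a convenient closed form, n·ln(RSS₀/RSS₁), which is compared against a chi-squared distribution. The sketch below uses hypothetical data in which only the first predictor matters; `scipy` is assumed to be available:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)

# Hypothetical data: y depends on x1 but not on x2
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(0, 1.0, n)

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

X_null = np.column_stack([np.ones(n), x1])      # nested (smaller) model
X_full = np.column_stack([np.ones(n), x1, x2])  # full model

# Likelihood-ratio statistic for Gaussian errors: n * ln(RSS_null / RSS_full)
lr = n * np.log(rss(X_null, y) / rss(X_full, y))
p_value = chi2.sf(lr, df=1)  # df = difference in parameter count
```

A large `p_value` here would indicate that the extra predictor does not significantly improve the fit, favouring the more parsimonious model.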

We also have more intuitive methods of comparing models, like the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC), which let us compare the goodness of fit of two models. Other methods, like Mallows’ Cp criterion, Bayes factors, and Minimum Description Length (MDL), are also popular.
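As a quick illustration of how such criteria penalize complexity, the sketch below computes the Gaussian least-squares forms of the two criteria, AIC = n·ln(RSS/n) + 2k and BIC = n·ln(RSS/n) + k·ln(n), for polynomial models of increasing degree fitted to hypothetical quadratic data:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical data: the true relationship is quadratic
n = 100
x = rng.uniform(-2, 2, n)
y = 1.0 + 0.5 * x ** 2 + rng.normal(0, 0.3, n)

def aic_bic(degree):
    """AIC and BIC under Gaussian errors, from the residual sum of squares."""
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
    k = degree + 2  # polynomial coefficients plus the error variance
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    return aic, bic

aic1, bic1 = aic_bic(1)  # underfits: misses the curvature
aic2, bic2 = aic_bic(2)  # matches the true model
aic8, bic8 = aic_bic(8)  # overfits: extra parameters are penalized
```

Both criteria trade goodness of fit (the RSS term) against parsimony (the penalty on k); BIC penalizes extra parameters more heavily whenever ln(n) > 2, i.e. for n > 7.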

Let’s explore some of these methods:

**Akaike Information Criterion:**

Akaike’s information criterion (AIC) compares the quality of a set of statistical models. Given a number of candidate models, the AIC ranks them from best to worst. The best model will be the one that neither over-fits nor under-fits. The basic formula for the AIC is: