One way the machine learns.

The ability to forecast the unknown is a skill in demand. Consider the future. From a business perspective, to make effective decisions today, managers need to know what will likely happen next week, next quarter, or next year. The better we predict, the better we can plan, and the better we can prevent problems from occurring. The forecasted variable could be inventory level for a particular SKU, mortgage interest rate, shipment time, demand for a product, customer churn (loss), housing prices, employee turnover, population size, percentage of shipped parts that meet quality standards, and many, many other possibilities.

How to create a forecast? Learn how variables with known values relate to the variable with the unknown value to forecast. The intrigue is that a machine can now learn these relationships.

A machine? The word *machine* applied here is a trendy, almost cute, but effective reference to programmed instructions running on a computer.

Machine

: A computer that runs a procedure according to programmed instructions to accomplish a given task, such as learning.

People have always learned. Now machines can learn.

As applied here, learning proceeds from searching for patterns of related information across many existing examples of data. Instead of you searching for these patterns in the data, however, the machine does the searching. And guess what? Given its staggeringly massive superiority in data processing speed, the machine can uncover some patterns much more effectively than people can. That is why we need the machine’s help.

Machine learning

: Instruct the machine to identify patterns inherent in data.

For example, as your phone’s computer searches your stored photographs to find those that contain you, the color of your hair helps distinguish you from your friend.

Patterns apply to many types of content. A person’s height, for example, can predict, i.e., forecast, their weight, albeit imperfectly, as with most predictions. These discovered relationship patterns provide the basis for a specific type of machine learning.

Supervised machine learning

: Methods to develop the best possible forecast of a value of the variable of interest from the related pattern of values of other variables.

The machine expresses its learning as one or more equations that transform the related information into a forecast. Recently developed learning algorithms such as neural networks are complex, sometimes with thousands of equations from which to estimate weights. The most straightforward forecast naturally follows from a single equation, illustrated here.

Model

: A prediction equation that computes the value of the forecasted variable from the values of other variables.

The prediction equation is a recipe that transforms a set of entered numeric values into a forecasted value, as illustrated in Figure 1.1.

To apply the equation, enter the related information, then compute the forecasted value.

Consider an example of an online clothing retailer who must correctly size the garment for a customer not present to verify fit. Returns annoy the customer and vaporize the profit margin for the retailer. Unfortunately, the customer sometimes omits from the online order form crucial measurements needed to fit a garment properly. If the customer omits their weight from the order, the retailer wishes to predict this unknown value from the measurements the customer did provide.

Both a person’s height and chest size relate to their weight. Leveraging these relationships, the retailer’s analyst first specifies a general form of the model. One popular choice is a model defined as a weighted sum of the variables, here height and chest size, plus a constant, what is called a *linear* relationship. What are the weights that optimize forecasting accuracy of a customer’s weight? That discovery is the machine’s job.

For simplicity, the initial model in this example, derived from the data analysis, applies only to the retailer’s male customers.

$$\text{Predicted Weight} = 3.80\,(\text{Height}) + 7.32\,(\text{Chest}) - 386.22$$

From examining the existing data values for each male customer’s height, chest size and weight, the machine learned the underlying weighted sum relationship. The machine estimated values of 3.80 and 7.32 for the respective weights of the two variables, plus the constant term of -386.22.
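As a rough sketch of how such weights can be estimated, the code below fits a weighted-sum model by ordinary least squares, one common estimation method that the text does not name explicitly. The six data rows are synthetic, generated directly from the equation above only to show that the fit recovers the weights; they are not the retailer's data, and the function names `solve_linear_system` and `fit_least_squares` are hypothetical.

```python
# Sketch: estimate the weights of a linear model by ordinary least squares.
# The data are SYNTHETIC, generated from the chapter's equation purely for
# illustration. They are not the retailer's actual customer records.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], so each row operation applies to both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest pivot into place for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_least_squares(features, target):
    """Return [constant, weight_1, ..., weight_k] minimizing squared error."""
    # Prepend 1.0 to each row so the constant is estimated like a weight.
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Synthetic (height, chest) pairs in inches.
features = [(64, 34), (66, 36), (68, 35), (70, 40), (72, 38), (74, 44)]
# Synthetic weights in lbs, computed exactly from the equation (no noise).
target = [3.80 * h + 7.32 * c - 386.22 for h, c in features]

constant, w_height, w_chest = fit_least_squares(features, target)
print(round(w_height, 2), round(w_chest, 2), round(constant, 2))
```

Because the synthetic data contain no noise, the fit returns the original weights up to floating-point error. With real customer data, the estimated weights would instead be those that make the forecasts as close as possible, in the squared-error sense, to the observed weights.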

To apply the machine’s prediction model to a specific customer’s height and chest measurements, enter the corresponding data values, the measurements, into the equation in place of the variable names. Suppose the customer reports a height of 66 inches and a chest measurement of 36 inches, illustrated in Figure 1.2.

The forecasting model computes a specific number, the forecast, from the specific values for each of the variables entered into the prediction equation. For a person 66 inches tall with a chest measurement of 36 inches, forecasted weight is 128.10 lbs.
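The arithmetic behind that forecast can be checked with a few lines of code. This is a minimal sketch that hard-codes the estimated weights from the prediction equation; the function name `predict_weight` is hypothetical.

```python
def predict_weight(height_in, chest_in):
    """Forecast weight (lbs) from height and chest size (inches).

    Uses the estimated weights from the chapter's model for male customers.
    """
    return 3.80 * height_in + 7.32 * chest_in - 386.22

# A customer 66 inches tall with a 36-inch chest:
# 3.80(66) + 7.32(36) - 386.22 = 250.80 + 263.52 - 386.22
forecast = predict_weight(66, 36)
print(f"{forecast:.2f}")  # prints 128.10
```

Entering a different customer's measurements into the same function yields that customer's forecast, which is all that applying a model amounts to.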

The forecasting process begins with the choice of variable to forecast.

Target

: The variable with the values to forecast.

Generically refer to the target variable as *y*, which assumes a specific variable name in a specific application, such as customer weight in the previous example. In traditional statistics, refer to the same *y* as the *response variable* or *dependent variable*, among other names.

From what information does the forecast of the value of the target, *y*, proceed? In the online clothing retailer example, enter a customer’s height and chest size into the model to calculate the forecast.

Feature

: One of the one or more variables from which to predict the value of the target variable.

Generically refer to the variables from which to compute the forecast as X, uppercase to denote that typically X consists of more than one feature: x1, x2, etc. Traditional references to the X variables include *predictor variables*, *independent variables*, or *explanatory variables*.
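In this generic notation, a weighted-sum (linear) model with *m* features might be written as follows. The symbols $b_0, b_1, \ldots, b_m$ for the constant and the weights, and $\hat{y}$ for the forecasted value of *y*, are assumptions introduced here for illustration and do not come from the text.

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_m x_m$$

For the online clothing retailer example, $m = 2$, with $x_1$ as height and $x_2$ as chest size, so that $b_0 = -386.22$, $b_1 = 3.80$, and $b_2 = 7.32$.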

The variables of interest for supervised machine learning models span a wide variety of applications. Table 1.1 lists some business forecasting scenarios, each with three predictor variables. Some of the target variables are continuous, measured on a quantitative scale. The values of other target variables in Table 1.1 are *labels*, where each label defines a single category.