In a nutshell, logistic regression is multiple regression with an outcome variable that is a categorical dichotomy and predictor variables that are continuous or categorical. In plain English, this simply means that we can predict which of two categories a person is likely to belong to, given certain other information.

**Example: Will the Customer Leave the Network?**

This example is related to the Telecom Industry. The market is saturated, so acquiring new customers is a tough job. A study of the European market shows that acquiring a new customer is five times costlier than retaining an existing one. In such a situation, companies need to take proactive measures to maintain their existing customer base. Using logistic regression, we can predict which customers are going to leave the network. Based on the findings, the company can offer lucrative deals to those customers. All of this is part of Churn Analysis.

**Example: Will the Borrower Default?**

Non-Performing Assets are a big problem for banks. So banks, as lenders, try to assess the capacity of borrowers to honor their commitments of interest payments and principal repayments. Using a logistic regression model, managers can get an idea of whether a prospective customer will default on payment. All of this is part of Credit Scoring.

**Example: Will the Lead become a Customer?**

This is a key question in sales practice. The conventional salesman chases, literally, everybody everywhere. This leads to a waste of precious resources, like time and money. Using logistic regression, we can narrow down the search by finding those leads who have a higher probability of becoming customers.

**Example: Will the Employee Leave the Company?**

Employee retention is a key strategy for HR managers and is important for the sustainable growth of the company. But in some industries, like Information Technology, the employee attrition rate is very high. Using logistic regression, we can build models that predict the probability of an employee leaving the organization within a given span of time, say one year. This technique can be applied to existing employees, and it can also be applied in the recruitment process. So, we are basically talking about the probability of occurrence or non-occurrence of something.

**The Principles Behind Logistic Regression**

In simple linear regression, we saw that the outcome variable Y is predicted from the equation of a straight line: Yi = b0 + b1 X1 + εi in which b0 is the intercept and b1 is the slope of the straight line, X1 is the value of the predictor variable and εi is the residual term. In multiple regression, in which there are several predictors, a similar equation is derived in which each predictor has its own coefficient.

In logistic regression, instead of predicting the value of a variable Y from predictor variables, we calculate the probability of Y = Yes given known values of the predictors. The logistic regression equation bears many similarities to the linear regression equation. In its simplest form, when there is only one predictor variable, the logistic regression equation from which the probability of Y is predicted is given by:

P(Y = Yes) = 1 / [1 + exp{−(b0 + b1X1 + εi)}]
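As a quick illustration, the equation can be evaluated directly. The sketch below uses made-up coefficients (b0 = −4, b1 = 0.8 are assumptions for illustration, not fitted values):

```python
import math

def logistic_probability(x, b0=-4.0, b1=0.8):
    """P(Y = Yes) for a single predictor x, using assumed coefficients."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# The probability rises smoothly from near 0 to near 1 as x grows:
print(round(logistic_probability(0), 3))   # small probability
print(round(logistic_probability(5), 3))   # exactly 0.5 here, since b0 + b1*5 = 0
print(round(logistic_probability(10), 3))  # high probability
```

Whatever value of x is plugged in, the output always stays strictly between 0 and 1, which is exactly the property the next section relies on.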

**Why Can’t We Use Linear Regression?**

One of the assumptions of linear regression is that the relationship between variables is linear. When the outcome variable is dichotomous, this assumption is usually violated. The logistic regression equation described above expresses the multiple linear regression equation in logarithmic terms and thus overcomes the problem of violating the assumption of linearity. Moreover, the resulting value from the equation is a probability that varies between 0 and 1. A value close to 0 means that Y is very unlikely to have occurred, and a value close to 1 means that Y is very likely to have occurred. Look at the data points in the following charts: the first is for linear regression and the second for logistic regression.
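To see the problem concretely, here is a minimal sketch, using a small made-up 0/1 data set, that fits an ordinary least-squares line to a dichotomous outcome. The fitted line happily predicts values below 0 and above 1, which cannot be interpreted as probabilities; the logistic curve from the previous equation can never do this.

```python
# Made-up data: x = a predictor, y = dichotomous outcome (0 = No, 1 = Yes)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

# Ordinary least-squares fit of a straight line y = b0 + b1*x
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
      / sum((x - mean_x) ** 2 for x in xs))
b0 = mean_y - b1 * mean_x

# For extreme x, the line escapes the [0, 1] range:
print(b0 + b1 * 0)   # below 0 -- not a valid probability
print(b0 + b1 * 12)  # above 1 -- not a valid probability
```

This is the practical reason for passing the linear predictor through the logistic function instead of using it directly.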