When an email lands in your inbox, how does your email service know whether it’s a real email or spam? This evaluation is made millions of times per day, and one way to make it is with Logistic Regression.
Logistic Regression is a supervised machine learning algorithm that uses regression to predict the probability, a continuous value ranging from 0 to 1, that a data sample belongs to a specific category or class. Based on that probability, the sample is assigned to the most probable class.
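To make the "regression, then probability" idea concrete, here is a minimal sketch. It assumes the model's weights and bias have already been learned, and it uses the standard logistic (sigmoid) function to squash a linear score into the range 0 to 1. The function names `sigmoid`, `predict_proba`, and `most_probable_class` are hypothetical helpers for illustration, not a specific library's API.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into the open interval (0, 1)."""
    # Split by sign so math.exp never receives a large positive argument (avoids overflow).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features, weights, bias):
    """Probability that a sample belongs to the positive class (label 1).

    The linear score w·x + b is the "regression" part; the sigmoid turns it
    into a probability. `weights` and `bias` are assumed to be already learned.
    """
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def most_probable_class(features, weights, bias):
    """With two classes, P(class 0) = 1 - P(class 1); pick the larger one."""
    p1 = predict_proba(features, weights, bias)
    return 1 if p1 >= 1 - p1 else 0


if __name__ == "__main__":
    # Hypothetical sample and parameters: z = 1.5*2.0 - 0.8*0.5 - 1.0 = 1.6
    p = predict_proba([2.0, 0.5], weights=[1.5, -0.8], bias=-1.0)
    print(round(p, 3))  # sigmoid(1.6) is about 0.832
    print(most_probable_class([2.0, 0.5], weights=[1.5, -0.8], bias=-1.0))
```

Comparing `p1` with `1 - p1` is the two-class case of "pick the most probable class", and it is equivalent to checking whether `p1` is at least 0.5.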
In our spam filtering example, a Logistic Regression algorithm predicts the probability that an incoming email is spam. If the predicted probability is 0.5 or greater, the email is classified as spam (the positive class), with label 1. If the predicted probability is less than 0.5, the email is classified as ham (a real email). We call ham the negative class, with label 0. Classifying data into exactly two classes like this is called binary classification.
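The decision rule for the spam filter can be sketched in a few lines. Everything specific here is a labelled assumption: the feature names (`suspicious_words`, `links`, `sender_in_contacts`), the weights, and the bias are hypothetical, hand-picked values for illustration, whereas a real filter would learn them from labelled emails. Only the 0.5 threshold and the spam = 1 / ham = 0 labelling come from the description above.

```python
import math

# Hypothetical, hand-picked parameters for illustration only;
# a real spam filter would learn these from labelled training emails.
WEIGHTS = {"suspicious_words": 0.9, "links": 0.6, "sender_in_contacts": -2.5}
BIAS = -1.5

SPAM, HAM = 1, 0   # positive class = spam (1), negative class = ham (0)
THRESHOLD = 0.5    # probability at or above this -> spam


def spam_probability(email_features: dict) -> float:
    """Predicted probability that an email is spam (logistic of a linear score)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in email_features.items())
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(email_features: dict) -> int:
    """Return SPAM (1) if P(spam) >= 0.5, otherwise HAM (0)."""
    return SPAM if spam_probability(email_features) >= THRESHOLD else HAM


if __name__ == "__main__":
    # Many suspicious words and links, unknown sender: z = -1.5 + 3.6 + 1.8 = 3.9
    email_a = {"suspicious_words": 4, "links": 3, "sender_in_contacts": 0}
    # One link, sender is a known contact: z = -1.5 + 0.6 - 2.5 = -3.4
    email_b = {"suspicious_words": 0, "links": 1, "sender_in_contacts": 1}
    for email in (email_a, email_b):
        label = classify(email)
        print(f"P(spam) = {spam_probability(email):.3f} -> {'spam' if label == SPAM else 'ham'}")
```

The probability and the label are kept as separate functions because the threshold is a design choice layered on top of the model: the model only estimates P(spam), and the 0.5 cut-off turns that estimate into a yes/no decision.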
Some other examples of what we can classify with Logistic Regression include:
- Disease survival — Will a patient, 5 years after treatment for a disease, still be alive?
- Customer conversion — Will a customer arriving on a sign-up page enroll in a service?