We are tackling a binary classification problem on IT operations data. The data can be loaded here. The following code reads the data file and splits it into training and testing sets.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Read the data file into a DataFrame
df = pd.read_csv("data.csv")

# Number of features (columns) and number of instances (rows)
print("number of features:", len(df.columns))
print("number of instances:", len(df))

# Label rows whose status is NORMAL as 1 and every other (error) status as 0.
# A vectorized comparison avoids chained assignment such as df.status[i] = 1,
# which pandas may apply to a copy instead of the original DataFrame.
df["status"] = (df["status"] == "NORMAL").astype(int)

# The timestamp is not used as a feature
df = df.drop(columns=["timestamp"])

# Features: every column except status. Labels: the status column.
df_train = df.drop(columns=["status"])
label = df["status"]

# Hold out 20% of the rows as the test set
X_train, X_test, y_train, y_test = train_test_split(
    df_train, label, test_size=0.20, random_state=42
)
```

Logistic regression is one of the simplest and most commonly used machine learning algorithms for two-class classification. It is easy to implement and can serve as a baseline for any binary classification problem. Logistic regression describes and estimates the relationship between one binary dependent variable and one or more independent variables. The dependent variable follows a Bernoulli distribution, and the model parameters are estimated through maximum likelihood. Logistic regression can be used for many classification problems. Examples include spam detection, diabetes prediction, predicting whether a given customer will purchase a particular product or churn to a competitor, and predicting whether a user will click on a given advertisement link.

Linear regression gives a continuous output, whereas logistic regression gives a discrete (class) output. It models the probability of the positive class and thresholds that probability, typically at 0.5, to pick a class. Examples of continuous outputs are house prices and stock prices. Examples of discrete outputs are whether a patient has cancer and whether a customer will churn. Linear regression is estimated using ordinary least squares (OLS), while logistic regression is estimated using maximum likelihood estimation (MLE).
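To make the MLE idea concrete, here is a minimal NumPy sketch, not scikit-learn's implementation. It fits logistic regression by gradient descent on the average negative log-likelihood (log loss), then thresholds the predicted probability at 0.5. Several pieces are illustrative assumptions:

- the helper names `sigmoid`, `fit_logistic`, `predict_proba` and `predict`;
- the learning rate and iteration count;
- the toy data.

Unlike scikit-learn's `LogisticRegression`, which applies L2 regularization by default, this sketch is unregularized.

```python
import numpy as np

def sigmoid(z):
    # Maps any real number to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def add_intercept(X):
    # Prepend a column of ones so w[0] acts as the bias term
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Maximum likelihood via gradient descent on the average log loss.
    Hypothetical helper; lr and n_iter are illustrative choices."""
    Xb = add_intercept(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)               # predicted P(y = 1 | x)
        grad = Xb.T @ (p - y) / len(y)    # gradient of the average log loss
        w -= lr * grad
    return w

def predict_proba(w, X):
    return sigmoid(add_intercept(X) @ w)

def predict(w, X, threshold=0.5):
    # Threshold the continuous probability into a discrete 0/1 class
    return (predict_proba(w, X) >= threshold).astype(int)

# Made-up toy data: one feature, class 1 when the feature is large
X = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
w = fit_logistic(X, y)
print(predict(w, X))
```

The sigmoid output is continuous. The discrete class only appears at the thresholding step, which is why logistic regression counts as a classifier.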

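```python
# Logistic regression
from sklearn.linear_model import LogisticRegression

# max_iter is raised from the default of 100 so the solver has room to converge on unscaled features
lg = LogisticRegression(max_iter=1000)
lg.fit(X_train, y_train)
# score() returns the mean accuracy on the test set
print("Logistic regression accuracy:", lg.score(X_test, y_test))
```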

A decision tree is one of the most popular classification algorithms and one of the easiest to understand and interpret. It can be used for both classification and regression problems. A decision tree is a flowchart-like structure:

- each internal node represents a feature (or attribute);
- each branch represents a decision rule;
- each leaf node represents an outcome.

The topmost node is known as the root node. The tree learns to partition the data on the basis of attribute values. It does so recursively, a process called recursive partitioning. This flowchart-like structure helps in decision making. A decision tree is a distribution-free, or non-parametric, method: it does not depend on probability distribution assumptions. Decision trees can handle high-dimensional data with good accuracy.
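Recursive partitioning can be sketched in a few lines. This is a simplified sketch, not scikit-learn's implementation. It follows three steps:

1. Pick the feature and threshold whose split gives the lowest weighted Gini impurity. Gini is also the default `criterion` of `DecisionTreeClassifier`.
2. Repeat the search on each side of the split.
3. Stop at a pure node or at a depth limit.

The helper names `gini`, `best_split`, `build_tree` and `predict_one`, the `max_depth=3` default and the toy data are all hypothetical. Thresholds are taken at observed feature values, whereas scikit-learn uses midpoints between them.

```python
import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum_k p_k^2 (0 for a pure node)
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Try every feature and threshold; keep the split with the lowest
    weighted Gini impurity, if it improves on the parent node."""
    best = (None, None, gini(y))  # (feature, threshold, impurity)
    n = len(y)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            right = ~left
            if left.all() or right.all():
                continue  # a split must send samples both ways
            score = (left.sum() * gini(y[left]) + right.sum() * gini(y[right])) / n
            if score < best[2]:
                best = (j, t, score)
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Recursive partitioning: split this node, then recurse on each side."""
    feature, threshold = None, None
    if depth < max_depth:
        feature, threshold, _ = best_split(X, y)
    if feature is None:
        # Leaf: predict the majority class of the samples that reached it
        values, counts = np.unique(y, return_counts=True)
        return values[np.argmax(counts)]
    mask = X[:, feature] <= threshold
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(X[mask], y[mask], depth + 1, max_depth),
        "right": build_tree(X[~mask], y[~mask], depth + 1, max_depth),
    }

def predict_one(node, x):
    # Follow the decision rules from the root down to a leaf
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node

# Made-up toy data: two features, class decided by the first one
X = np.array([[2.0, 1.0], [3.0, 1.5], [6.0, 0.5], [7.0, 1.0]])
y = np.array([0, 0, 1, 1])
tree = build_tree(X, y)
print(tree)
```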

```python
# Decision tree classifier
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier().fit(X_train, y_train)
print("Decision tree accuracy:", clf.score(X_test, y_test))
```

Random forest is a supervised learning algorithm that can be used for both classification and regression. It is flexible and easy to use. A forest is made up of trees, and it is said that the more trees a forest has, the more robust it is. A random forest works in three steps:

1. It builds decision trees on randomly selected samples of the data.
2. It gets a prediction from each tree.
3. It selects the final answer by voting.

It also provides a fairly good indicator of feature importance.
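The bagging-and-voting idea can be sketched on top of scikit-learn's `DecisionTreeClassifier`. Each tree is trained on a bootstrap sample, meaning rows drawn with replacement. `max_features="sqrt"` additionally randomizes the features considered at each split. The trees' hard 0/1 predictions are then combined by majority vote.

This is an illustrative sketch, not `RandomForestClassifier` itself, which combines trees by averaging their predicted class probabilities. The helper names `fit_forest` and `predict_forest`, the 25-tree default, the 0/1 integer labels and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=25, seed=0):
    """Hypothetical helper: train each tree on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap: n rows drawn with replacement
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1_000_000)))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict_forest(trees, X):
    # Every tree votes; with 0/1 labels and an odd number of trees,
    # a mean vote of at least 0.5 means class 1 has the majority
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)

# Synthetic data: the label depends on the first two of four features
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
trees = fit_forest(X, y)
print("training accuracy:", (predict_forest(trees, X) == y).mean())
```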

Random forests have a variety of applications, such as recommendation engines, image classification and feature selection. They can be used to classify loyal loan applicants, identify fraudulent activity and predict diseases.

Suppose you are planning a trip and decide to ask your friends about their past travel experiences to various places. You get some recommendations from every friend, and you make a list of those recommended places. Then you ask them to vote for one best place for the trip from that list. The place with the most votes is your final choice.

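There are two parts to this decision process. First, you ask each friend about their individual travel experience and get one recommendation out of the places they have visited. This part is like the decision tree algorithm: each friend makes a selection from the places he or she has visited so far. Second, you collect all the recommendations and have your friends vote on them, and the place with the most votes wins. This part is like the random forest combining the predictions of its trees by voting.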

```python
# Random forest with 100 trees
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
print("Random forest accuracy:", rf.score(X_test, y_test))
```

SVM often achieves higher accuracy than classifiers such as logistic regression and decision trees, although this depends on the data. It is known for the kernel trick, which lets it handle nonlinear input spaces. It is used in a variety of applications, such as face detection, intrusion detection, classification of genes and handwriting recognition. It is also used to classify emails, news articles and web pages.

The classifier separates data points using the hyperplane with the largest margin, and it uses this optimal hyperplane to classify new data points. Because it models the decision boundary directly rather than how the data are distributed within each class, SVM is a discriminative classifier. It can be used for both classification and regression problems.

SVM constructs a hyperplane in a multidimensional space to separate the classes. It finds the optimal hyperplane iteratively, by minimizing an error (loss) function. The core idea of SVM is to find the maximum marginal hyperplane (MMH) that best divides the dataset into classes.
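The maximum-margin idea can be sketched for the linear case by minimizing the soft-margin objective with full-batch subgradient descent. The objective is `0.5*||w||^2 + C * sum(max(0, 1 - y_i (w·x_i + b)))`, and the hinge term penalizes points that fall inside the margin or on the wrong side.

This is a swapped-in illustrative technique, not how scikit-learn trains `SVC`: `SVC` uses the libsvm solver and an RBF kernel by default. Several pieces are assumptions:

- the helper names `fit_linear_svm` and `predict_svm`;
- the values of `C`, the learning rate and the epoch count;
- the -1/+1 label encoding;
- the toy data.

```python
import numpy as np

def fit_linear_svm(X, y, C=1.0, lr=0.01, n_epochs=1000):
    """Hypothetical helper: subgradient descent on the soft-margin
    (hinge loss) objective. Labels y must be -1 or +1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        margins = y * (X @ w + b)
        viol = margins < 1  # points inside the margin or misclassified
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_svm(w, b, X):
    # The side of the hyperplane w·x + b = 0 determines the class
    return np.where(X @ w + b >= 0, 1, -1)

# Made-up toy data: two well-separated clusters
X = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])
w, b = fit_linear_svm(X, y)
print("w =", w, "b =", b)
```

Only the points with margin below 1 contribute to the gradient. This mirrors the fact that an SVM's hyperplane is determined by the support vectors rather than by every training point.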

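```python
# Support vector machine
from sklearn.svm import SVC

# SVC uses an RBF kernel by default and is sensitive to feature scale
svm = SVC()
svm.fit(X_train, y_train)
print("SVM accuracy:", svm.score(X_test, y_test))
```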

Linear discriminant analysis (LDA) is a classification (and dimensionality reduction) method. It finds the linear combination of the variables that best separates the classes of the target variable. It is a supervised machine learning algorithm that can be used as a classifier for binary or multiclass targets.

LDA looks for linear decision boundaries between the classes. It projects the data points onto new dimensions with two goals:

- the classes are as far apart from each other as possible;
- the points within each class are as close to their class centroid as possible.

The new dimensions are ranked by how well they maximize the distance between the classes while minimizing the distance between the data points and their class centroids. These new dimensions are the linear discriminants of the feature set.
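For two classes, the idea can be sketched with Fisher's linear discriminant. The projection direction `w = S_W^{-1} (mu_1 - mu_0)`, where `S_W` is the within-class scatter matrix, maximizes the separation of the projected class means relative to the spread within each class. Points are then classified by comparing their projection with the midpoint of the projected means.

This is an illustrative sketch rather than scikit-learn's implementation. `LinearDiscriminantAnalysis` fits Gaussian classes with a shared covariance and also uses class priors. For two classes its direction is proportional to this one, but its threshold can differ. The helper names `fisher_lda` and `predict_lda`, the equal-prior threshold and the synthetic data are assumptions.

```python
import numpy as np

def fisher_lda(X, y):
    """Hypothetical helper: two-class Fisher discriminant for 0/1 labels."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: the sum of each class's scatter matrix
    S_w = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(S_w, mu1 - mu0)
    # Threshold halfway between the projected class means (equal priors assumed)
    threshold = 0.5 * (mu0 + mu1) @ w
    return w, threshold

def predict_lda(w, threshold, X):
    # Project onto the discriminant and compare with the threshold
    return (X @ w > threshold).astype(int)

# Synthetic data: two Gaussian clusters centred at (0, 0) and (4, 4)
rng = np.random.default_rng(0)
X0 = rng.normal([0.0, 0.0], 1.0, size=(100, 2))
X1 = rng.normal([4.0, 4.0], 1.0, size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)
w, threshold = fisher_lda(X, y)
print("direction:", w, "accuracy:", (predict_lda(w, threshold, X) == y).mean())
```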

```python
# Linear discriminant analysis
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)
print("LDA accuracy:", lda.score(X_test, y_test))
```

K-nearest neighbors (KNN) is a simple, easy-to-understand and versatile machine learning algorithm, and one of the most widely used. KNN is used in a variety of fields, such as finance, healthcare, political science, handwriting detection, image recognition and video recognition. For example:

- in credit ratings, financial institutions predict the credit rating of customers;
- in loan disbursement, banks predict whether a loan is safe or risky;
- in political science, potential voters are classified into two classes, will vote or won't vote.

KNN can be used for both classification and regression problems, and it is based on feature similarity.

In KNN, K is the number of nearest neighbors, and it is the core deciding factor. K is generally chosen to be an odd number when there are two classes, so that the majority vote cannot end in a tie. When K = 1, the algorithm is known as the nearest neighbor algorithm.

KNN is a non-parametric, lazy learning algorithm. Non-parametric means it makes no assumption about the underlying data distribution; in other words, the model structure is determined from the dataset. This is very helpful in practice, since most real-world datasets do not follow theoretical mathematical assumptions.

Lazy means it does not build a model from the training data in advance: all of the training data is used in the testing (prediction) phase. This makes training faster and the testing phase slower and costlier in both time and memory. In the worst case, KNN scans every training point for each prediction, and it must keep the entire training set in memory.
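A brute-force sketch makes the "lazy" behavior visible: there is no fitting step at all, and every prediction computes the distance to every stored training point. The prediction is the majority label among the `k` closest points. Euclidean distance matches the default metric of scikit-learn's `KNeighborsClassifier`, and `k=5` matches its default `n_neighbors`. The helper name `knn_predict` and the toy data are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_query, k=5):
    """Hypothetical helper: brute-force KNN classification."""
    preds = []
    for x in X_query:
        dists = np.linalg.norm(X_train - x, axis=1)  # distance to every training point
        nearest = np.argsort(dists)[:k]              # indices of the k closest points
        votes = Counter(y_train[nearest].tolist())   # count labels among the neighbors
        preds.append(votes.most_common(1)[0][0])     # majority label wins
    return np.array(preds)

# Made-up toy data: two small clusters
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_demo = np.array([0, 0, 0, 1, 1, 1])
queries = np.array([[0.5, 0.5], [5.5, 5.5]])
print(knn_predict(X_demo, y_demo, queries, k=3))
```

The loop over all training points for every query is exactly the costly testing phase described above. Libraries such as scikit-learn can speed this up with tree-based indexes, but the training data must still be stored.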

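```python
# K-nearest neighbors
from sklearn.neighbors import KNeighborsClassifier

# KNN is distance-based, so features with larger numeric ranges dominate the distance
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
print("KNN accuracy:", knn.score(X_test, y_test))
```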

Naive Bayes is one of the most straightforward and fastest classification algorithms, and it is suitable for large datasets. Naive Bayes classifiers are used successfully in applications such as spam filtering, text classification, sentiment analysis and recommender systems. It applies Bayes' theorem to predict the class of an unseen sample. It makes the "naive" assumption that the features are conditionally independent given the class.
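The Gaussian variant, the one scikit-learn's `GaussianNB` implements, can be sketched in two steps:

1. For each class, estimate the prior `P(c)` and a mean and variance per feature.
2. Score a sample by `log P(c) + sum_j log N(x_j | mean_cj, var_cj)` and pick the class with the highest score.

Summing the per-feature log-likelihoods is where the naive independence assumption enters. This is an illustrative sketch, not the library's code. The helper names `fit_gaussian_nb` and `predict_gaussian_nb` and the toy data are hypothetical. The fixed `1e-9` added to the variances is also an assumption: `GaussianNB`'s `var_smoothing` instead scales its epsilon by the largest feature variance.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Hypothetical helper: per-class priors, feature means and variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Small constant keeps every variance strictly positive (assumed value)
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances

def predict_gaussian_nb(model, X):
    classes, priors, means, variances = model
    log_post = []
    for prior, mu, var in zip(priors, means, variances):
        # Sum of per-feature Gaussian log-densities: the "naive" independence assumption
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)
        log_post.append(np.log(prior) + log_lik)
    # Pick the class with the highest (unnormalized) log posterior
    return classes[np.argmax(np.stack(log_post, axis=1), axis=1)]

# Made-up toy data: two small clusters
X = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.1],
              [4.0, 5.0], [4.2, 5.3], [3.9, 4.8]])
y = np.array([0, 0, 0, 1, 1, 1])
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, np.array([[1.0, 2.0], [4.0, 5.0]])))
```

Working in log space avoids numerical underflow when many small per-feature probabilities would otherwise be multiplied together.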

```python
# Gaussian naive Bayes
from sklearn.naive_bayes import GaussianNB

gnb = GaussianNB()
gnb.fit(X_train, y_train)
print("Gaussian naive Bayes accuracy:", gnb.score(X_test, y_test))
```