All YOU Need to Know About Machine Learning (Simplified)

Have you ever met Sophia the AI robot, or held a chat with Cortana or Siri? Have you noticed that after visiting certain websites or searching for a certain product, you start getting more advertisements for it on Google? These are all examples of machine learning applications. So what is machine learning, and how does it work? What are its types? How and where can you learn it and, most importantly, what are the prerequisites for learning it?

If that is what you are wondering about, then you are in the right place, so let us dive into it.

Machine learning is a subfield of Artificial Intelligence that focuses, as the name suggests, on the ‘learning’ of the machine: instead of being given explicit rules, a model learns from data how to perform a task. You feed data into a model and get an output based on it. (As a beginner, you can practice on datasets that are freely available online, which saves you the trouble of creating your own input data.)

Note that models generally need a lot of data to perform well; with too little data, their predictions tend to be unreliable. (Confused? Don’t worry, I will elaborate in a while.)

Remember that Deep Learning, Machine Learning, and Artificial Intelligence are not to be confused: Deep Learning is a subbranch of Machine Learning, which in turn, as stated above, is a subbranch of Artificial Intelligence.

Dataset: The data you will be working with to train and test the model.

Model: The ‘tool’ that you train according to your desired application.

Training and Testing Datasets: To check how well your machine learning model performs, you split the dataset into two parts, namely the training and testing datasets. The former is used to train the model, while the latter is used to check the accuracy of the trained model on data it has not seen.

Trained Model: The model after you have trained it.

Learning: Training the model on the sample matrix together with the known values of the target variable, so that it can make predictions.

Predicting: Using the trained model to estimate the value of the target variable for new samples.

Sample Matrix: A matrix of shape (n_samples, n_features).

n_samples and n_features: The former is the number of rows in a data frame and the latter is the number of columns. The attributes you record for each object under consideration are your features, and the individual instances, with their values for those features, are your samples.

Target Variable: The variable or array the model is trained on and whose values it then predicts. In other words, the variable you target to predict.
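
To make the terms above concrete, here is a minimal NumPy sketch. The numbers are invented for illustration, and the 75/25 split ratio and the random seed are assumptions rather than rules. Scikit-Learn offers a ready-made train_test_split function for the same job.

```python
import numpy as np

# Sample matrix X: 8 samples (rows) x 2 features (columns).
# The values are made up: [hours studied, hours slept].
X = np.array([
    [1.0, 8.0],
    [2.0, 7.0],
    [3.0, 6.0],
    [4.0, 8.0],
    [5.0, 5.0],
    [6.0, 7.0],
    [7.0, 6.0],
    [8.0, 8.0],
])

# Target variable y: one value per sample (here: 1 = passed, 0 = failed).
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

n_samples, n_features = X.shape

# Shuffle the row indices, then keep 75% for training and 25% for testing.
rng = np.random.default_rng(seed=0)
shuffled = rng.permutation(n_samples)
n_train = int(0.75 * n_samples)
train_idx, test_idx = shuffled[:n_train], shuffled[n_train:]

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(X.shape)        # (8, 2)
print(X_train.shape)  # (6, 2)
print(X_test.shape)   # (2, 2)
```

Every row ends up in exactly one of the two parts, so the model is later checked only on samples it never saw during training.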

The procedure follows four steps:

Extracting the Data
Preprocessing the Data
Training the Model
Validating the Model

Extracting The Data

As a learner, you can create a dataset yourself or get one online. Since models need a lot of data to train on, and more data generally leads to more accurate results, getting datasets online is the sensible choice; creating one yourself would be tedious and time-consuming. Some online sources where data is legally available are:

Kaggle datasets
Amazon’s AWS datasets
Wikipedia’s list of Machine Learning datasets
Quora.com question
Datasets subreddit
http://dataportals.org/
http://opendatamonitor.eu/
http://quandl.com/

Preprocessing Data

Data is NOT always available in the desired form, i.e. ready to use. It has to be transformed into a suitable shape before being fed into the model, and that is what you call preprocessing. Factors that make data unfit for training include outliers, data that is not normally distributed (which matters for some models), missing values, and data in the form of text or pictures, because models only understand numbers.

For these reasons and others, the data has to be cleaned. Also note that models work on numeric arrays: some libraries require you to convert data frames into arrays yourself, while others, such as Scikit-Learn, accept a pandas data frame and convert it internally.
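
A few of these preprocessing steps can be sketched in plain NumPy. The tiny dataset is made up, and the choices shown (filling a missing value with the mean, standard scaling, one-hot encoding of text) are common options picked for illustration, not the only valid ones.

```python
import numpy as np

# Made-up raw data: [age, city]. One age is missing (np.nan)
# and the city is text, which a model cannot use directly.
ages = np.array([25.0, 30.0, np.nan, 35.0])
cities = np.array(["Paris", "Rome", "Paris", "Oslo"])

# 1. Missing values: replace NaN with the mean of the known values.
mean_age = np.nanmean(ages)                      # (25 + 30 + 35) / 3 = 30
ages_filled = np.where(np.isnan(ages), mean_age, ages)

# 2. Scaling: shift to mean 0 and scale to standard deviation 1.
ages_scaled = (ages_filled - ages_filled.mean()) / ages_filled.std()

# 3. Text to numbers: one-hot encode the city column.
categories, codes = np.unique(cities, return_inverse=True)
city_one_hot = np.eye(len(categories))[codes]

# Final sample matrix: 1 scaled numeric column + 3 one-hot columns.
X = np.column_stack([ages_scaled, city_one_hot])

print(ages_filled)  # [25. 30. 30. 35.]
print(categories)   # ['Oslo' 'Paris' 'Rome']
print(X.shape)      # (4, 4)
```

After these steps every column is numeric and there are no gaps left, which is the form a model expects.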

Training the Model

Training is done with different types of machine learning models; you choose the model depending on the task you have to perform. The types of models are discussed in more detail in a later section.

Validating the Model

This means testing the model on data it was not trained on, checking for errors, and fixing them before launching it.
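
The training and validating steps can be sketched together as follows. The data is made up, and the model is a deliberately simple nearest-centroid classifier used as a stand-in; the article does not prescribe this model, and the function names fit and predict are only illustrative.

```python
import numpy as np

# Made-up, already preprocessed data: two features per sample, two classes.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # class 0
                    [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])  # class 1
y_train = np.array([0, 0, 0, 1, 1, 1])

X_test = np.array([[1.2, 1.4], [6.8, 6.1], [2.2, 2.1], [5.9, 7.2]])
y_test = np.array([0, 1, 0, 1])

def fit(X, y):
    """Training: store the mean point (centroid) of each class."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(model, X):
    """Prediction: assign each sample to the class with the closest centroid."""
    classes, centroids = model
    # distances[i, j] = distance from sample i to centroid j
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[distances.argmin(axis=1)]

model = fit(X_train, y_train)        # training
y_pred = predict(model, X_test)      # validating on unseen data
accuracy = (y_pred == y_test).mean()

print(y_pred)    # [0 1 0 1]
print(accuracy)  # 1.0
```

On real data the accuracy is rarely perfect; a large gap between the training and testing results is a typical sign that something needs fixing before launch.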

Machine learning types are classified along three main criteria, each of which is further divided into subcategories:

Based on human supervision (Supervised, Unsupervised, Reinforcement Learning)
Ability to learn incrementally (Online, Batch Learning)
Working from known data points or building a predictive model (Instance-Based or Model-Based Learning)

Here we will mainly discuss Supervised Learning, and also explain what Unsupervised and Reinforcement Learning are.

In Supervised Learning, human supervision is present: the desired columns or features are labeled for the instances or samples, and new values are predicted based on those past examples. An example comes from the marketing and sales sectors, where visiting customers generate useful data that can be analyzed and used for prediction.

Main Types of Supervised Learning

Classification
Regression

In classification, the model assigns new data to classes, based on the labeled classes provided to it during training.

Models used for Classification

Decision Trees
Naive Bayes classification
Support Vector Machines (SVMs)
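
As an illustration of one of the classifiers listed above, here is a from-scratch Gaussian Naive Bayes sketch in NumPy. The animal measurements are invented, the function names are hypothetical, and the Gaussian variant is an assumption, since the article only says "Naive Bayes". In practice you would more likely use Scikit-Learn's GaussianNB class.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class prior, feature means and feature variances."""
    classes = np.unique(y)
    priors = np.array([(y == c).mean() for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # A small constant keeps the variance away from zero.
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances

def predict_gaussian_nb(model, X):
    """Pick the class with the highest log posterior for each sample."""
    classes, priors, means, variances = model
    # Log of the Gaussian density, summed over features ("naive" independence).
    diff = X[:, None, :] - means[None, :, :]
    log_likelihood = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                             + diff ** 2 / variances[None, :, :]).sum(axis=2)
    log_posterior = np.log(priors)[None, :] + log_likelihood
    return classes[log_posterior.argmax(axis=1)]

# Made-up labeled data: [height_cm, weight_kg] -> 0 = cat, 1 = dog.
X = np.array([[25.0, 4.0], [23.0, 3.5], [27.0, 5.0],
              [60.0, 30.0], [55.0, 25.0], [65.0, 35.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = fit_gaussian_nb(X, y)
new_animals = np.array([[24.0, 4.2], [58.0, 28.0]])
print(predict_gaussian_nb(model, new_animals))  # [0 1]
```

The model never sees the two new animals during training; it places them in the class whose labeled examples they resemble most.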

In regression, the model predicts a continuous-valued output based on labeled datasets.

Models used for Regression

Linear Regression
Logistic Regression (despite its name, it predicts class probabilities and is normally used for classification)
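
As a sketch of linear regression, the example below fits a straight line with NumPy's least-squares solver and then predicts a continuous value for an unseen input. The house sizes and prices are made up, and they lie exactly on a line so that the result is easy to check. Scikit-Learn's LinearRegression class does the same job with a fit and predict interface.

```python
import numpy as np

# Made-up labeled data: house size in square metres -> price in thousands.
# The prices follow price = 2 * size + 10 exactly, to keep the result easy to check.
sizes = np.array([50.0, 60.0, 80.0, 100.0])
prices = np.array([110.0, 130.0, 170.0, 210.0])

# Add a column of ones so the model can learn an intercept.
A = np.column_stack([sizes, np.ones_like(sizes)])

# Least squares finds the slope and intercept that minimise the squared error.
(slope, intercept), *_ = np.linalg.lstsq(A, prices, rcond=None)

# Predict a continuous value for an unseen input.
predicted_price = slope * 70.0 + intercept

print(round(slope, 3), round(intercept, 3))  # 2.0 10.0
print(round(predicted_price, 3))             # 150.0
```

Unlike a classifier, the output here is not one of a fixed set of classes but any number on a continuous scale.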

In Unsupervised Learning, the training data is unlabeled, and the model has to figure out by itself how to group instances of the same type. For example, speech software can group the voices in a recording by speaker and label them as person 1 or person 2, depending on how it is instructed.

Unsupervised Learning Algorithms:

Clustering
Visualization and dimensionality reduction
Association rule learning
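
Clustering, the first item above, can be sketched with k-means, one widely used clustering algorithm (the article does not name a specific one). The points are made up, the function name k_means is hypothetical, and the starting centers are fixed rather than random so that the run is reproducible.

```python
import numpy as np

def k_means(X, initial_centers, n_iterations=10):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    centers = initial_centers.astype(float).copy()
    for _ in range(n_iterations):
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        for k in range(len(centers)):
            if np.any(labels == k):  # leave an empty cluster's center unchanged
                centers[k] = X[labels == k].mean(axis=0)
    return labels, centers

# Made-up unlabeled data: two visibly separate groups of points.
X = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
              [8.0, 8.0], [8.0, 9.0], [9.0, 8.0]])

# Assumption: start from the first and last point instead of random centers.
labels, centers = k_means(X, initial_centers=X[[0, -1]])

print(labels)  # [0 0 0 1 1 1]
```

No labels were supplied; the group numbers 0 and 1 are names the algorithm invents for the two groups it finds.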

In Reinforcement Learning, the agent (our model in this context) observes its surroundings, decides on and performs actions, and is rewarded or penalized on the basis of good or bad performance. Yes, you are right, we treat the model like a pet dog.
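
The reward-and-penalty loop can be sketched with a toy agent that has only two possible actions. The environment, the reward values, and the exploration rate are all hypothetical, and the strategy used (epsilon-greedy, a common simple choice that the article does not itself name) is just one way to balance trying new actions against repeating good ones.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical environment: action 1 ("sit") earns a reward, action 0 ("bark") a penalty.
def reward_for(action):
    return 1.0 if action == 1 else -1.0

n_actions = 2
value = [0.0] * n_actions   # the agent's running estimate of each action's reward
count = [0] * n_actions     # how often each action has been tried
epsilon = 0.1               # fraction of steps spent exploring at random

for step in range(200):
    if random.random() < epsilon:
        action = random.randrange(n_actions)                    # explore
    else:
        action = max(range(n_actions), key=lambda a: value[a])  # exploit
    reward = reward_for(action)
    count[action] += 1
    # Incremental average: nudge the estimate towards the observed reward.
    value[action] += (reward - value[action]) / count[action]

best_action = max(range(n_actions), key=lambda a: value[a])
print(best_action)  # 1
```

Nobody tells the agent which action is correct; it works that out purely from the rewards and penalties it collects.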

Machine learning is most commonly done in Python. The Python libraries worth knowing are listed with the prerequisites further down.

This section answers how and where to learn Machine Learning. Getting started is simple; you just need to know the right steps to follow. The resources listed here are informative and fun, and they will guide you through those steps. Several are available online:

Scikit-Learn, which provides a User Guide and tutorials on its website
Dataquest
Kaggle
Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron (available at pdfdrive.com)

There are many other tutorials and courses available online on platforms such as Coursera and YouTube. You can also visit Edureka, which provides courses on the subject.

Getting started with Machine Learning requires prior knowledge of certain subjects, such as basic college mathematics and programming in Python, along with some of its libraries.

The basic mathematical subjects you need to have a good understanding of are:

Calculus
Linear Algebra
Probability
Statistics

Python libraries to gain familiarity with:

NumPy
Matplotlib
Pandas
SciPy
Seaborn

The first three are a must, while the latter two are less essential. Seaborn, like Matplotlib, is a library for data visualization; it builds on Matplotlib and offers a higher-level interface, but knowing Matplotlib alone will do for you as a beginner.

I hope you find the above information useful. Feel free to contact me via the email provided in case of queries. Happy Machine Learning!
