NEO Share


Machine Learning: Iris Data Set

February 23, 2021 by systems

Presentation of the Iris dataset

The Iris dataset is a famous multivariate dataset first introduced by the statistician and biologist Sir R.A. Fisher in a 1936 research paper. It contains measurements used to distinguish three species of Iris flowers based on their morphological variation. The data consists of 50 samples from each of the three species (150 total: 50 Iris setosa, 50 Iris virginica and 50 Iris versicolor), with four features measured on each sample: sepal length, sepal width, petal length and petal width (all in centimeters).

The Iris Flower dataset
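These counts can be checked against the copy of the dataset that ships with scikit-learn, the library introduced in the next section. A minimal sketch, assuming scikit-learn is already installed; the variable names are illustrative:

```python
from collections import Counter

from sklearn.datasets import load_iris

# Load the copy of Fisher's Iris data bundled with scikit-learn (no download needed)
iris = load_iris()

# One row per flower, one column per measurement
print(iris.data.shape)  # (150, 4)

# The four measured features; scikit-learn includes the unit (cm) in each name
print(iris.feature_names)

# Map each numeric target (0, 1, 2) to its species name and count the samples
species_counts = Counter(str(iris.target_names[label]) for label in iris.target)
print(species_counts)  # Counter({'setosa': 50, 'versicolor': 50, 'virginica': 50})
```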

The use case is simple: a botanist wants to determine the species of an iris flower from the four features described above. Classifying this dataset gives us a good introduction to core machine learning concepts.

Getting to know our dataset

We will use a Python library called scikit-learn, imported in code as sklearn (https://scikit-learn.org/). It is a great tool for classification and a good fit for our use case. The first thing we need to do is install it:

Run this command: $ pip install scikit-learn

The dataset is composed of four features (called data) and a target (the species label). The sample below shows rows 0 to 5, 51 to 56 and 101 to 106:

Example of data from Iris set
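To print the same rows yourself, here is a minimal sketch. It assumes "0 to 5" and the other ranges mean Python-style slices, where the end index is excluded; the original sample may have been produced differently.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Python slices exclude the end index, so 0:5 selects rows 0, 1, 2, 3 and 4
rows = np.r_[0:5, 51:56, 101:106]

for i in rows:
    # Four feature values (cm) followed by the numeric label and the species name
    print(i, iris.data[i], iris.target[i], iris.target_names[iris.target[i]])
```

Each block of rows comes from a different species, because the 150 samples are stored in order: rows 0 to 49 are setosa (label 0), 50 to 99 versicolor (label 1) and 100 to 149 virginica (label 2).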

A better way to understand our data is to plot it with matplotlib.pyplot (installable with pip install matplotlib), but first let's load the data. The dataset is provided by the sklearn library, so we simply import it and then load it.
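The loading and plotting code itself is not reproduced in the text, so here is a minimal sketch. It assumes the dataset object is named irisDataSet (the name used in the plt.scatter call later in this article) and puts petal length on the x-axis and petal width on the y-axis; the axis order of the original figure is a guess.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Load the Iris data bundled with scikit-learn
irisDataSet = load_iris()

# Columns 2 and 3 of the feature matrix are petal length and petal width (cm)
petal_length = irisDataSet.data[:, 2]
petal_width = irisDataSet.data[:, 3]

# Color each point by its species label (0 = setosa, 1 = versicolor, 2 = virginica)
scatter = plt.scatter(petal_length, petal_width, c=irisDataSet.target)
plt.xlabel(irisDataSet.feature_names[2])
plt.ylabel(irisDataSet.feature_names[3])

# Build a legend that maps each color back to a species name
handles, _ = scatter.legend_elements()
plt.legend(handles, list(irisDataSet.target_names))
plt.show()
```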

Once we run the code, we get this plot:

Example of petal width and length plot

Similarly, if we change the arguments of plt.scatter(, , c=irisDataSet.target) to plot every pairwise combination of features, we get the following:

Iris dataset Scatterplot
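One way to produce such a grid is to loop over every pair of feature columns with the same scatter call. This is a minimal sketch; the 2 x 3 layout of the six unique pairs is an assumption, and the original figure may use a different arrangement (for example a full 4 x 4 scatter matrix).

```python
from itertools import combinations

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

irisDataSet = load_iris()

# Every unordered pair of the four features: 4 choose 2 = 6 pairs
pairs = list(combinations(range(4), 2))

fig, axes = plt.subplots(2, 3, figsize=(12, 7))
for ax, (x, y) in zip(axes.ravel(), pairs):
    # Same scatter call as before, with a different pair of columns each time
    ax.scatter(irisDataSet.data[:, x], irisDataSet.data[:, y], c=irisDataSet.target)
    ax.set_xlabel(irisDataSet.feature_names[x])
    ax.set_ylabel(irisDataSet.feature_names[y])

fig.tight_layout()
plt.show()
```

The petal-based panels typically separate setosa cleanly from the other two species, which is why simple classifiers do well on this dataset.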

Next Step: Classification

Now that we have a better idea of our data, let's use a classification model to predict the species of an Iris flower. In the next article, we go over the K-Nearest Neighbors classifier and apply it to this dataset.
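As a preview, and not the code from the next article, here is a minimal sketch using scikit-learn's KNeighborsClassifier. The 70/30 train/test split, k = 5 and random_state=42 are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out 30% of the flowers to measure accuracy on unseen samples;
# stratify=y keeps the three species equally represented in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Predict each test flower's species by majority vote of its 5 nearest training flowers
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

accuracy = knn.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```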

Resources

Iris flower data set. (2021, January 20). In Wikipedia. Retrieved January 28, 2021, from https://en.wikipedia.org/wiki/Iris_flower_data_set

Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179–188.
