Data visualization helps tell stories by curating data into a form that is easier to understand. Python's Matplotlib library, a third-party package that is installed separately (for example with pip), makes it easy to create plots.
In this article, I explain five of the most common basic plots used for visualization. Some examples use the heart disease prediction dataset from Kaggle. The dataset link is given below so you can practice on your own.
Link: https://www.kaggle.com/ronitf/heart-disease-uci
Before analyzing the dataset, import the required libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Now, let’s import the dataset using the read_csv() function in pandas.
data = pd.read_csv("heartdisease.csv")
data.head()
print(data.shape)
# Output: (303, 14)
The dataset contains 303 rows and 14 columns. We will draw line plots, scatter plots, histograms, bar plots, and box plots to explore the data.
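Before plotting, it can help to check the column types and look for missing values. The sketch below uses a tiny hand-made DataFrame as a stand-in for the real file; the rows are hypothetical and only the column names (age, chol, target) are taken from the dataset.

```python
import pandas as pd

# Hypothetical stand-in rows; the real file has 303 rows and 14 columns.
sample = pd.DataFrame({
    "age": [63, 37, 41, 56],
    "chol": [233, 250, 204, 236],
    "target": [1, 1, 1, 1],
})

print(sample.shape)             # (number of rows, number of columns)
print(sample.dtypes)            # data type of each column

missing = sample.isna().sum()   # count of missing values per column
print(missing)

summary = sample.describe()     # count, mean, std, min, quartiles, max
print(summary)
```

The same calls should work on the `data` DataFrame loaded with read_csv().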
Let’s start our discussion with the line plot.
A line plot is one of the most basic data visualization techniques. In line plots, we will be plotting one variable as a function of another variable.
To draw a line plot, use the plt.plot() function with the x and y values as arguments.
I will define an artificial variable x ranging from 0 to 10 and plot sin(x).
x = np.linspace(0,10,20)
# the above statement creates a NumPy array of 20 equally spaced points from 0 to 10
y = np.sin(x)
plt.plot(x,y,color = 'blue')
# the `color` argument sets the color of the line
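As a small extension of the example above, here is a sketch that draws two curves on the same axes and tells them apart with a legend. The cosine curve, the line styles, and the label text are illustrative choices, not part of the original example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 20)

plt.figure()
# Each call to plt.plot() adds one line to the current axes.
sin_line, = plt.plot(x, np.sin(x), color="blue", linestyle="-", marker="o", label="sin(x)")
cos_line, = plt.plot(x, np.cos(x), color="red", linestyle="--", label="cos(x)")
plt.legend()  # builds the legend from the `label` of each line
```

plt.plot() returns a list of line objects, which is why the results are unpacked with a trailing comma.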
A scatter plot represents the values of two variables using dots. It can be thought of as a line plot in which the points are not connected to each other. I will use the same data as in the line plot.
x = np.linspace(0,10,20)
# the above statement creates a NumPy array of 20 equally spaced points from 0 to 10
y = np.sin(x)
plt.scatter(x,y)
So why do you need a scatter plot if there is not much difference with the line plot?
A scatter plot provides more flexibility to modify the color, shape, and size of each dot. You can use parameters such as c and cmap for better visualization.
x = np.linspace(0,10,20)
# the above statement creates a NumPy array of 20 equally spaced points from 0 to 10
y = np.sin(x)
plt.scatter(x,y, c=y,cmap='Blues')
Notice that the color intensity of the dots increases as the y value increases.
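The size of each dot can be varied in the same way through the s parameter, and a colorbar makes the color mapping readable. The following is a minimal sketch; the size formula and the black edge color are arbitrary choices made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 20)
y = np.sin(x)

# Illustrative sizing rule: dots grow with the magnitude of y.
sizes = 20 + 100 * np.abs(y)

plt.figure()
points = plt.scatter(x, y, c=y, s=sizes, cmap="Blues", edgecolors="black")
plt.colorbar(points, label="sin(x)")  # legend for the color scale
```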
A histogram is used to examine the shape and spread of sample data. The x-axis represents the value ranges (bins) and the y-axis represents the frequency. Use the plt.hist() function with the data and the number of bins as arguments.
plt.hist(data['age'],bins=20)
plt.xlabel("Age")
plt.ylabel("Frequency")
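plt.hist() also returns the numbers behind the picture: the count in each bin and the bin edges. The sketch below shows this using randomly generated ages as a hypothetical stand-in for data['age'], so the exact counts will differ from the real dataset.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical ages standing in for data['age'] (303 values between 29 and 77).
rng = np.random.default_rng(0)
ages = rng.integers(29, 78, size=303)

plt.figure()
counts, bin_edges, patches = plt.hist(ages, bins=20)
plt.xlabel("Age")
plt.ylabel("Frequency")

# 20 bins are delimited by 21 edges; the counts add up to the sample size.
print(counts.sum(), len(bin_edges))
```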
A bar plot represents categorical data with bars whose heights are proportional to the values.
Use the plt.bar() function with the x and y values as arguments to do this.
x = ['A','B','C','D','E']
y = [110,120,40,190,70]
plt.bar(x,y,color='blue')
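With real data, the bar heights usually come from counting how often each category occurs. Here is a sketch of that pattern using pandas value_counts(); the category codes below are made up for illustration and merely imitate a coded column such as chest pain type.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical category codes (not taken from the real dataset).
cp = pd.Series([0, 2, 1, 0, 0, 3, 2, 0, 1, 2])

# Count occurrences of each category and order them by category code.
counts = cp.value_counts().sort_index()
labels = [str(code) for code in counts.index]

plt.figure()
bars = plt.bar(labels, counts.values, color="blue")
plt.xlabel("Category")
plt.ylabel("Count")
```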
A box plot graphically depicts groups of numerical data through their quartiles.
Use the plt.boxplot() function with the data as an argument to do this.
plt.boxplot(data['age'])
The bottom whisker shows the minimum point (excluding outliers) and the top whisker shows the maximum point (excluding outliers). By default, Matplotlib treats points that lie more than 1.5 times the interquartile range beyond the box as outliers.
The bottom edge of the box shows the first quartile, the middle line represents the median, and the top edge of the box represents the third quartile.
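The quantities a box plot draws can also be computed by hand, which makes the picture easier to read. The sketch below does this with NumPy, assuming Matplotlib's default rule of 1.5 times the interquartile range for the whiskers; the ages are hypothetical and include one deliberately extreme value.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical ages; 98 is included on purpose as an outlier.
ages = np.array([35, 41, 45, 50, 52, 54, 56, 58, 60, 62, 65, 98])

q1, median, q3 = np.percentile(ages, [25, 50, 75])
iqr = q3 - q1

# Points outside these fences are drawn as individual outlier markers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = ages[(ages >= lower_fence) & (ages <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = ages[(ages < lower_fence) | (ages > upper_fence)]

plt.figure()
box = plt.boxplot(ages)  # returns a dict of the drawn artists
```

The dictionary returned by plt.boxplot() holds the drawn lines under keys such as 'medians', 'whiskers', and 'fliers', so the computed values can be compared with what is plotted.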
1. plt.xlabel() is used to label the x-axis.
2. plt.ylabel() is used to label the y-axis.
3. plt.title() is used to set the title for the plot.
x = np.linspace(0,10,20)
y = np.sin(x)
plt.plot(x,y,color = 'blue')
plt.xlabel("X-VALUE")
plt.ylabel("SIN-VALUE")
plt.title("sin GRAPH")
4. plt.style.use() is used to change the style of the plot.
plt.style.use('seaborn-v0_8-whitegrid')  # named 'seaborn-whitegrid' in Matplotlib versions before 3.6
plt.plot(x,y,color = 'blue')
plt.xlabel("X-VALUE")
plt.ylabel("SIN-VALUE")
plt.title("sin GRAPH")
5. plt.rcParams.update() is used to change the figure size of the plot.
params = {'figure.figsize': (8, 5),}
plt.rcParams.update(params)
plt.style.use('seaborn-v0_8-whitegrid')  # named 'seaborn-whitegrid' in Matplotlib versions before 3.6
plt.plot(x,y,color = 'blue')
plt.xlabel("X-VALUE")
plt.ylabel("SIN-VALUE")
plt.title("sin GRAPH")
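The tips above change global settings, which affect every later plot in the session. As an alternative, here is a sketch that applies the style and figure size to a single figure only. It assumes the whitegrid style may be registered under either of two names depending on the Matplotlib version, and falls back to the default style if neither is found.

```python
import numpy as np
import matplotlib.pyplot as plt

# The seaborn-like styles were renamed in Matplotlib 3.6, so try both names.
preferred = ["seaborn-v0_8-whitegrid", "seaborn-whitegrid"]
style_name = next((s for s in preferred if s in plt.style.available), "default")

x = np.linspace(0, 10, 20)
y = np.sin(x)

# The style applies only inside the `with` block.
with plt.style.context(style_name):
    # figsize sets the size (in inches) for this figure only.
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.plot(x, y, color="blue")
    ax.set_xlabel("X-VALUE")
    ax.set_ylabel("SIN-VALUE")
    ax.set_title("sin GRAPH")
```

The ax.set_xlabel(), ax.set_ylabel(), and ax.set_title() methods are the object-oriented counterparts of plt.xlabel(), plt.ylabel(), and plt.title().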
Thanks for reading through the article. Follow us to get notified of future content like this.