Data visualization is an integral part of data science. It is very useful for exploring and understanding data, and in some cases a visualization conveys information much better than plain numbers.
Relationships among variables, their distributions, and the underlying structure of the data can easily be discovered with data visualization techniques.
In this article, we will go over 5 fundamental data visualization types that are commonly used in data analysis. We will be using Altair, a declarative statistical visualization library for Python.
I have previously written similar articles using Seaborn and ggplot2, in case you prefer one of those libraries for data visualization tasks. I suggest going over all of them, because comparing different tools and frameworks on the same task helps you learn better.
Let’s first create a sample dataframe to be used for the examples.
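```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range(start='2020-01-10', periods=100, freq='D'),  # 100 consecutive days
    'cat': pd.Series(['A', 'B', 'C']).sample(n=100, replace=True),      # random categories
    'val': (np.random.randn(100) + 10).round(2),                        # normal values around 10
    'val2': (np.random.random(100) * 10).round(2),                      # uniform values between 0 and 10
    'val3': np.random.randint(20, 50, size=100)                         # integers from 20 to 49
})
# The sampled 'cat' Series brings duplicate index labels (0, 1, 2) into the
# dataframe, so replace them with a clean 0..99 index
df = df.reset_index(drop=True)
df.head()
```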
The dataframe consists of 100 rows and 5 columns. It contains datetime, categorical, and numerical values.
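If you want to confirm the shape and column types before plotting, a quick check like the one below can help. It is a minimal sketch that rebuilds the same sample dataframe; `column_kind` is a hypothetical helper written for this illustration, not part of pandas or Altair, and it simply maps each column's dtype to the three kinds mentioned above.

```python
import numpy as np
import pandas as pd
from pandas.api import types as ptypes

# Rebuild the sample dataframe exactly as above
df = pd.DataFrame({
    'date': pd.date_range(start='2020-01-10', periods=100, freq='D'),
    'cat': pd.Series(['A', 'B', 'C']).sample(n=100, replace=True),
    'val': (np.random.randn(100) + 10).round(2),
    'val2': (np.random.random(100) * 10).round(2),
    'val3': np.random.randint(20, 50, size=100)
})
df = df.reset_index(drop=True)

print(df.shape)  # (100, 5)

# Hypothetical helper: classify a column as datetime, numerical, or categorical
def column_kind(s):
    if ptypes.is_datetime64_any_dtype(s):
        return 'datetime'
    if ptypes.is_numeric_dtype(s):
        return 'numerical'
    return 'categorical'

kinds = {col: column_kind(df[col]) for col in df.columns}
print(kinds)
# {'date': 'datetime', 'cat': 'categorical', 'val': 'numerical', 'val2': 'numerical', 'val3': 'numerical'}
```

The values themselves are random on every run, but the shape and column kinds stay the same, which is what matters for the chart types that follow.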