In part one we used the "tips" dataset; in this part we will use the "penguins" dataset.
Now let’s load the dataset.
import seaborn as sns
d = sns.load_dataset("penguins")
d.columns
>> Index(['species', 'island', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex'],dtype='object')
We start with the violin plot, which draws a combination of a box plot and a kernel density estimate.
A violin plot plays a similar role to a box-and-whisker plot. It shows the distribution of quantitative data across several levels of one (or more) categorical variables so that those distributions can be compared. Unlike a box plot, in which all of the plot components correspond to actual data points, the violin plot features a kernel density estimate of the underlying distribution.
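To make the difference between the two ingredients concrete, here is a minimal sketch of what a violin summarizes for a single group: quartiles read straight off the data, plus a smoothed density curve. The data is synthetic (hypothetical values, not the penguins measurements), and the Scott's-rule bandwidth is an assumption chosen as a common default, not necessarily what your seaborn version uses.

```python
import numpy as np

# Synthetic stand-in for one group's measurements
# (hypothetical values, not taken from the penguins dataset).
rng = np.random.default_rng(0)
values = rng.normal(loc=4200.0, scale=450.0, size=150)

# Box-plot part: the quartiles correspond directly to the data.
q1, med, q3 = np.percentile(values, [25, 50, 75])

# Violin part: a Gaussian kernel density estimate written out by hand.
# Bandwidth: Scott's rule, std * n^(-1/5) (an assumed default).
n = values.size
bandwidth = values.std(ddof=1) * n ** (-1 / 5)
grid = np.linspace(values.min() - 3 * bandwidth,
                   values.max() + 3 * bandwidth, 400)
z = (grid[:, None] - values[None, :]) / bandwidth
density = np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

# A density curve should integrate to roughly 1 (simple Riemann sum).
step = grid[1] - grid[0]
area = density.sum() * step

print(f"quartiles: {q1:.0f}, {med:.0f}, {q3:.0f}")
print(f"area under the KDE: {area:.3f}")
```

The width of the violin at each height is proportional to this density, which is why a violin can reveal features such as two peaks that a box plot would hide.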
A vertical violin plot:
sns.violinplot(data=d, x="species", y="bill_depth_mm")
sns.violinplot(data=d, x="species", y="body_mass_g")
Split violins to compare across the hue variable:
sns.violinplot(data=d, x="species", y="body_mass_g", hue="sex", split=True)
Show the quartiles as horizontal lines:
sns.violinplot(data=d,x="species",y = "body_mass_g",hue="sex",split=True,inner="quartile")
To show each observation as a stick inside the violin:
sns.violinplot(data=d,x="species",y = "body_mass_g",hue="sex",split=True,inner="stick")
Nested violins grouped by island through the hue variable:
sns.violinplot(data=d, x="species", y="body_mass_g", hue="island")
We can combine violinplot() and FacetGrid by using catplot(). This allows grouping within additional categorical variables. Using catplot() is safer than using FacetGrid directly, as it ensures synchronization of variable order across facets.
p = sns.catplot(
    data=d, x="species", y="flipper_length_mm",
    hue="sex", col="island", kind="violin",
)
A bar plot represents an estimate of central tendency for a numeric variable with the height of each rectangle and provides some indication of the uncertainty around that estimate using error bars. Bar plots include 0 in the quantitative axis range, and they are a good choice when 0 is a meaningful value for the quantitative variable, and you want to make comparisons against it.
A bar plot shows only the mean (or another estimator) value, but in many cases it may be more informative to show the distribution of values at each level of the categorical variables. In that case, other approaches such as a box or violin plot may be more appropriate.
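The numbers behind a single bar can be sketched with NumPy alone: the bar height is the estimator applied to the data, and the error bar is an interval around it. The sketch below uses synthetic values (hypothetical, not real penguin data) and a percentile bootstrap with 1000 resamples at the 95% level, which is assumed here to mirror seaborn's default behavior. `bootstrap_ci` is an illustrative helper, not a seaborn function.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical measurements for one category (not real penguin data).
values = rng.normal(loc=200.0, scale=7.0, size=120)


def bootstrap_ci(x, rng, estimator=np.mean, n_boot=1000, level=95):
    """Percentile bootstrap interval for estimator(x)."""
    # Resample with replacement: one row of indices per bootstrap sample.
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    stats = estimator(x[idx], axis=1)
    half = (100 - level) / 2
    lo, hi = np.percentile(stats, [half, 100 - half])
    return lo, hi


# Mean with a bootstrap confidence interval.
height = values.mean()
lo, hi = bootstrap_ci(values, rng)

# Median as the estimator, with its own bootstrap interval.
median_height = np.median(values)
lo_med, hi_med = bootstrap_ci(values, rng, estimator=np.median)

# Standard deviation of the observations instead of a confidence interval.
sd = values.std(ddof=1)
sd_lo, sd_hi = height - sd, height + sd

print(f"mean {height:.1f}, 95% CI ({lo:.1f}, {hi:.1f})")
print(f"median {median_height:.1f}, 95% CI ({lo_med:.1f}, {hi_med:.1f})")
print(f"mean +/- sd: ({sd_lo:.1f}, {sd_hi:.1f})")
```

Note how much wider the standard-deviation interval is than the confidence interval: the former describes the spread of the observations, the latter the uncertainty of the estimate.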
sns.barplot(data=d,x="species",y="body_mass_g")
sns.barplot(data=d,x="species",y="flipper_length_mm")
sns.barplot(data=d, x="island", y="body_mass_g")
Vertical bars with nested grouping by two variables:
sns.barplot(data=d,x="island",y="body_mass_g",hue = "sex")
Use the median as the estimate of central tendency:
from numpy import median
sns.barplot(data=d,x="species",y="flipper_length_mm",estimator=median)
Show the standard deviation of the observations instead of a confidence interval (in seaborn 0.12 and later, ci is deprecated in favor of errorbar="sd"):
sns.barplot(data=d,x="species",y="flipper_length_mm",ci="sd")
Thank you!