We now have the dataset saved in a proper data structure. Let’s start by creating a scatter plot.
A scatter plot is a relational plot commonly used to visualize the values of two numerical variables. It lets us observe whether there is a correlation between them.
Seaborn:
sns.relplot(data=titanic, x="Age", y="Fare", hue="Survived",
            kind="scatter", aspect=1.4)
The relplot function of Seaborn creates different kinds of relational plots, such as scatter plots and line plots. The type of plot is specified with the kind parameter. We pass the columns to be plotted on the x axis and y axis to the x and y parameters, respectively. The hue parameter separates the data points based on the categories in the given column by using a different color for each category. Finally, the aspect parameter adjusts the width-to-height ratio of the figure.
Ggplot2:
ggplot(data = titanic) +
  geom_point(mapping = aes(x = Age, y = Fare, color = Survived))
The first step is the ggplot function, which creates an empty graph from the data passed to it. The second step adds a new layer to the graph based on the given mappings and plot type. The geom_point function creates a scatter plot. The columns to be plotted are specified in the aes function. The color aesthetic plays the same role as the hue parameter in Seaborn.
We do not observe a distinct relationship between age and fare, which is more or less expected.
We use the color parameter to separate the data points based on the Survived column. It seems that passengers who paid more had a higher chance of surviving.
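Both observations can also be checked numerically rather than by eye. The sketch below is one way to do that with pandas, assuming the data sits in a DataFrame with the Age, Fare, and Survived columns used above. The helper name summarize and the toy rows are hypothetical and only stand in for the real dataset.

```python
import pandas as pd

def summarize(df):
    """Return the Age-Fare Pearson correlation and the mean Fare per Survived group."""
    # Series.corr ignores rows where either value is missing
    age_fare_corr = df["Age"].corr(df["Fare"])
    # Average fare paid within each value of Survived
    mean_fare = df.groupby("Survived")["Fare"].mean()
    return age_fare_corr, mean_fare

# Hypothetical toy rows, NOT the Titanic data; they only demonstrate the call
toy = pd.DataFrame({
    "Age": [20, 30, 40, 50],
    "Fare": [10.0, 20.0, 10.0, 20.0],
    "Survived": [0, 1, 0, 1],
})
corr, mean_fare = summarize(toy)
```

On the real data, a correlation near zero would agree with a scatter plot that shows no clear trend, and a higher mean fare among survivors would agree with the color pattern. The toy values here say nothing about the actual result.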