
Many machine learning algorithms work better when features are on a relatively similar scale and close to normally distributed. MinMaxScaler, RobustScaler, StandardScaler, and Normalizer are scikit-learn transformers for preprocessing data before machine learning. Which one you need, if any, depends on your model type and your feature values.
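To make the comparison concrete, here is a minimal sketch of applying each of these four transformers, assuming NumPy and scikit-learn are installed; the toy feature matrix is made up purely for illustration:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler, Normalizer

# Made-up feature matrix: two features on very different scales,
# with one large outlier in the second column.
X = np.array([[1.0, 100.0],
              [2.0, 110.0],
              [3.0, 120.0],
              [4.0, 900.0]])

for scaler in (MinMaxScaler(), RobustScaler(), StandardScaler(), Normalizer()):
    # MinMaxScaler, RobustScaler, and StandardScaler rescale each feature (column);
    # Normalizer instead rescales each sample (row) to unit norm.
    print(type(scaler).__name__)
    print(scaler.fit_transform(X))
```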
These scalers are valuable not only for modeling but also when plotting data against multiple y axes.
When building a double y-axis plot comparing the Nasdaq stock index price with US COVID-19 case numbers, I chose MinMaxScaler. Because the analysis was sensitive to radical shifts in price or case numbers, I kept the outliers, and MinMaxScaler suited the double y-axis plot because:
- It doesn’t reduce the importance of outliers.
- For each value in a feature, MinMaxScaler subtracts the minimum value in the feature and then divides by the range, where the range is the difference between the original maximum and original minimum (see the sketch after this list).
- MinMaxScaler preserves the shape of the original distribution; it doesn’t meaningfully change the information embedded in the original data.
- The default range for the feature returned by MinMaxScaler is 0 to 1.
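As a concrete illustration of these points, here is a minimal sketch. The price and case arrays are made-up stand-ins rather than the actual Nasdaq or US COVID-19 data, and the twin-axis plot is just one way such a chart could be put together:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-ins for the two series being compared.
prices = np.array([7000.0, 6900.0, 7200.0, 8100.0, 9500.0]).reshape(-1, 1)
cases = np.array([10.0, 150.0, 2000.0, 35000.0, 80000.0]).reshape(-1, 1)

scaler = MinMaxScaler()  # default feature_range is (0, 1)
scaled_prices = scaler.fit_transform(prices)
scaled_cases = MinMaxScaler().fit_transform(cases)

# The same result by hand: subtract the feature's minimum, divide by its range.
by_hand = (prices - prices.min()) / (prices.max() - prices.min())
assert np.allclose(scaled_prices, by_hand)

# Both series now lie in [0, 1]; relative spacing (including outliers) is preserved.
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y axis sharing the same x axis
ax1.plot(scaled_prices, color="tab:blue")
ax2.plot(scaled_cases, color="tab:red")
ax1.set_ylabel("Nasdaq (MinMax scaled)")
ax2.set_ylabel("COVID-19 cases (MinMax scaled)")
plt.show()
```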
Here’s the plot after MinMaxScaler has been applied: