Let’s say your data has already been scaled properly using one of the following techniques. Standardization: the scaled values are centered on the mean and have unit standard deviation, so the attribute ends up with a mean of zero and a standard deviation of one. Normalization (also known as Min-Max scaling): the values are shifted and rescaled so that they range between 0 and 1.
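As a rough illustration of the two scaling techniques described above, here is a minimal sketch using only the Python standard library. The function names and the `ages` column are hypothetical and chosen just for this example; in practice a library scaler would normally be used instead.

```python
import statistics

def standardize(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumes the column is not constant (std > 0)
    return [(v - mean) / std for v in values]

def min_max_scale(values):
    """Shift and rescale so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)  # assumes hi > lo
    return [(v - lo) / (hi - lo) for v in values]

ages = [18.0, 25.0, 40.0, 62.0, 75.0]  # hypothetical feature column

z_scores = standardize(ages)    # mean 0, standard deviation 1
unit_range = min_max_scale(ages)  # values between 0 and 1
```

Both helpers would divide by zero on a constant column, so a real pipeline would need to handle that case.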
A model is considered accurate when it performs well, and about equally well, on both training and test data. Poor performance in a machine learning model is usually caused by either overfitting or underfitting the data.
High error on the training data itself indicates that the model is ‘underfitting’. The model is too simple (it has high bias) to capture the patterns in the training data, so it cannot fit the data points well and its training error stays high.
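One way to recognize underfitting in practice is to check whether the error is high on the training data as well as on the test data. The sketch below assumes NumPy is available and uses a made-up quadratic dataset; the straight-line model is deliberately too simple for it.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Synthetic data (an assumption for this sketch): y = x^2 plus noise."""
    x = rng.uniform(-3, 3, size=n)
    y = x**2 + rng.normal(0, 0.5, size=n)
    return x, y

def mse(coeffs, x, y):
    """Mean squared error of a polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x_train, y_train = make_data(200)
x_test, y_test = make_data(200)

# A straight line cannot follow a parabola, so it underfits.
line = np.polyfit(x_train, y_train, deg=1)

train_mse = mse(line, x_train, y_train)
test_mse = mse(line, x_test, y_test)
noise_floor = 0.5**2  # variance of the added noise; the best achievable MSE

print(f"train MSE={train_mse:.2f}  test MSE={test_mse:.2f}  noise floor={noise_floor:.2f}")
```

If both errors sit far above the noise floor and close to each other, the model is most likely underfitting rather than overfitting.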
[Q] How to deal with this issue?
_________________________________________________________________
[1] What if? — Increase the number (rows) of training samples
NO. If the problem lies in the model itself, adding more training data is unlikely to help: a model that is too simple will keep underfitting however many rows it sees, and the training error may even increase.
_________________________________________________________________
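To illustrate item [1], the following sketch fits the same too-simple model on increasingly large samples. It assumes NumPy and a synthetic quadratic dataset invented for this example; the point is only that the training error does not shrink as rows are added.

```python
import numpy as np

rng = np.random.default_rng(1)

def linear_train_mse(n):
    """Fit a straight line to n samples of y = x^2 + noise; return training MSE."""
    x = rng.uniform(-3, 3, size=n)
    y = x**2 + rng.normal(0, 0.5, size=n)
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(np.mean((slope * x + intercept - y) ** 2))

# Same model, 10x and 100x more rows.
errors = {n: linear_train_mse(n) for n in (100, 1_000, 10_000)}

for n, err in errors.items():
    print(f"n={n:>6}  training MSE={err:.2f}")
```

The errors should stay at roughly the same high level for every sample size, because the limitation is in the model and not in the amount of data.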
[2] What if? — Add more features (columns) to training data
YES. Adding more features may help when the model is relying on irrelevant features during training. If the features already present are not informative enough, either replace them or add more relevant ones.
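A minimal sketch of item [2], assuming NumPy and a synthetic dataset in which the target depends on the square of `x`. The helper `fit_and_score` is hypothetical; it fits ordinary least squares on whatever feature columns it is given and reports the training error.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=500)
y = x**2 + rng.normal(0, 0.5, size=500)  # synthetic target, assumed for this sketch

def fit_and_score(features):
    """Least-squares fit on the given feature columns; returns training MSE."""
    X = np.column_stack([np.ones(len(y)), *features])  # prepend an intercept column
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((X @ weights - y) ** 2))

mse_original = fit_and_score([x])           # only the raw feature
mse_with_square = fit_and_score([x, x**2])  # plus one more informative column

print(f"raw feature only: {mse_original:.2f}")
print(f"with x^2 column:  {mse_with_square:.2f}")
```

Here the extra column is informative by construction; adding an uninformative column would not be expected to give the same improvement.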
_________________________________________________________________
[3] What if? — Recleaning data
YES. Clean data ultimately improves overall productivity and provides higher-quality information for decision-making. Removing errors helps in some cases, for example when multiple sources contribute to a single dataset.
_________________________________________________________________
[4] What if? — Increase the power of the algorithm
YES. We can increase the power of the algorithm or model through kernelization, or we can replace the model with a more powerful one that fits the training data well.
_________________________________________________________________
[5] What if? — Analysis of outliers
YES. Outliers are data points whose values are too high or too low to belong to the general distribution of the rest of the dataset, so it is better to run outlier detection unless your model is robust to outliers.
_________________________________________________________________
[6] What if? — Apply Boosting
YES. Boosting increases model complexity and therefore helps decrease bias.
_________________________________________________________________
[7] What if? — Apply Bagging
NO. Bagging reduces variance, which is useful when we observe high variance at test time (overfitting), so it may not help with underfitting.
_________________________________________________________________
[8] What if? — Apply log transformation before model training
Maybe YES. If our data is highly skewed, applying a ‘log transformation’ can bring it closer to a normal distribution, which may improve the fit on the training data, but it is not guaranteed to help. The results of standard statistical tests on log-transformed data are sometimes NOT RELEVANT when compared with the non-transformed data. Instead of transforming skewed data, many researchers apply methods that do not depend on the distribution, such as GEE (Generalized Estimating Equations), or use the Mahalanobis distance when computing distances on non-normalized or non-standardized data.
_________________________________________________________________
[9] What if? — Apply SMOTE to generate more data samples
Maybe YES. As discussed, adding more data may not help on its own, but SMOTE (Synthetic Minority Over-sampling Technique) may help if the synthetic data points it adds reduce the influence of outliers on the dataset.
_________________________________________________________________
[10] What if? — Reduction in regularization parameter
YES. More regularization increases bias, which can cause under-fitting. Reducing the regularization parameter decreases bias and therefore reduces under-fitting.
_________________________________________________________________
[11] What if? — Introduce Dropouts in Neural Network
NO. In the case of under-fitting, skipping some neurons with probability P will not help: it decreases the effective model complexity and introduces more bias during training.
_________________________________________________________________
[12] What if? — Increase folds in cross-validation
NO. If your model is under-fitting, using more folds in cross-validation may not help: it only increases the number of data points used in each training run and does nothing to increase the number of features or the power of the model.
_________________________________________________________________
[13] What if? — Train the model for a longer time
Maybe YES. Since underfitting means the model has not captured enough complexity, training longer can help it learn more complex patterns. This is especially true in deep learning.
_________________________________________________________________
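The effect described in item [13] can be sketched with plain gradient descent, assuming NumPy and a small synthetic regression problem. The `train` function, learning rate, and step counts are illustrative choices, not a recommended setup.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])  # hypothetical ground-truth weights
y = X @ true_w + rng.normal(0, 0.1, size=200)

def train(steps, lr=0.05):
    """Run gradient descent on mean squared error; return final training MSE."""
    w = np.zeros(3)
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the MSE w.r.t. w
        w -= lr * grad
    return float(np.mean((X @ w - y) ** 2))

mse_short = train(steps=5)    # stopped early: weights are still far from the optimum
mse_long = train(steps=500)   # trained longer: weights have had time to converge

print(f"5 steps:   training MSE={mse_short:.3f}")
print(f"500 steps: training MSE={mse_long:.3f}")
```

This only helps when the model is capable of fitting the data and was simply stopped too early; a model that is too simple will plateau at a high error no matter how long it trains.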