When an attribute's variance is zero or close to zero, the attribute is nearly constant and has very little influence on a model. Dropping low-variance attributes is therefore a useful first step. Here, we remove all attributes whose variance is exactly 0.
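# Separate the features from the target and drop the identifier/time columns
x = train.drop(['isFraud', 'TransactionID', 'TransactionDT'], axis=1)
y = train.isFraud
# Variance of each attribute, sorted in ascending order
variance = x.var().sort_values()
# Remove every attribute whose variance is exactly 0
x.drop(list(variance[variance == 0].index), axis=1, inplace=True)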
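The zero-variance filter can be tried in isolation on a small table. The following is a minimal sketch, not the code used on the actual dataset: the helper name `drop_zero_variance` and the toy columns `amount`, `flag`, and `card_type` are made up for illustration.

```python
import pandas as pd


def drop_zero_variance(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without numeric columns whose variance is exactly 0."""
    variances = df.var(numeric_only=True)
    constant_cols = variances[variances == 0].index.tolist()
    return df.drop(columns=constant_cols)


# Hypothetical toy data: "flag" never changes, so its variance is 0.
toy = pd.DataFrame({
    "amount": [10.0, 25.5, 7.2, 99.0],
    "flag": [1, 1, 1, 1],
    "card_type": [3, 1, 2, 3],
})

reduced = drop_zero_variance(toy)
print(list(reduced.columns))  # ['amount', 'card_type']
```

Unlike the in-place `drop` above, this version returns a new frame and leaves the input untouched, which makes it easier to compare shapes before and after.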
The speed and efficiency of modelling depend heavily on the dimensionality of the data. Hence, highly correlated attributes, which may also affect the stability of the model, are removed.
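cormatrix = x.corr()
# Keep only the upper triangle (k=1 excludes the diagonal) so each pair is checked once.
# The built-in bool replaces np.bool, which newer NumPy versions no longer provide.
tr = cormatrix.where(np.triu(np.ones(cormatrix.shape), k=1).astype(bool))
f, ax = plt.subplots()
sns.heatmap(cormatrix, cmap='BrBG')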
# Drop every column whose correlation with an earlier column exceeds 0.4
drop_ = [column for column in tr.columns if any(tr[column] > 0.4)]
x.drop(drop_, axis=1, inplace=True)
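To see how the upper-triangle mask picks out one column from each correlated pair, here is a small self-contained sketch on made-up data. The helper name `drop_correlated` and the toy columns are hypothetical. One deliberate difference from the snippet above: this sketch compares the absolute correlation against the threshold, so strongly negative correlations are also caught. That is an assumption about what is wanted, not a description of the original pipeline.

```python
import numpy as np
import pandas as pd


def drop_correlated(df: pd.DataFrame, threshold: float = 0.4):
    """Drop each column whose |correlation| with an earlier column exceeds threshold."""
    corr = df.corr().abs()
    # Boolean mask of the strict upper triangle: every pair appears exactly once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


# Hypothetical toy data: "a_copy" is a linear function of "a" (correlation 1),
# while "b" is constructed to be uncorrelated with "a".
a = np.arange(1, 9, dtype=float)
toy = pd.DataFrame({
    "a": a,
    "a_copy": 2 * a + 1,
    "b": [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0],
})

reduced, dropped = drop_correlated(toy, threshold=0.4)
print(dropped)                # ['a_copy']
print(list(reduced.columns))  # ['a', 'b']
```

Because only the upper triangle is inspected, the first column of a correlated pair is kept and the later one is dropped, so the result depends on column order.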
After these two steps, we managed to reduce the data to a shape of (8000, 23).