• Skip to main content
  • Skip to secondary menu
  • Skip to primary sidebar
  • Skip to footer
  • Home
  • Crypto Currency
  • Technology
  • Contact
NEO Share

NEO Share

Sharing The Latest Tech News

  • Home
  • Artificial Intelligence
  • Machine Learning
  • Computers
  • Mobile
  • Crypto Currency

How to utilize Feature Importance correctly

January 1, 2021 by systems

There are several ways to express Feature Importance. It can be score representing the features’ relevancy (using algorithm-based Feature Importance, such as tree-based Random Forest, XGBoost,..), dimension reduction from original features to synthesize all the information to a lower dimension (like Autoencoder or PCA), or domain knowledge selection (aka selection based on your own knowledge). With the variety of choices like this, which one should we choose? Even when we choose to use PCA, selecting the right number of remaining features seems to demand great exertion. Don’t you think we should go with the Feature Importance score or testing everything and see which performs the best?

Yes, it’s correct, testing everything is the accurate answer. From my experience, I usually set the baseline as the original model without any selection, then try PCA with different drop-out ratios and check the metric. Then drop original features (mainly because I want to limit the amount of collected data needed, as well as optimize the training time while maintain or improve the performance). So Dimension Reduction is my first choice and the Feature Importance is the next essential step. But using only the Feature Importance score is hardly a good choice. Let me show you why.

Feature Importance score here means any score generated from the trained model representing the weight or relevancy of the features to the prediction of the target feature.

Below are the feature importance scores of Random Forest calculated based on RandomForestRegressor model (detail of formula is here), selecting feature with SelectFromModel (detail), and permutation score (detail of formula is here). Data used for demonstration is California housing in sklearn.

Filed Under: Machine Learning

Primary Sidebar

Stay Ahead: The Latest Tech News and Innovations

Cryptocurrency Market Updates: What’s Happening Now

Emerging Trends in Artificial Intelligence: What to Watch For

Top Cloud Computing Services to Secure Your Data

The Future of Mobile Technology: Recent Advancements and Predictions

Footer

  • Privacy Policy
  • Terms and Conditions

Copyright © 2025 NEO Share

Terms and Conditions - Privacy Policy