Alternative Feature Selection Methods in Machine Learning

Sole from Train in Data
12 min read · Apr 25, 2022
Image by Gerd Altmann from Pixabay

You’ve probably searched online for “Feature Selection”, and you’ve probably found tons of articles describing the three umbrella terms that group selection methodologies, i.e., “Filter Methods”, “Wrapper Methods” and “Embedded Methods”.

Under the “Filter Methods”, we find statistical tests that select features based on their distributions. These methods are computationally very fast, but in practice they often do not select good features for our models. In addition, when we have big datasets, the p-values of statistical tests tend to be very small, flagging as significant tiny differences in distributions that may not be practically important.
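To make the idea concrete, here is a minimal sketch of a filter method, assuming a toy classification dataset and scikit-learn’s SelectKBest with the ANOVA F-test as the scoring function (the dataset, the value of k and the scorer are illustrative choices, not recommendations):

```python
# Filter method sketch: score each feature against the target with the
# ANOVA F-test and keep the 10 highest-scoring features.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=1000, n_features=30, random_state=0)

selector = SelectKBest(score_func=f_classif, k=10)  # univariate test per feature
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)  # (1000, 10)
```

Because each feature is scored independently of any model, this runs in seconds even on large datasets, which is also why the p-values can become misleadingly small.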

The “Wrapper Methods” category includes greedy algorithms that search over feature combinations via step-forward selection, step-backward elimination, or exhaustive search. For each candidate feature combination, these methods train a machine learning model, usually with cross-validation, and evaluate its performance. Thus, wrapper methods are very computationally expensive, and often impossible to carry out.
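As a rough sketch, step-forward selection with cross-validation might look like the following, assuming scikit-learn’s SequentialFeatureSelector (available from version 0.24) with a logistic regression as the wrapped model; the estimator, the number of features and the number of folds are arbitrary choices for illustration:

```python
# Wrapper method sketch: step-forward selection, where every candidate
# feature subset is scored by cross-validating the wrapped model.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=5,
    direction="forward",   # use "backward" for step-backward elimination
    cv=3,                  # cross-validation at every step drives the cost
)
sfs.fit(X, y)

print(sfs.get_support())   # boolean mask over the original features
```

Each step refits the model once per remaining feature and per fold, which is where the computational cost explodes as the feature space grows.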

The “Embedded Methods”, on the other hand, train a single machine learning model and select features based on the feature importance returned by that model. They tend to work very well in practice and are much faster to compute than wrapper methods. On the downside, we can’t derive…
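For completeness, a minimal sketch of an embedded method could rely on scikit-learn’s SelectFromModel wrapped around a random forest, keeping the features whose importance exceeds the mean importance (the estimator and the default threshold are illustrative assumptions):

```python
# Embedded method sketch: fit a single model and select features from
# its feature_importances_ (the default threshold is the mean importance).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=1000, n_features=30, random_state=0)

selector = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)  # only features above the importance threshold remain
```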

