When building a machine learning model for a business problem, it’s rare that every variable in the data set needs to be included in the model. Adding more variables seldom makes a model less accurate on the training data, but including an excess of features carries real disadvantages.
In this article, I discuss the importance of feature selection in machine learning, highlight why we should select features when using our models for business problems, and then go over the main feature selection algorithms.
What we’ll cover:
- What is feature selection in machine learning?
- Importance of feature selection in machine learning
- Feature selection methods: filter, wrapper, embedded and hybrid
Let’s get started.
What is feature selection in machine learning?
Feature selection is the process of identifying and selecting a subset of variables from the original data set to use as inputs in a machine learning model.
A data set usually contains a large number of features. We can employ a variety of methods to determine which of these features are actually important in making predictions.
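One of the simplest such methods is a correlation-based filter: score each feature by how strongly it correlates with the target, then keep only the top-scoring ones. Below is a minimal sketch using only NumPy; the function name and the toy data are my own, for illustration.

```python
import numpy as np

def select_top_k_by_correlation(X, y, k):
    """Rank features by absolute Pearson correlation with the target
    and return the column indices of the k highest-scoring features."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]

# Toy data: feature 0 tracks y, feature 1 is pure noise,
# and feature 2 is (negatively) correlated with y.
rng = np.random.default_rng(0)
y = rng.normal(size=100)
X = np.column_stack([y + 0.1 * rng.normal(size=100),
                     rng.normal(size=100),
                     -y + 0.1 * rng.normal(size=100)])

selected = select_top_k_by_correlation(X, y, 2)  # keeps features 0 and 2
```

This is a filter method: it scores each feature independently of any model, which makes it fast but blind to feature interactions. Libraries such as scikit-learn offer similar utilities (e.g., `SelectKBest`) with a wider choice of scoring functions.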
Each of these methods has advantages and disadvantages to consider. But why should we select features to begin with?
Importance of feature selection in machine learning
At a glance, it may seem that the more information one feeds to a machine learning model, the more accurate it will be. In practice, however, simpler models are often preferable: they train faster, are easier to interpret, and are less prone to overfitting, all of which matter when a model is used in an organization.