Recursive feature elimination with Python

Sole from Train in Data
10 min read · Aug 16, 2022

Recursive feature elimination (RFE) is a sequential feature selection process in which features are removed one at a time, or a few at a time, over successive iterations.

Given a machine learning model, the goal of recursive feature elimination is to select features by recursively considering smaller and smaller sets of features.

In RFE, an estimator is first trained using all features, and the importance of each variable is obtained. In Scikit-learn, the importance comes either from the coefficients of a linear regression model (coef_) or from the importance derived from decision trees (feature_importances_). The least important feature or group of features is then removed, and a new machine learning model is trained on the remaining features.
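As a minimal sketch of where these importance values live, the snippet below fits a linear regression and a random forest on a synthetic dataset and reads coef_ and feature_importances_. The dataset, the choice of random forest as the tree-based model, the scaling step and the use of absolute coefficient values are assumptions made for illustration; they are not prescribed by RFE itself.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 8 features, only 3 of them informative.
X, y = make_regression(
    n_samples=300, n_features=8, n_informative=3, noise=5.0, random_state=0
)

# Linear model: importance taken as the magnitude of each coefficient.
# Features are scaled first so that coefficients are comparable.
X_scaled = StandardScaler().fit_transform(X)
linear = LinearRegression().fit(X_scaled, y)
linear_importance = np.abs(linear.coef_)

# Tree-based model: importance is exposed directly by the fitted estimator.
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
tree_importance = forest.feature_importances_

print("coef_ magnitudes:    ", np.round(linear_importance, 2))
print("feature_importances_:", np.round(tree_importance, 2))
```

Both arrays hold one value per feature, so either can be used to rank features from least to most important.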

RFE initial steps:

  1. Train a machine learning model
  2. Derive feature importance
  3. Remove least important feature(s)
  4. Re-train the machine learning model on the remaining features
Recursive feature elimination, initial steps
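The four initial steps can be sketched by hand as a single elimination round. This is only an illustration, assuming a synthetic dataset and a random forest as the model; the variable names are hypothetical and one feature is removed per round.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumption): 6 features, 3 of them informative.
X, y = make_regression(
    n_samples=200, n_features=6, n_informative=3, noise=5.0, random_state=0
)

# 1. Train a machine learning model on all features.
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# 2. Derive feature importance.
importance = model.feature_importances_

# 3. Remove the least important feature.
least = int(np.argmin(importance))
kept = [i for i in range(X.shape[1]) if i != least]
X_reduced = X[:, kept]

# 4. Re-train the model on the remaining features.
model_reduced = RandomForestRegressor(n_estimators=50, random_state=0).fit(
    X_reduced, y
)

print("removed feature index:", least)
print("remaining feature indices:", kept)
```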

After these initial steps, RFE continues differently in its two implementations: one by Scikit-learn and one by Feature-engine. Let’s see how each of them works.

For tutorials on feature selection, check out our course Feature Selection for Machine Learning or our book Feature Selection in Machine Learning with Python.

RFE in Scikit-learn

In the Scikit-learn implementation, features continue to be removed based on feature importance. That means that steps 2–4 are repeated until a stopping criterion is reached. In Scikit-learn, the stopping criterion is an arbitrary, user-defined number of final features.

RFE steps in Scikit-learn:

  1. Train a machine learning model
  2. Derive feature importance
  3. Remove least important feature(s)
  4. Re-train the machine learning model on the remaining features
  5. Repeat steps 2 to 4 until the desired number of features is reached
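The steps above can be run with the RFE class from sklearn.feature_selection. The following is a minimal sketch, assuming a synthetic regression dataset, a linear regression as the estimator and a target of 4 final features; those choices are illustrative only.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): 10 features, 4 of them informative.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=4, noise=5.0, random_state=0
)

# n_features_to_select is the stopping criterion: elimination ends once
# this many features remain. step=1 removes one feature per iteration.
selector = RFE(estimator=LinearRegression(), n_features_to_select=4, step=1)
selector.fit(X, y)

selected_mask = selector.support_   # True for the features that were kept
ranking = selector.ranking_         # 1 for kept features, higher = removed earlier
X_selected = selector.transform(X)  # data restricted to the kept features

print("kept features:", selected_mask.nonzero()[0])
print("ranking:", ranking)
```

Setting step to a larger integer removes several features per iteration, which matches the "a few at a time" variant of the procedure.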