Feature engineering for machine learning: What is it?

Sole from Train in Data
10 min read · Jul 13, 2020

State-of-the-art feature engineering methods and Python libraries used in data science.

Feature engineering for machine learning — Created by the author

Feature engineering is the process of transforming features, extracting features, and creating new variables from the original data, to train machine learning models.

Data in its original format can almost never be used straightaway to train classification or regression models. Instead, data scientists devote much of their time to preprocessing the data before they can train machine learning algorithms.

Imputing missing data, transforming categorical data, and transforming or discretizing numerical data are all examples of feature engineering. Feature engineering also involves putting the variables on the same scale, for example through normalization.
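To make these steps concrete, here is a minimal sketch with pandas. The toy data, the column names, and the specific choices (median imputation, a "Missing" label, two equal-frequency bins, min-max scaling, one-hot encoding) are assumptions made for illustration; they are one reasonable option for each step, not the only one.

```python
import pandas as pd

# Toy data invented for this illustration.
df = pd.DataFrame({
    "age": [25.0, None, 47.0, 33.0],
    "city": ["London", "Paris", None, "London"],
    "income": [30000.0, 52000.0, 81000.0, 45000.0],
})

features = df.copy()

# Impute missing data: median for the numerical variable,
# an explicit "Missing" label for the categorical one.
features["age"] = features["age"].fillna(features["age"].median())
features["city"] = features["city"].fillna("Missing")

# Discretize a numerical variable into 2 equal-frequency bins.
features["income_bin"] = pd.qcut(
    features["income"], q=2, labels=["low", "high"]
)

# Put the numerical variables on the same scale (min-max normalization to [0, 1]).
for col in ["age", "income"]:
    col_min, col_max = features[col].min(), features[col].max()
    features[col] = (features[col] - col_min) / (col_max - col_min)

# Transform the categorical variables into numbers with one-hot encoding.
features = pd.get_dummies(features, columns=["city", "income_bin"])

print(features)
```

In a real project, the imputation values, bin edges, and scaling limits would be learned from the training set only and then applied to new data, which is what the transformers in libraries such as Scikit-learn and Feature-engine are designed to do.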

Finally, feature extraction from text, transaction data, time series, and images is also key to creating input data that can be used to train predictive models.
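As a small illustration of feature extraction from transaction data, the sketch below uses pandas to derive date and time features from a timestamp and to summarize each customer's transactions into a single row. The table, the column names, and the chosen aggregations are hypothetical and only meant to show the idea.

```python
import pandas as pd

# Hypothetical transaction log, invented for this illustration.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "timestamp": pd.to_datetime([
        "2020-07-01 09:15", "2020-07-03 18:40",
        "2020-07-02 12:00", "2020-07-04 08:30", "2020-07-05 21:10",
    ]),
    "amount": [20.0, 35.0, 5.0, 12.5, 7.5],
})

# Extract features from the timestamp of each transaction.
transactions["hour"] = transactions["timestamp"].dt.hour
transactions["day_of_week"] = transactions["timestamp"].dt.dayofweek  # Monday = 0

# Aggregate the transactions so that each customer becomes one row
# that a predictive model can take as input.
customer_features = transactions.groupby("customer_id").agg(
    n_transactions=("amount", "count"),
    total_amount=("amount", "sum"),
    mean_amount=("amount", "mean"),
    last_purchase=("timestamp", "max"),
)

print(customer_features)
```

Writing these aggregations by hand quickly becomes tedious as the number of variables and time windows grows, which is where dedicated feature extraction libraries can help.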

Feature engineering is key to improving the performance of machine learning algorithms. Yet, it is very time-consuming. Fortunately, there are many Python libraries that we can use for data preparation, including Pandas, Scikit-learn, Feature-engine, Category Encoders, tsfresh, and Featuretools.

In this article, we will answer the following questions:

  • Why do we engineer features for machine learning?
  • What are the main feature engineering techniques?
  • How can we do feature engineering with Python?

Let’s get started.

If you want to know more about feature engineering in machine learning:

Check out our course, Feature Engineering for Machine Learning.

Check out our Python Feature Engineering Cookbook.

Why do we engineer features for machine learning?

Features in the raw data are almost never suitable inputs for machine learning models. Instead, data scientists need to fill in missing data, transform categorical features into numbers, create new variables, and much more.

Sole from Train in Data

Data scientist, book author, online instructor (www.trainindata.com) and Python open-source developer. Get our regular updates: http://eepurl.com/hdzffv