Feature engineering for machine learning: What is it?

Sole from Train in Data
Jul 13, 2020

State-of-the-art feature engineering methods and Python libraries used in data science.

Feature engineering for machine learning (image created by the author)

Feature engineering is the process of transforming existing features, extracting new features, and creating new variables from the original data so that they can be used to train machine learning models.
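Here is a minimal sketch of those three activities. It assumes pandas and NumPy and uses a small made-up table; the column names `income`, `debt`, and `signup_date` are hypothetical. It transforms one variable, extracts another, and creates a third:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: column names and values are illustrative only
raw = pd.DataFrame({
    "income": [30000.0, 45000.0, 120000.0],
    "debt": [15000.0, 9000.0, 30000.0],
    "signup_date": ["2020-01-15", "2020-03-02", "2020-07-30"],
})

features = pd.DataFrame(index=raw.index)

# Transforming an existing feature: a log compresses a right-skewed scale
features["log_income"] = np.log(raw["income"])

# Extracting a feature: pull the month out of a date string
features["signup_month"] = pd.to_datetime(raw["signup_date"]).dt.month

# Creating a new variable by combining two original ones
features["debt_to_income"] = raw["debt"] / raw["income"]

print(features)
```

The resulting `features` table has one row per original record, so it can be passed directly to a model's training routine.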

Data in its original format can almost never be used straight away to train classification or regression models. Instead, data scientists devote a large share of their time to preprocessing the data before they can train machine learning algorithms.

Imputing missing data, encoding categorical variables, and transforming or discretizing numerical variables are all examples of feature engineering. Feature engineering also involves putting the variables on the same scale, for example through normalization.
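The following sketch applies each of those steps with scikit-learn on a tiny made-up dataset (the columns `age`, `city`, and `income` are hypothetical). It uses median imputation, one-hot encoding, equal-width discretization, and standardization. Standardization is one common way of putting variables on the same scale, and min-max normalization is another.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder, StandardScaler

# Toy data with one missing value and one categorical column (illustrative only)
df = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 35.0],
    "city": ["London", "Paris", "London", "Madrid"],
    "income": [30000.0, 52000.0, 61000.0, 40000.0],
})

# 1. Impute the missing age with the median of the observed ages (35.0)
age = SimpleImputer(strategy="median").fit_transform(df[["age"]])

# 2. One-hot encode the city: one 0/1 column per category, sorted alphabetically
city = OneHotEncoder().fit_transform(df[["city"]]).toarray()

# 3. Discretize income into 2 equal-width bins, returned as ordinal codes 0 and 1
income_bin = KBinsDiscretizer(
    n_bins=2, encode="ordinal", strategy="uniform"
).fit_transform(df[["income"]])

# 4. Standardize the imputed age to zero mean and unit variance
age_scaled = StandardScaler().fit_transform(age)

print(age.ravel(), city, income_bin.ravel(), age_scaled.ravel(), sep="\n")
```

In a real project each transformer would be fitted on the training set only and then applied to the test set, so that no information from the test data leaks into the learned medians, categories, bin edges, or scaling parameters.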

Finally, extracting features from text, transaction data, time series, and images is also key to creating input data that can be used to train predictive models.

Feature engineering is key to improving the performance of machine learning algorithms, yet it is very time-consuming. Fortunately, there are many Python libraries that we can use for data preparation, including Pandas, Scikit-learn, Feature-engine, Category Encoders, tsfresh, and Featuretools.
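One way to keep these steps manageable is to chain them into a single object. The sketch below uses only scikit-learn's `Pipeline` and `ColumnTransformer` on made-up data, with hypothetical columns `age` and `city`. Feature-engine and Category Encoders transformers follow the same fit/transform interface, so they can usually be placed in a pipeline like this one as well.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical training data and binary target
X = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 35.0, 52.0, 23.0],
    "city": ["London", "Paris", "London", "Madrid", "Paris", "Madrid"],
})
y = [0, 1, 1, 0, 1, 0]

# Numerical column: impute the median, then standardize
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Apply each preprocessing branch to its own columns
preprocess = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Feature engineering and the model are fitted together in one call
model = Pipeline([("features", preprocess), ("clf", LogisticRegression())])
model.fit(X, y)
predictions = model.predict(X)
print(predictions)
```

Bundling the steps this way means the exact same transformations learned on the training data are reapplied to any new data passed to `model.predict`.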


Sole from Train in Data

Data scientist, book author, online instructor (www.trainindata.com) and Python open-source developer. Get our regular updates: http://eepurl.com/hdzffv