Best Resources to Learn Feature Engineering

Sole from Train in Data
13 min read · Aug 11, 2020

Figure: Feature engineering for machine learning (image from the author)

Data in its raw format cannot be used straightaway to train machine learning models. Instead, data scientists devote a big chunk of time to transforming the data and building suitable features for machine learning.

The process of transforming the variables and creating new features is called feature engineering, and it is typically the stage where data scientists devote most of their effort in a machine learning project.

As Pedro Domingos said in the article “A few useful things to know about machine learning”:

“At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used”.

Feature engineering and data pre-processing are also, for many of us, the most interesting parts of the data science project, where we can combine our creativity and intuition with domain knowledge to create meaningful features.

Some aspects of feature engineering are domain-specific: we need to know a few things about the data and the business area to derive useful features. But a big chunk of feature engineering is also quite repetitive and can be automated.

Many feature engineering methods are used across organizations and in many data science competitions. These include procedures to remove missing data, encode categorical variables, and extract features from text, to name a few.
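
To make these procedures concrete, here is a minimal sketch in plain Python of three of them: handling missing data, encoding a categorical variable, and extracting features from text. The records, column names, and variable names are made up for illustration. Note also that this sketch fills in missing values with the median rather than removing them, which is one common choice among several. In practice, libraries such as pandas or scikit-learn would normally do this work.

```python
from collections import Counter
from statistics import median

# Toy records; the column names and values are hypothetical.
rows = [
    {"age": 34, "city": "London", "review": "great product great price"},
    {"age": None, "city": "Paris", "review": "poor quality"},
    {"age": 52, "city": "London", "review": "great quality"},
]

# 1. Missing data: fill missing ages with the median of the observed ages.
observed_ages = [r["age"] for r in rows if r["age"] is not None]
age_median = median(observed_ages)
ages = [r["age"] if r["age"] is not None else age_median for r in rows]

# 2. Categorical encoding: one-hot encode the city column.
cities = sorted({r["city"] for r in rows})
city_one_hot = [[int(r["city"] == c) for c in cities] for r in rows]

# 3. Text features: bag-of-words counts over a shared vocabulary.
vocabulary = sorted({word for r in rows for word in r["review"].split()})
word_counts = []
for r in rows:
    counts = Counter(r["review"].split())
    word_counts.append([counts[word] for word in vocabulary])

# Combine everything into one numeric feature vector per record.
features = [
    [age] + one_hot + counts
    for age, one_hot, counts in zip(ages, city_one_hot, word_counts)
]

for vector in features:
    print(vector)
```

Each record ends up as a single list of numbers (age, one column per city, one column per word), which is the kind of input most machine learning models expect.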

More and more, feature engineering practices are being consolidated, and many organizations adopt similar practices to clean and prepare the data.

Surprisingly, even though feature engineering is a crucial part of any machine learning pipeline and also the most time-consuming, it is barely covered in the extensive catalogue of machine learning online courses.

In this article, I will discuss some of the best available resources to learn more about feature engineering.

Let’s crack on with the learning resources.

Written by Sole from Train in Data

Data scientist | instructor (www.trainindata.com) | book author | Python open-source developer. Subscribe 👉: https://www.trainindata.com/p/data-bites
