Best Resources to Learn Feature Engineering
Data in its raw format cannot be used as is to train machine learning models. Instead, data scientists spend a large share of their time transforming the data and building features suitable for machine learning.
The process of transforming variables and creating new features is called feature engineering, and it is typically the most time-consuming stage of a machine learning project.
As Pedro Domingos said in the article “A few useful things to know about machine learning”:
“At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used”.
Feature engineering and data pre-processing are also, for many of us, the most interesting parts of a data science project, where we can combine creativity and intuition with domain knowledge to create meaningful features.
Some aspects of feature engineering are domain-specific: we need to know a few things about the data and the business area to derive useful features. But a large part of feature engineering is repetitive and can be automated.
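To make the "repetitive" part concrete, here is a minimal sketch of three steps that show up in almost every tabular project: imputing missing numbers, labelling missing categories, and one-hot encoding. The toy data, the column names, and the `engineer_features` helper are all hypothetical, made up for illustration, and the choice of median imputation is an assumption rather than a recommendation.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "age": [25.0, None, 40.0, 31.0],
    "city": ["London", "Paris", None, "London"],
})


def engineer_features(data: pd.DataFrame) -> pd.DataFrame:
    """Apply the same generic steps to every column of a given type."""
    out = data.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    categorical_cols = out.columns.difference(numeric_cols)

    # Step 1: fill missing numbers with the column median (an assumed choice).
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())

    # Step 2: give missing categories an explicit "Missing" label.
    out[categorical_cols] = out[categorical_cols].fillna("Missing")

    # Step 3: one-hot encode the categorical columns.
    return pd.get_dummies(out, columns=list(categorical_cols), dtype=int)


features = engineer_features(df)
print(features)
```

Nothing in the function depends on what the columns mean, which is why steps like these lend themselves to automation. A domain-specific feature, by contrast, would require knowing something about the business behind the data.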