Selecting Features with the Population Stability Index
If you have worked in credit scoring, you have probably heard of the Population Stability Index, or PSI.
The PSI is a metric that quantifies changes in a variable's distribution, and it is commonly used to assess the risk of using a variable in a credit risk model.
Unstable features are variables whose distribution changes after certain events, such as a policy change or a recession. Because unstable features can degrade model performance, we want to avoid using them in credit risk scorecards.
In this article, we will discuss what the Population Stability Index is, its uses in credit scoring and data science, and how we can select features based on their PSI values with the Python open-source package Feature-engine.
Population Stability Index: What is it?
The Population Stability Index (PSI) is a statistical measure of how much a variable's distribution has changed over time, or of how much it differs between two samples.
The PSI was originally designed for credit risk scorecards, where it is used to monitor changes in the distributions of the independent variables over time, or between the training data (often referred to as the “development sample”) and the live data.
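To make the definition concrete, here is a minimal sketch of a PSI calculation in plain Python. It assumes the commonly used form of the index: both samples are sorted into the same bins, and the PSI is the sum over bins of (actual % - expected %) * ln(actual % / expected %). The function name `psi`, the equal-width binning, the default of 10 bins, and the small floor `eps` applied to empty bins are illustrative choices made here, not something prescribed by the text above.

```python
import math


def psi(expected, actual, bins=10, eps=1e-4):
    """Sketch of a PSI between a reference sample and a comparison sample."""
    # Bin edges come from the reference ("development") sample only.
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the sample is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range go into the first or last bin.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Assumption: floor empty bins at eps so the logarithm is defined.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


# Hypothetical data: a development sample and a shifted "live" sample.
development = [i / 100 for i in range(100)]
live = [0.5 + i / 200 for i in range(100)]

print(psi(development, development))  # identical samples
print(psi(development, live))         # shifted sample
```

Every term in the sum is non-negative, so the index is zero when the binned proportions match exactly and grows as the two distributions move apart. In practice the result depends on the number of bins and on how empty bins are handled, so these choices should be fixed before comparing values across variables.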