Selecting Features with the Population Stability Index

Sole from Train in Data
11 min read · Jan 6, 2022
Population Stability Index in data science

If you have worked in credit scoring, you have probably heard of the Population Stability Index, or PSI.

The PSI is a metric that quantifies changes in the distribution of a variable, and it is commonly used to assess the risk of using that variable in a credit risk model.

Unstable features are variables whose distribution changes after certain events, for example a policy change or a recession. Because such shifts can degrade model performance, we want to avoid using unstable features in credit risk scorecards.

In this article, we will discuss what the Population Stability Index is, its uses in credit scoring and data science, and how we can select features based on their PSI values with the Python open-source package Feature-engine.

Population Stability Index: What is it?

The Population Stability Index (PSI) is a statistical measure of how much the distribution of a variable has changed over time, or of how different its distribution is between two samples.

The PSI was originally designed and used in credit risk scorecards to monitor changes in the distributions of the independent variables over time, or between the training data (often referred to as the “development sample”) and the live data.
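To make the definition concrete, here is a minimal sketch of how the PSI between a reference sample and a new sample could be computed with NumPy. It uses the usual formulation, PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the fractions of reference and new observations in bin i. The function name `population_stability_index`, the equal-frequency binning, and the small `eps` floor for empty bins are assumptions made for illustration; this is not Feature-engine's implementation.

```python
import numpy as np


def population_stability_index(expected, actual, bins=10, eps=1e-4):
    """Illustrative PSI between a reference sample and a new sample."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Interior bin edges are taken from quantiles of the reference sample
    # (equal-frequency binning, an assumption of this sketch). Values below
    # the first edge or above the last fall into the two outer bins.
    quantiles = np.linspace(0, 1, bins + 1)[1:-1]
    inner_edges = np.unique(np.quantile(expected, quantiles))
    n_bins = len(inner_edges) + 1

    # Assign every observation to a bin and count observations per bin.
    expected_idx = np.searchsorted(inner_edges, expected, side="right")
    actual_idx = np.searchsorted(inner_edges, actual, side="right")
    expected_pct = np.bincount(expected_idx, minlength=n_bins) / expected.size
    actual_pct = np.bincount(actual_idx, minlength=n_bins) / actual.size

    # Floor the fractions so empty bins do not produce log(0) or division by zero.
    expected_pct = np.clip(expected_pct, eps, None)
    actual_pct = np.clip(actual_pct, eps, None)

    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))


# Synthetic data for illustration only: a reference sample and a shifted one.
rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5000)
shifted = rng.normal(loc=1.0, scale=1.0, size=5000)

psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
```

Comparing a sample with itself gives a PSI of zero, while the shifted sample yields a clearly positive value. The larger the PSI, the more the two distributions differ.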

Written by Sole from Train in Data

Data scientist, book author, online instructor (www.trainindata.com) and Python open-source developer. Get our regular updates: http://eepurl.com/hdzffv
