Mutual information with Python

Sole from Train in Data
10 min read · Aug 12, 2022
Mutual information for feature selection

Mutual information (MI) is a non-negative value that measures the mutual dependence between two random variables. It quantifies how much information we gain about one variable by observing the values of the other, and it equals zero if and only if the two variables are independent.
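
To make the definition concrete, here is a minimal sketch that computes the MI of two discrete variables directly from their joint probability table. The helper name `mutual_information`, the two toy tables, and the choice of base-2 logarithms (so the result is in bits) are assumptions made for illustration only.

```python
import numpy as np


def mutual_information(joint):
    """Return the MI (in bits) of two discrete variables from their joint table p(x, y)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)  # marginal p(x), shape (n, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal p(y), shape (1, m)
    nonzero = joint > 0                    # cells with p(x, y) = 0 contribute nothing
    ratio = joint[nonzero] / (px @ py)[nonzero]
    return float(np.sum(joint[nonzero] * np.log2(ratio)))


# Hypothetical toy distributions over two binary variables.
independent = np.array([[0.25, 0.25],
                        [0.25, 0.25]])  # p(x, y) = p(x) * p(y) in every cell
dependent = np.array([[0.5, 0.0],
                      [0.0, 0.5]])      # y is fully determined by x

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
print(mi_independent, mi_dependent)
```

In the first table, observing one variable tells us nothing about the other, so the MI is 0. In the second, observing one variable fully determines the other, so the MI is 1 bit.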

Mutual information is a good alternative to Pearson’s correlation coefficient because it can capture any type of relationship between variables, not just linear associations. It is also suitable for both continuous and discrete variables, unlike Pearson’s correlation coefficient.
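
The difference shows up on a simple non-linear relationship. The sketch below is an illustration under assumptions: the synthetic data (a noisy parabola), the sample size, and the seeds are all hypothetical, and the MI is estimated with scikit-learn's `mutual_info_regression`, which uses a nearest-neighbour estimator and reports values in nats.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

# Hypothetical data: y depends on x through a parabola, so the
# relationship is strong but not linear.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=2000)
y = x ** 2 + rng.normal(scale=0.05, size=2000)

# Pearson's correlation only picks up linear association.
pearson = np.corrcoef(x, y)[0, 1]

# scikit-learn expects a 2D feature matrix, hence the reshape.
mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]

print(f"Pearson correlation: {pearson:.3f}")
print(f"Estimated MI (nats): {mi:.3f}")
```

On data like this, the Pearson coefficient should sit close to zero even though y is almost a deterministic function of x, while the estimated MI is clearly positive.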

MI is closely related to the concept of entropy. I will therefore first introduce entropy and show how to compute it for a discrete variable. Next, I will show how to compute the MI between discrete variables, and then extend the definition of MI to continuous variables. Finally, I will finish with a Python implementation of feature selection based on MI.

In summary, in the following paragraphs we will discuss:

  • Entropy.
  • Relative entropy.
  • Mutual information of discrete variables.
  • Mutual information of continuous variables.
  • Feature selection based on MI with Python.
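
As a preview of the last item, MI-based feature selection could look roughly like the sketch below. It is not necessarily the implementation this article has in mind: it assumes scikit-learn's `SelectKBest` with `mutual_info_classif` as the scoring function, and the synthetic dataset and the choice of `k=3` are hypothetical.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Hypothetical dataset: 10 features, of which 3 are informative.
# With shuffle=False the informative features are the first columns.
X, y = make_classification(
    n_samples=500,
    n_features=10,
    n_informative=3,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)

# Fix the seed of the MI estimator so the ranking is reproducible.
score_func = partial(mutual_info_classif, random_state=0)

# Keep the 3 features with the highest estimated MI with the target.
selector = SelectKBest(score_func=score_func, k=3)
X_selected = selector.fit_transform(X, y)
selected_idx = selector.get_support(indices=True)

print("MI per feature:", selector.scores_.round(3))
print("Selected feature indices:", selected_idx)
```

For a continuous target, `mutual_info_regression` can be passed as the scoring function instead.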



