Pearson correlation coefficient
In statistics, the Pearson correlation coefficient, also known as Pearson's r, the Pearson product-moment correlation coefficient (PPMCC), the bivariate correlation, or colloquially simply the correlation coefficient, is a measure of linear correlation between two sets of data.
Definition
Pearson's correlation coefficient is the covariance of the two variables divided by the product of their standard deviations. The form of the definition involves a "product moment", that is, the mean of the product of the mean-adjusted random variables; hence the modifier product-moment in the name.
For a population
Pearson's correlation coefficient, when applied to a population, is commonly represented by the Greek letter ρ (rho) and may be referred to as the population correlation coefficient or the population Pearson correlation coefficient. Given a pair of random variables \((X, Y)\), the formula for ρ is:
\[\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_{X}\sigma_{Y}}\]
where:
- cov is the covariance.
- \(\sigma_{X}\) is the standard deviation of \(X\).
- \(\sigma_{Y}\) is the standard deviation of \(Y\).
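As a minimal sketch of this definition, the function below treats a finite list of paired observations as the entire population (so every average divides by n, not n - 1) and returns the covariance divided by the product of the two standard deviations. The name `population_pearson` and the sample data are illustrative assumptions, not taken from any library.

```python
import math


def population_pearson(xs, ys):
    """Return cov(X, Y) / (sigma_X * sigma_Y), treating the data as a full population."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance: mean of the products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Population standard deviations (divide by n).
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sigma_x * sigma_y)


# Hypothetical data: ys is an exact linear function of xs, so the result is close to 1.
print(population_pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```

The sketch performs no check for zero standard deviation, so constant input raises `ZeroDivisionError`.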
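The formula for ρ can be expressed in terms of mean and expectation. Since
\[\operatorname{cov}(X, Y) = \mathbb{E}[(X-μ_{X})(Y-μ_{Y})],\]
the formula for ρ can also be written as
\[\rho_{X,Y} = \frac{\mathbb{E}[(X-μ_{X})(Y-μ_{Y})]}{\sigma_{X}\sigma_{Y}}\]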
where:
- \(\sigma_{X}\) and \(\sigma_{Y}\) are defined as above.
- \(μ_{X}\) is the mean of \(X\).
- \(μ_{Y}\) is the mean of \(Y\).
- \(\mathbb{E}\) is the expectation.
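The expectation form can be illustrated on a small discrete joint distribution. The sketch below assumes the distribution is given as a dictionary mapping each outcome `(x, y)` to its probability; the helper names `expectation` and `rho_from_expectation` and the example distribution are hypothetical.

```python
import math


def expectation(pmf, f):
    """E[f(X, Y)] for a finite joint distribution given as {(x, y): probability}."""
    return sum(p * f(x, y) for (x, y), p in pmf.items())


def rho_from_expectation(pmf):
    """Compute rho as E[(X - mu_X)(Y - mu_Y)] / (sigma_X * sigma_Y)."""
    mu_x = expectation(pmf, lambda x, y: x)
    mu_y = expectation(pmf, lambda x, y: y)
    cov = expectation(pmf, lambda x, y: (x - mu_x) * (y - mu_y))
    sigma_x = math.sqrt(expectation(pmf, lambda x, y: (x - mu_x) ** 2))
    sigma_y = math.sqrt(expectation(pmf, lambda x, y: (y - mu_y) ** 2))
    return cov / (sigma_x * sigma_y)


# Hypothetical joint distribution of two binary variables that tend to agree.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(rho_from_expectation(joint))
```

For this distribution both means are 0.5, both variances are 0.25, and the covariance is 0.15, so the result is approximately 0.6.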
The formula for ρ can be expressed in terms of uncentered moments. Since
- \(μ_{X} = \mathbb{E}[X]\)
- \(μ_{Y} = \mathbb{E}[Y]\)
- \(\sigma^{2}_{X} = \mathbb{E}[(X-\mathbb{E}[X])^{2}] = \mathbb{E}[X^{2}] - (\mathbb{E}[X])^{2}\)
- \(\sigma^{2}_{Y} = \mathbb{E}[(Y-\mathbb{E}[Y])^{2}] = \mathbb{E}[Y^{2}] - (\mathbb{E}[Y])^{2}\)
- \(\mathbb{E}[(X-μ_{X})(Y-μ_{Y})] = \mathbb{E}[(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])] = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]\)
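the formula for ρ can also be written as
\[\rho_{X,Y} = \frac{\mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]}{\sqrt{\mathbb{E}[X^{2}] - (\mathbb{E}[X])^{2}}\,\sqrt{\mathbb{E}[Y^{2}] - (\mathbb{E}[Y])^{2}}}\]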
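The uncentered-moment form, in which ρ is the ratio of \(\mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]\) to the product of the square roots of \(\mathbb{E}[X^{2}] - (\mathbb{E}[X])^{2}\) and \(\mathbb{E}[Y^{2}] - (\mathbb{E}[Y])^{2}\), lends itself to a single pass over the data because only five running sums are needed. The following is a sketch under the assumption that the observations form the whole population; the function name `pearson_from_raw_moments` is illustrative.

```python
import math


def pearson_from_raw_moments(pairs):
    """Compute rho from E[X], E[Y], E[X^2], E[Y^2] and E[XY] in one pass."""
    n = 0
    sum_x = sum_y = sum_xx = sum_yy = sum_xy = 0.0
    for x, y in pairs:
        n += 1
        sum_x += x
        sum_y += y
        sum_xx += x * x
        sum_yy += y * y
        sum_xy += x * y
    if n == 0:
        raise ValueError("at least one (x, y) pair is required")
    e_x, e_y = sum_x / n, sum_y / n
    e_xx, e_yy, e_xy = sum_xx / n, sum_yy / n, sum_xy / n
    numerator = e_xy - e_x * e_y
    denominator = math.sqrt(e_xx - e_x ** 2) * math.sqrt(e_yy - e_y ** 2)
    return numerator / denominator


# Hypothetical data, chosen so that the result is approximately 0.5.
print(pearson_from_raw_moments([(1, 1), (2, 3), (3, 2)]))
```

Subtracting two nearly equal moments can lose precision in floating-point arithmetic when the means are large relative to the spread, so the centered form is usually the safer choice for such data.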
Pearson's correlation coefficient does not exist when either \(\sigma_{X}\) or \(\sigma_{Y}\) is zero, infinite, or undefined.
Source code: https://github.com/joshiayush/ai/blob/master/ai/algos/correlation/pearson_correlation/pearson_correlation.py
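For finite data, the case that arises in practice is a zero standard deviation, which happens when one variable is constant. One possible way to handle it, sketched below, is to return `None` instead of dividing by zero. The function name `pearson_or_none` and the choice of `None` as the signal are assumptions made for illustration, not the behavior of any particular library.

```python
import statistics


def pearson_or_none(xs, ys):
    """Return the population Pearson coefficient, or None when it is undefined."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    sigma_x = statistics.pstdev(xs)
    sigma_y = statistics.pstdev(ys)
    # A constant variable has zero standard deviation, so rho does not exist.
    if sigma_x == 0 or sigma_y == 0:
        return None
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    cov = statistics.fmean((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / (sigma_x * sigma_y)


print(pearson_or_none([3, 3, 3], [1, 2, 3]))  # xs is constant, so this prints None
```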