(1) | \rho_p(x,y) = \frac{{\rm cov}(x,y)}{\sigma(x) \sigma(y)} |
where
\{ (x, y) \} = \{ (x_1, y_1), \, (x_2, y_2), \, \ldots, \, (x_n, y_n) \} | discrete set of n paired values of variables x and y |
{\rm cov}(x,y) | covariance between variables x and y |
\sigma(x), \sigma(y) | standard deviations of variables x and y |
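As a minimal sketch of how eq. (1) can be evaluated on a discrete data set, the Python snippet below computes the covariance and both standard deviations directly from their definitions. The function name `pearson` and the sample data are illustrative assumptions, not part of the original text. Population (1/n) normalisation is assumed; using 1/(n-1) instead would give the same ratio, since the factor cancels.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences, eq. (1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population covariance and standard deviations (assumed 1/n normalisation;
    # the normalisation factor cancels in the ratio).
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    sigma_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    sigma_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov_xy / (sigma_x * sigma_y)


# Hypothetical sample data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_up = pearson(x, [2.0 * xi + 1.0 for xi in x])     # exact line with positive slope
r_down = pearson(x, [-0.5 * xi + 3.0 for xi in x])  # exact line with negative slope
r_flat = pearson(x, [2.0, 1.0, 3.0, 1.0, 2.0])      # no linear trend in y
```

Here `r_up` and `r_down` come out at the two ends of the allowed range (up to floating-point rounding), while `r_flat` is zero because the deviations of y from its mean are symmetric about the centre of x.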
The Pearson correlation coefficient ranges from -1 to 1 and indicates how closely the two variables can be related by a linear dependence:
y_i = a \, x_i + b, \quad \forall \, i=1..n |
for some choice of the constants a and b (see Fig. 1 – Fig. 3 for examples):
- The maximum value (+1) corresponds to a perfect linear correlation with a>0
- A zero value corresponds to no linear correlation between x and y
- The minimum value (-1) corresponds to a perfect linear correlation with a<0
Fig. 1. Highly correlated variables | Fig. 2. Poorly correlated variables | Fig. 3. Highly anti-correlated variables |
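The two limiting values can be checked directly from eq. (1). The short derivation below is a sketch under the assumptions that the linear relation holds exactly for every point, that a \neq 0, and that \sigma(x) > 0; it uses only the standard scaling properties of covariance and standard deviation.

```latex
% Assume y_i = a x_i + b for all i, with a \neq 0 and \sigma(x) > 0.
\begin{align*}
{\rm cov}(x, y) &= {\rm cov}(x, \, a x + b) = a \, \sigma^2(x) \\
\sigma(y) &= \sigma(a x + b) = |a| \, \sigma(x) \\
\rho_p(x, y) &= \frac{a \, \sigma^2(x)}{\sigma(x) \, |a| \, \sigma(x)}
             = \frac{a}{|a|}
             = \begin{cases} +1, & a > 0 \\ -1, & a < 0 \end{cases}
\end{align*}
```

The offset b and the magnitude of a drop out entirely, so the coefficient measures how well the points follow a straight line, not how steep that line is.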