@wikipedia


For samples of two statistical variables  x=\{x_i \, | \, i=1..n \} and  y=\{y_i \, | \, i=1..n \}, the Pearson correlation coefficient is defined as:

(1) \rho_P(x,y) = \frac{{\rm cov}(x,y)}{\sigma(x) \, \sigma(y)} = \frac{ \sum\limits_{i=1}^n (x_i - \bar x)( y_i - \bar y)}{\sqrt{\sum\limits_{i=1}^n (x_i - \bar x)^2} \cdot \sqrt{\sum\limits_{i=1}^n (y_i - \bar y)^2}}

where

\displaystyle \bar x = \frac{1}{n} \sum\limits_{i=1}^n x_i

sample mean of variable  x

\displaystyle \bar y = \frac{1}{n} \sum\limits_{i=1}^n y_i

sample mean of variable  y

x=\{ x_1, \, x_2, \, \dots, \, x_n \} and y=\{ y_1, \, y_2, \, \dots, \, y_n \}

finite arrays of x-variable and y-variable values

{\rm cov}(x,y)

covariance between x-variable and y-variable

\sigma(x), \sigma(y)

standard deviation of x-variable and y-variable
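
Equation (1) can be evaluated directly from the sample sums. The following is a minimal sketch in plain Python, not a reference implementation; the function name `pearson` and its error handling are illustrative assumptions and do not come from this page.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples, per equation (1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length n >= 2")

    # Sample means of x and y.
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Deviations of each value from its sample mean.
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]

    # Numerator: sum of products of deviations (covariance up to a constant factor).
    num = sum(a * b for a, b in zip(dx, dy))

    # Denominator: product of root sums of squared deviations
    # (standard deviations up to the same constant factor, which cancels).
    den = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    if den == 0:
        raise ValueError("coefficient is undefined when a sample is constant")

    return num / den
```

The normalisation factor of the covariance and of the standard deviations cancels in the ratio, so the sketch works with raw sums. The coefficient is undefined when either sample has zero variance, which is why that case raises an error here.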


The Pearson correlation coefficient ranges between -1 and 1 and indicates how accurately the relationship between the two variables can be approximated by a linear dependence:

y_i = a \, x_i + b, \quad \forall \, i=1..n

for some choice of the coefficients a and b.


  • Maximum value  \rho_P(x,y) = + 1 corresponds to a perfect linear correlation with  a>0 (see also Fig. 1)

  • Zero value  \rho_P(x,y) = 0 corresponds to the absence of linear correlation between  x and  y  (see also Fig. 2)

  • Minimum value  \rho_P(x,y) = - 1 corresponds to a perfect linear correlation with  a<0 (also called anti-correlation) (see also Fig. 3)
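
The three limiting cases can be reproduced on small synthetic samples. The sketch below is illustrative only: the data, the coefficients of the lines, and the helper name `pearson` are assumptions chosen for the example. The zero case uses a symmetric parabola, for which the products of deviations cancel exactly even though y is fully determined by x.

```python
import math


def pearson(x, y):
    # Compact form of equation (1); assumes equal lengths and non-constant samples.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y))
    return num / den


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_up = [3.0 * xi + 1.0 for xi in x]      # exact line y = a x + b with a > 0
y_down = [-3.0 * xi + 1.0 for xi in x]   # exact line y = a x + b with a < 0
y_sym = [xi ** 2 for xi in x]            # symmetric parabola: no linear trend

rho_up = pearson(x, y_up)      # close to +1
rho_down = pearson(x, y_down)  # close to -1
rho_sym = pearson(x, y_sym)    # close to 0
print(rho_up, rho_down, rho_sym)
```

The parabola case shows that a zero coefficient only rules out a linear trend; it does not by itself mean the variables are unrelated.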


Fig. 1. Highly correlated variables

Fig. 2. Poorly correlated variables

Fig. 3. Highly anti-correlated variables


See also


Natural Science / System / Model / Model Validation

Formal science / Mathematics / Statistics / Statistical correlation / Correlation coefficient

[ Statistical correlation metrics @ review ] [ Spearman correlation ] [ Kendall correlation ] [ Fechner correlation ]


