@wikipedia


For sample sets of two statistical variables x and y:

\rho_P(x,y) = \frac{{\rm cov}(x,y)}{\sigma(x) \sigma(y)} = \frac{ \sum\limits_{i=1}^n (x_i - \bar x)( y_i - \bar y)}{\sqrt{\sum\limits_{i=1}^n (x_i - \bar x)^2} \cdot \sqrt{\sum\limits_{i=1}^n (y_i - \bar y)^2}}

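where

\bar x is the sample mean of variable x,

\bar y is the sample mean of variable y,

x = \{ x_i \}_{i=1}^n and y = \{ y_i \}_{i=1}^n are finite arrays of x-variable and y-variable values,

{\rm cov}(x,y) is the covariance between the x-variable and the y-variable,

\sigma(x), \sigma(y) are the standard deviations of the x-variable and the y-variable.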
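The formula above can be evaluated directly from the sums of deviations. The following is a minimal sketch in plain Python, not a reference implementation: the function name `pearson`, the input checks, and the sample data are assumptions made for illustration only.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two samples are required")

    # Sample means of both variables
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Numerator: sum of products of deviations from the means
    num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    # Denominator terms: sums of squared deviations of each variable
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("coefficient is undefined for a constant variable")

    return num / math.sqrt(sxx * syy)


# Illustrative (made-up) data lying close to a straight line
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson(x, y)
print(round(r, 4))
```

The normalisation factors of the covariance and of the standard deviations cancel in the ratio, so the sketch works with raw sums and never divides by n or n - 1.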


The Pearson correlation coefficient ranges from -1 to 1 and indicates how closely the relationship between the two variables can be approximated by a linear dependence:

y_i = a \, x_i + b, \quad \forall \, i=1..n

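for some choice of the coefficients a and b. Values near 1 or -1 indicate a nearly exact linear relationship with a > 0 or a < 0 respectively, while values near 0 indicate that no straight line describes the data well.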
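The limiting cases can be checked numerically. The sketch below is illustrative only: the helper `pearson` and all data values are assumptions, chosen so that two samples lie exactly on a line y_i = a x_i + b and a third has no linear trend.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient (compact helper for this demo)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return num / math.sqrt(sxx * syy)


x = [0.0, 1.0, 2.0, 3.0, 4.0]

# Exact line with positive slope: a = 2, b = 1
y_up = [2.0 * xi + 1.0 for xi in x]

# Exact line with negative slope: a = -0.5, b = 3
y_down = [-0.5 * xi + 3.0 for xi in x]

# Values symmetric about the mean of x, so there is no linear trend
y_flat = [1.0, -1.0, 0.0, -1.0, 1.0]

r_up = pearson(x, y_up)      # exact increasing line
r_down = pearson(x, y_down)  # exact decreasing line
r_flat = pearson(x, y_flat)  # no linear trend

print(r_up, r_down, r_flat)
```

Note that the magnitude of the slope a does not affect the coefficient; only its sign does. A coefficient of zero rules out a linear trend but not other kinds of dependence between the variables.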



Fig. 1. Highly correlated variables

Fig. 2. Poorly correlated variables

Fig. 3. Highly anti-correlated variables


See also


Natural Science / System / Model / Model Validation

Formal science / Mathematics / Statistics / Statistical correlation / Correlation coefficient

[ Statistical correlation metrics @ review ] [ Spearman correlation ] [ Kendall correlation ] [ Fechner correlation ]