A real number characterising the prediction quality (goodness of fit) of a real-valued model:
R^2 = 1 - \frac{MSD(x, \hat x)}{MSD(x, \bar x)} = 1 - \frac{\sum_i (x_i - \hat x_i)^2}{\sum_i (x_i - \bar x)^2}
where
Symbol | Meaning |
---|---|
x = \{ x_1, \, x_2, \, x_3, \dots, x_N \} | observed variable, represented by a discrete dataset of N numerical samples |
\hat x = \{ \hat x_1, \, \hat x_2, \, \hat x_3, \dots, \hat x_N \} | predictor of the variable x, represented by another discrete dataset with the same number of samples N, predicted at the same conditions as the original samples \{ x_1, \, x_2, \, x_3, \dots, x_N \} |
\bar x = \frac{1}{N} \sum_i x_i | mean value of the variable x, which can be regarded as a trivial constant predictor with zero variability |
MSD(x, \hat x) = \frac{1}{N} \sum_i (x_i - \hat x_i)^2 | mean square deviation between the variable x and its predictor \hat x |
MSD(x, \bar x) = \frac{1}{N} \sum_i (x_i - \bar x)^2 | mean square deviation between the variable x and its mean value \bar x |
It is closely related to the Mean Square Deviation (MSD), but it normalizes the prediction error by the spread of the observed data around its mean. This makes it dimensionless and usually more suitable for assessing goodness of fit.
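The definition can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names `msd` and `r_squared` and the sample values are hypothetical and do not come from any particular library.

```python
def msd(x, y):
    """Mean square deviation between two equally long sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of samples")
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def r_squared(x, x_hat):
    """Coefficient of determination: 1 - MSD(x, x_hat) / MSD(x, x_bar)."""
    x_bar = sum(x) / len(x)
    # MSD of the observed data against its own mean (the constant predictor).
    baseline = msd(x, [x_bar] * len(x))
    if baseline == 0:
        # All observed samples are equal, so the ratio is undefined.
        raise ValueError("R^2 is undefined when the observed data has zero variance")
    return 1 - msd(x, x_hat) / baseline


# Hypothetical sample data, chosen only for illustration.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
print(r_squared(observed, predicted))  # approximately 0.99
```

Because both MSD terms carry the same factor 1/N, the ratio is identical to the ratio of the plain sums of squares in the formula above.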
The coefficient of determination R^2 normally ranges between:
- 0, indicating that the model predicts no better than the mean value \bar x, i.e. the prediction error equals the variance of the observed variable around its mean
and
- 1, indicating a perfect fit; values close to 1 mean that the prediction closely reproduces the variability of x
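With the definition above, R^2 cannot exceed 1, because the sums of squares are non-negative. It can, however, be negative. A negative R^2 indicates a substantial mismatch between the variable x and the model prediction \hat x: the mean square gap between predicted and actual values is larger than the variance of the actual data, so the model performs worse than simply predicting the mean value \bar x.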
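The three regimes (a perfect fit, a fit no better than the mean, and a negative value) can be reproduced with a small sketch. The function name `r_squared` and the datasets are hypothetical, chosen so that the arithmetic is easy to check by hand.

```python
def r_squared(x, x_hat):
    """R^2 from the sums of squares (hypothetical helper for illustration)."""
    x_bar = sum(x) / len(x)
    ss_res = sum((xi - pi) ** 2 for xi, pi in zip(x, x_hat))  # prediction error
    ss_tot = sum((xi - x_bar) ** 2 for xi in x)               # spread around the mean
    return 1 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0, 5.0]       # mean is 3.0, ss_tot is 10.0

perfect = list(observed)                    # prediction equals the observations
mean_only = [3.0] * len(observed)           # constant predictor at the mean
reversed_trend = [5.0, 4.0, 3.0, 2.0, 1.0]  # prediction with the opposite trend

print(r_squared(observed, perfect))         # 1.0
print(r_squared(observed, mean_only))       # 0.0
print(r_squared(observed, reversed_trend))  # -3.0
```

In the last case the sum of squared errors is 40 against a total sum of squares of 10, which gives 1 - 4 = -3.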