A real number characterising the prediction quality (goodness of fit) of a real-valued model:

$\begin{array}{l}\displaystyle R^2 = 1 - \frac{MSD(x, \hat x)}{MSD(x, \bar x)} = 1 - \frac{\sum_i (x_i -\hat x_i)^2}{\sum_i (x_i -\bar x)^2}\end{array}$

where

$\begin{array}{l}x = \{ x_1, \, x_2, \, x_3 , ... x_N \}\end{array}$	observed variable represented by a discrete dataset of numerical samples
$\begin{array}{l}\hat x = \{ \hat x_1, \, \hat x_2, \, \hat x_3 , ... \hat x_N \}\end{array}$	predictor of the variable $\begin{array}{l}x\end{array}$, represented by another discrete dataset of numerical samples, with the same number of samples $\begin{array}{l}N\end{array}$, predicted under the same conditions as the original samples $\begin{array}{l}\{ x_1, \, x_2, \, x_3 , ... x_N \}\end{array}$
$\begin{array}{l}\bar x = \frac{1}{N} \sum_i x_i\end{array}$	mean value of the variable $\begin{array}{l}x\end{array}$, which can be considered as an extreme predictor with zero variability
$\begin{array}{l}MSD(x, \hat x)\end{array}$	mean square deviation between a variable $\begin{array}{l}x\end{array}$ and its predictor $\begin{array}{l}\hat x\end{array}$
$\begin{array}{l}MSD(x, \bar x)\end{array}$	mean square deviation between a variable $\begin{array}{l}x\end{array}$ and its mean value $\begin{array}{l}\bar x\end{array}$
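As an illustration, the definition above can be sketched in plain Python. The function names `msd` and `r_squared` and the sample values are assumptions made for this example only; they do not come from any particular library.

```python
def msd(a, b):
    """Mean square deviation between two equal-length sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def r_squared(x, x_hat):
    """Coefficient of determination: 1 - MSD(x, x_hat) / MSD(x, x_bar)."""
    x_bar = sum(x) / len(x)
    # The mean acts as a constant predictor with zero variability
    denominator = msd(x, [x_bar] * len(x))
    if denominator == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - msd(x, x_hat) / denominator


# Hypothetical observed samples and their predictions
x = [1.0, 2.0, 3.0, 4.0, 5.0]
x_hat = [1.1, 1.9, 3.2, 3.8, 5.0]

r2 = r_squared(x, x_hat)
print(f"R^2 = {r2:.4f}")  # approximately 0.99
```

The factor $\begin{array}{l}1/N\end{array}$ appears in both the numerator and the denominator, which is why the ratio of mean square deviations equals the ratio of the plain sums of squares.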

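It is similar to the Mean Square Deviation (MSD) but quantifies the model prediction efficiency in a normalized way: the prediction error is divided by the variance of the observed variable, so the result is dimensionless and does not depend on the units or scale of the data. This usually makes it more suitable for assessing goodness of fit.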
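The effect of the normalization can be checked with a small sketch. The data and the helper names `msd` and `r_squared` are hypothetical and serve only to show that rescaling the data changes MSD but leaves $\begin{array}{l}R^2\end{array}$ unchanged.

```python
def msd(a, b):
    """Mean square deviation between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def r_squared(x, x_hat):
    """Coefficient of determination: 1 - MSD(x, x_hat) / MSD(x, x_bar)."""
    x_bar = sum(x) / len(x)
    return 1.0 - msd(x, x_hat) / msd(x, [x_bar] * len(x))


# Hypothetical lengths expressed in metres
x_m = [1.0, 2.0, 3.0, 4.0]
x_hat_m = [1.5, 2.0, 2.5, 4.0]

# The same data expressed in millimetres
x_mm = [1000.0 * v for v in x_m]
x_hat_mm = [1000.0 * v for v in x_hat_m]

msd_m = msd(x_m, x_hat_m)      # depends on the unit
msd_mm = msd(x_mm, x_hat_mm)   # larger by a factor of 1000^2

r2_m = r_squared(x_m, x_hat_m)
r2_mm = r_squared(x_mm, x_hat_mm)

print(msd_m, msd_mm)  # MSD changes with the unit
print(r2_m, r2_mm)    # R^2 is the same in both units
```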

The coefficient of determination $\begin{array}{l}R^2\end{array}$ normally ranges between:

0, indicating that the prediction is no better than the mean value $\begin{array}{l}\bar x\end{array}$: the prediction error equals the variance of the observed variable around its mean value

and

1, indicating a perfect fit that fully reproduces the variability of $\begin{array}{l}x\end{array}$ (values close to 1 indicate a good fit)

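The $\begin{array}{l}R^2\end{array}$ values falling outside the above range are negative, since $\begin{array}{l}R^2\end{array}$ cannot exceed 1 (the mean square deviation is never negative). They indicate a substantial mismatch between the variable $\begin{array}{l}x\end{array}$ and the model prediction $\begin{array}{l}\hat x\end{array}$: the mean square deviation between predicted and actual values is higher than the variance of the actual data, so the model performs worse than the mean value $\begin{array}{l}\bar x\end{array}$ used as a predictor.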
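The three regimes can be reproduced with a minimal sketch. The dataset and the three predictors are made up for illustration, and `msd` and `r_squared` are hypothetical helper names.

```python
def msd(a, b):
    """Mean square deviation between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def r_squared(x, x_hat):
    """Coefficient of determination: 1 - MSD(x, x_hat) / MSD(x, x_bar)."""
    x_bar = sum(x) / len(x)
    return 1.0 - msd(x, x_hat) / msd(x, [x_bar] * len(x))


x = [1.0, 2.0, 3.0, 4.0, 5.0]

# Predictor that reproduces every sample exactly
perfect = list(x)
# Constant predictor equal to the mean of x
mean_only = [sum(x) / len(x)] * len(x)
# Predictor with the opposite trend, worse than the mean
reversed_trend = [5.0, 4.0, 3.0, 2.0, 1.0]

r2_perfect = r_squared(x, perfect)
r2_mean = r_squared(x, mean_only)
r2_bad = r_squared(x, reversed_trend)

print(r2_perfect)  # 1.0
print(r2_mean)     # 0.0
print(r2_bad)      # -3.0
```

For the reversed predictor the sum of squared errors is 40 while the sum of squared deviations from the mean is 10, which gives $\begin{array}{l}R^2 = 1 - 40/10 = -3\end{array}$.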
