


A technique for estimating how accurately a model can predict data beyond those used to calibrate it.

It assumes that the Source Dataset is split into two subsets: the Training dataset and the Validation dataset.

The Training dataset is used to calibrate the model parameters.

The discrepancy between model values and Training dataset values can be low, but this does not mean that the model's predictive accuracy on data outside the Training dataset will be equally good.

This may happen because the model is not unique, and a given realization may not be the best one across all data points.

Estimating the accuracy on the Validation dataset, which was not part of the training, provides a better assessment of the model's predictability.

If the model discrepancy on the Validation dataset is close to the model discrepancy on the Training dataset, one can say that the model has good predictability.
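
A minimal sketch of this comparison, assuming NumPy and a simple polynomial model; the data, variable names, and root-mean-square discrepancy metric are illustrative assumptions, not part of this page:

```python
import numpy as np

# Illustrative pre-split data: (x_train, y_train) is the Training dataset,
# (x_val, y_val) the Validation dataset. The quadratic form is assumed.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, 70)
y_train = 1.0 + 2.0 * x_train - 3.0 * x_train**2 + rng.normal(0.0, 0.1, 70)
x_val = rng.uniform(0.0, 1.0, 30)
y_val = 1.0 + 2.0 * x_val - 3.0 * x_val**2 + rng.normal(0.0, 0.1, 30)

# Calibrate the model parameters on the Training dataset only.
coeffs = np.polyfit(x_train, y_train, deg=2)

def discrepancy(xs, ys):
    """Root-mean-square discrepancy between model values and data values."""
    return np.sqrt(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

# Discrepancies of similar magnitude suggest good predictability.
print(f"Training discrepancy:   {discrepancy(x_train, y_train):.4f}")
print(f"Validation discrepancy: {discrepancy(x_val, y_val):.4f}")
```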

Splitting the Source Dataset into the Training dataset and the Validation dataset can be done in different ways.

It can be done manually or randomly (see Bootstrapping).
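
A minimal sketch of a random split, assuming the Source Dataset is held in NumPy arrays; the function name and the 70/30 proportion are illustrative assumptions:

```python
import numpy as np

def random_split(x, y, train_fraction=0.7, seed=0):
    """Randomly partition a Source Dataset (x, y) into Training and Validation subsets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))           # shuffled indices
    n_train = int(train_fraction * len(x))  # size of the Training dataset
    return (x[idx[:n_train]], y[idx[:n_train]]), (x[idx[n_train:]], y[idx[n_train:]])

# Usage: 70% of the points go to training, the remaining 30% to validation.
x = np.linspace(0.0, 1.0, 100)
y = x**2
(x_train, y_train), (x_val, y_val) = random_split(x, y)
```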


See also


Natural Science / System / Model 

Formal science / Mathematics / Statistics 

[ Source Dataset ] [ Training dataset ] [ Validation dataset ]




