...

If the model discrepancy on the Validation dataset is not close to the model discrepancy on the Training dataset, the model is said to be overtrained: the given model realization has "remembered" the Training dataset but cannot accurately predict data points outside the Training dataset.
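The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the discrepancy is taken to be the root-mean-square error, and the names `rmse`, `looks_overtrained` and the `tolerance` factor are hypothetical and not part of the original page. What counts as "close" is a judgment call for each model.

```python
import math

def rmse(model, points):
    """Root-mean-square discrepancy between model predictions and observed values."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in points) / len(points))

def looks_overtrained(model, training, validation, tolerance=2.0):
    """Return True when the validation discrepancy is far above the training one.

    `tolerance` is a hypothetical threshold chosen for this sketch.
    """
    train_err = rmse(model, training)
    valid_err = rmse(model, validation)
    # Guard against a zero training discrepancy (a perfectly memorised dataset).
    return valid_err > tolerance * max(train_err, 1e-12)

# Toy data lying roughly on the line y = x (made up for illustration).
training = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.9)]
validation = [(0.5, 0.5), (1.5, 1.6), (2.5, 2.4)]

# A model that has "remembered" the Training dataset: exact on known points,
# zero everywhere else.
lookup = dict(training)

def memorising_model(x):
    return lookup.get(x, 0.0)

# A simple model that captures the trend instead of the individual points.
def linear_model(x):
    return x

for name, model in [("memorising", memorising_model), ("linear", linear_model)]:
    print(name, looks_overtrained(model, training, validation))
```

The memorising model has zero discrepancy on the Training dataset and a large one on the Validation dataset, so it is flagged. The linear model has similar discrepancies on both and is not.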

The Source Dataset can be split into a Training dataset and a Validation dataset in different ways: manually or randomly (see Bootstrapping).
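Both ways of splitting can be sketched as follows. The function names, the `validation_fraction` and `seed` parameters and the sample data are assumptions made for this illustration. The random variant is a plain shuffle-and-cut split without replacement. It is not Bootstrapping itself, which resamples with replacement.

```python
import random

def split_manually(source, validation_indices):
    """Put the hand-picked indices into the Validation dataset, the rest into Training."""
    chosen = set(validation_indices)
    training = [p for i, p in enumerate(source) if i not in chosen]
    validation = [p for i, p in enumerate(source) if i in chosen]
    return training, validation

def split_randomly(source, validation_fraction=0.25, seed=0):
    """Shuffle a copy of the Source Dataset and cut off a fraction for validation."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = list(source)
    rng.shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

# Hypothetical Source Dataset of (input, observed value) pairs.
source = [(float(i), 2.0 * i) for i in range(8)]

manual_training, manual_validation = split_manually(source, [1, 5])
random_training, random_validation = split_randomly(source)
print(len(random_training), len(random_validation))
```

In both cases every point of the Source Dataset lands in exactly one of the two datasets, so the Validation dataset contains only points the model has not seen during training.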

Note, however, that the Source Dataset may not hold enough representative events/occurrences to allow Cross-Validation. In this case the Goodness of fit over the Training dataset (which is then the whole Source Dataset) is the only measure available, which increases the risk associated with future Model Prediction.

See also

...

Natural Science / System / Model / Model Validation

...

[ Source Dataset ] [ Training dataset ] [ Validation dataset ]

[ Cross-Validation Plot ]