Training & Test Data

Training Data : This is the data set that contains both the feature variables and the target variable. The algorithm learns from this data, using it to derive the classification or regression equation.

Test Data : This is the data set used to validate the trained algorithm. It also contains feature and target variables. The trained algorithm is run on the records in the test data, and the actual value/label in the target variable is then compared with the value/label the algorithm outputs to measure its accuracy. The smaller the difference, the higher the accuracy!
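The comparison described above can be sketched in a few lines of Python. This is only an illustration, not a prescribed method: the helper names `accuracy` and `mean_absolute_error` and the sample values are assumptions made up for this example. The first helper suits classification (labels either match or they do not), the second suits regression (how far off the numbers are on average).

```python
def accuracy(actual, predicted):
    """Fraction of test records whose predicted label equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one record")
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one record")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical test-data labels and the labels a trained classifier produced.
actual_labels = ["spam", "ham", "spam", "ham", "ham"]
predicted_labels = ["spam", "ham", "ham", "ham", "ham"]
print(accuracy(actual_labels, predicted_labels))  # 4 of 5 match -> 0.8

# Hypothetical test-data values and the values a trained regressor produced.
actual_values = [3.0, 5.0, 2.0]
predicted_values = [2.5, 5.0, 3.0]
print(mean_absolute_error(actual_values, predicted_values))  # (0.5 + 0 + 1.0) / 3 -> 0.5
```

For classification a higher accuracy is better, while for regression a lower error is better.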

Notes : Test data is usually taken from the same available data set as the training data, but it is hidden from the algorithm during training and used fresh for testing later. The test records can be selected at random from the available data, or a specific set of records can be set aside as test data. Generally, a percentage (e.g. 20%) of randomly selected records from the available data is used as test data.
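A random 80/20 split like this could look as follows in plain Python. It is a minimal sketch: the function name `split_records`, its parameters, and the toy records are assumptions for illustration, and in practice a library routine would normally be used instead.

```python
import random


def split_records(records, test_fraction=0.2, seed=None):
    """Randomly hold out a fraction of the records as test data.

    Returns (train, test). Every record lands in exactly one of the two lists.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)   # a fixed seed makes the split repeatable
    shuffled = list(records)    # copy, so the caller's data is not reordered
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    test = shuffled[:n_test]    # hidden from the algorithm during training
    train = shuffled[n_test:]   # the only data the algorithm learns from
    return train, test


# Hypothetical records: (feature, target) pairs.
records = [(x, 2 * x + 1) for x in range(10)]
train, test = split_records(records, test_fraction=0.2, seed=42)
print(len(train), len(test))  # 8 2
```

Shuffling before slicing is what makes the selection random; without it, the test data would simply be the first records in the file, which may not be representative.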
