Supervised learning is the method of training an algorithm on labelled training data. Each training example has an input part and its label (the output). The input is usually a vector of features. The algorithm learns from these examples, and when a new input is given, it classifies the input or predicts its output label.
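To make the train-then-predict idea concrete, here is a minimal sketch using k-nearest neighbors, picked only because it fits in a few lines. The data, the feature meanings and the function name `knn_predict` are made up for illustration.

```python
from collections import Counter
import math

def knn_predict(train_inputs, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among the k closest training inputs."""
    # Euclidean distance from the query to every training vector, paired with its label
    distances = [
        (math.dist(vec, query), label)
        for vec, label in zip(train_inputs, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up labelled training data: input vector [height_cm, weight_kg] -> label
train_inputs = [[150, 50], [155, 55], [160, 58], [180, 85], [185, 90], [190, 95]]
train_labels = ["small", "small", "small", "large", "large", "large"]

# A new, unlabelled input: the algorithm predicts its label
print(knn_predict(train_inputs, train_labels, [158, 54]))
```

Here "training" is just storing the labelled examples. Most other algorithms fit internal parameters instead, but the input-label-prediction flow is the same.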
The accuracy of the algorithm can be measured on a test data set that has the same format as the training data but was not used for training. To improve accuracy, the training control parameters can be adjusted; which ones are available depends on the algorithm chosen. A few points to remember while using supervised learning:
- The training data set should not be biased toward a particular output label.
- Overfitting – This is the issue where the algorithm fits the training data too closely, so its error on new, unseen data is higher.
- The type of the input vectors – numerical, categorical, etc.
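The points above can be sketched in a few lines. This is an illustration under stated assumptions, not a recipe: the data is made up (one numerical feature, two noisy labels placed on purpose), and `knn_predict` and `accuracy` are hypothetical helper names. It counts the labels to check for bias, then compares accuracy on the training set and on a held-out test set for two values of the control parameter `k`.

```python
from collections import Counter
import math

def knn_predict(train, query, k):
    """Majority vote among the k training examples closest to `query`."""
    ranked = sorted(train, key=lambda example: math.dist(example[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def accuracy(train, examples, k):
    """Fraction of `examples` whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in examples)
    return correct / len(examples)

# Made-up training data: inputs below 5.5 are mostly "a", above mostly "b".
# The examples at 4.0 and 8.0 carry deliberately noisy labels.
train = [
    ([1.0], "a"), ([2.0], "a"), ([3.0], "a"), ([4.0], "b"), ([5.0], "a"),
    ([6.0], "b"), ([7.0], "b"), ([8.0], "a"), ([9.0], "b"), ([10.0], "b"),
]
# Held-out test data that was never used for training
test = [([3.8], "a"), ([4.4], "a"), ([7.7], "b"), ([8.3], "b"), ([1.5], "a"), ([9.5], "b")]

# Point 1: check that no output label dominates the training set
label_counts = Counter(label for _, label in train)
print("label counts:", dict(label_counts))

# Point 2: a large gap between training and test accuracy hints at overfitting
for k in (1, 3):
    train_acc = accuracy(train, train, k)
    test_acc = accuracy(train, test, k)
    print(f"k={k}: train accuracy={train_acc:.2f}, test accuracy={test_acc:.2f}")
```

With `k=1` the model memorises every training example, noise included, so it looks perfect on the training set and does worse on the test set. A larger `k` smooths over the noisy labels, which is one example of adjusting a training control parameter.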
Some of the most widely used supervised learning algorithms are support vector machines, neural networks, naive Bayes, decision trees, k-nearest neighbors, linear regression and logistic regression.
I will write about unsupervised learning in the next post.