Unsupervised Learning

Unsupervised learning is the method of finding hidden patterns or groupings within data without any guidance. Unlike supervised learning, the training data here carries no labels. The algorithm clusters the data into groups based on the similarity of the data’s features. In most cases, we do not know why a cluster formed until we analyse the features of the data in that cluster.

http://practiceovertheory.com/blog/2010/02/15/machine-learning-who-s-the-boss/

Commonly used unsupervised algorithms are:

  1. Self-organizing maps
  2. k-means clustering
  3. Hierarchical clustering
  4. Hidden Markov Models
  5. Gaussian mixture models

A good example is clustering the fans or followers of a Facebook page or Twitter handle. The features would be the profile details of each user, and each cluster would group similar users together.
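To make the clustering idea concrete, here is a minimal k-means sketch in plain Python. The follower features (age, posts per week) are made-up numbers for illustration, and starting the centroids at the first k points is a simplification I chose to keep the run repeatable; real libraries usually pick the starting centroids randomly or with smarter schemes.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def k_means(points, k, iterations=10):
    # Assumption: use the first k points as starting centroids (deterministic).
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    labels = [nearest_centroid(p, centroids) for p in points]
    return labels, centroids


# Hypothetical follower features: (age, posts per week)
followers = [(18, 30), (19, 28), (21, 32), (45, 2), (50, 3), (48, 1)]
labels, centroids = k_means(followers, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The algorithm never sees a label. It only groups users whose features are close together, and it is up to us to inspect each group and decide what it means.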

Workflow Diagram Reference for my last two posts : machine-learning-who-s-the-boss

In the next post, I will briefly discuss each of the algorithms in the supervised and unsupervised categories.

Supervised Learning

Supervised learning is the method of using labelled training data to train the algorithm. Each training example has an input part and its label (the output). The input is usually a vector of parameters. The algorithm trains itself on these examples, and when a new input is given, it classifies the input or predicts its output label.
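As a rough illustration of "labelled input vectors in, predicted label out", here is a small k-nearest-neighbors sketch in plain Python. The spam features (number of links, number of exclamation marks) and the function name predict_knn are my own assumptions for the example. Note that for this particular algorithm, "training" simply means storing the labelled examples.

```python
from collections import Counter


def predict_knn(training_data, new_input, k=3):
    """Predict a label by majority vote among the k closest training examples.

    training_data is a list of (input_vector, label) pairs.
    """
    def distance_to_new_input(pair):
        vector, _label = pair
        return sum((a - b) ** 2 for a, b in zip(vector, new_input))

    nearest = sorted(training_data, key=distance_to_new_input)[:k]
    votes = Counter(label for _vector, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (links, exclamation marks) -> label
training_data = [
    ((0, 0), "not spam"),
    ((1, 0), "not spam"),
    ((0, 1), "not spam"),
    ((7, 5), "spam"),
    ((8, 6), "spam"),
    ((6, 7), "spam"),
]

print(predict_knn(training_data, (7, 6)))  # spam
print(predict_knn(training_data, (1, 1)))  # not spam
```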

The accuracy of the algorithm can be measured using a test data set that is similar to the training data but kept separate from it. To improve accuracy, the training control parameters can be adjusted, depending on the algorithm selected. A few points to remember while using supervised learning:

  1. The training data set should not be biased towards a particular output label.
  2. Overfitting – This is the issue where the algorithm fits the training data too closely, so its error on new data is higher.
  3. The type of the input vectors – numerical, categorical, etc.
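The first point and the accuracy check can be sketched in a few lines of Python. The threshold classifier and the tiny test set below are hypothetical, chosen only to show how accuracy is computed on held-out data and how a quick label count reveals a biased data set.

```python
from collections import Counter


def accuracy(predict, test_data):
    """Fraction of test examples whose predicted label matches the true label."""
    correct = sum(1 for x, label in test_data if predict(x) == label)
    return correct / len(test_data)


def label_counts(data):
    """How often each output label occurs; a lopsided count hints at bias."""
    return Counter(label for _x, label in data)


# Hypothetical classifier: mails with 4 or more links are called spam.
def predict(x):
    return "spam" if x[0] >= 4 else "not spam"


# Hypothetical held-out test set: (number of links,) -> true label
test_data = [((0,), "not spam"), ((5,), "spam"), ((3,), "spam"), ((9,), "spam")]

print(accuracy(predict, test_data))  # 0.75
print(label_counts(test_data))       # Counter({'spam': 3, 'not spam': 1})
```

A model that overfits would typically score very well on the data it was trained on and noticeably worse on a separate test set like this one, which is why the two sets are kept apart.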

http://practiceovertheory.com/blog/2010/02/15/machine-learning-who-s-the-boss/

Some of the most used supervised learning algorithms are Support Vector Machines, Neural Networks, naive Bayes, Decision Trees, k-nearest neighbors, linear regression and logistic regression.

I will write about unsupervised learning in the next post.

Categorical & Numerical Variables

Variables are the basic building blocks of an ML algorithm. Based on these variables, the algorithm identifies an equation, which is then applied to new input data. These variables are mostly of two types:

  1. Categorical Variables
    This variable represents a field which can be classified into categories or groups.
    example : sex, favorite color, age group
  2. Numerical Variables
    This variable represents a field which can be measured and sorted.
    example : height, weight
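Many algorithms expect a purely numeric input vector, so categorical fields are often converted to numbers first. One common way is one-hot encoding, sketched below in plain Python. The record fields (height_cm, weight_kg, favorite_color) and the helper names are assumptions made up for this example.

```python
def one_hot(value, categories):
    """Return a list with 1 at the position of the matching category, 0 elsewhere."""
    return [1 if value == category else 0 for category in categories]


def encode_record(record, colors):
    """Build a numeric vector: numerical fields as-is, categorical field one-hot."""
    numerical_part = [record["height_cm"], record["weight_kg"]]
    categorical_part = one_hot(record["favorite_color"], colors)
    return numerical_part + categorical_part


# Hypothetical list of known categories and one sample record
colors = ["blue", "green", "red"]
record = {"height_cm": 170.0, "weight_kg": 65.5, "favorite_color": "green"}

print(encode_record(record, colors))  # [170.0, 65.5, 0, 1, 0]
```

The numerical values can be compared and sorted directly, while the categorical value only says which group the record belongs to, which is why it gets one slot per category rather than a single number.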

Categorical variables are visualized using bar charts, frequency tables or pie charts.

http://www.saedsayad.com/categorical_variables.htm

visualizing categorical data
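A frequency table is simple to build by hand. Here is a small sketch using Python's standard library, with a made-up list of favorite colors, that prints the table along with a crude text bar chart.

```python
from collections import Counter

# Hypothetical survey answers for a categorical variable
favorite_colors = ["red", "blue", "red", "green", "blue", "red"]

# Count how often each category occurs
frequency = Counter(favorite_colors)

# Print the most frequent category first, with one '#' per occurrence
for color, count in frequency.most_common():
    print(f"{color:<6} {count}  {'#' * count}")
```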

Numerical variables are visualized using scatter plots or line graphs.

http://www.saedsayad.com/numerical_numerical.htm

visualizing numerical data

An interesting reference : Shodor – Numerical and Categorical data

In my next post, I will write about Supervised Learning.

What is Machine Learning?

Machine Learning (ML) is the process of a computer learning from examples. The examples are called training data. Based on this training data, the computer comes up with rules. These rules are later used to make decisions or predictions for any new data passed into the algorithm.

ML Architecture

Basic Machine Learning System Architecture

ML enables computers to teach themselves by identifying patterns and to make decisions on uncertain data. There are two types of ML methods:
  1. Supervised – Labelled training data is provided for the algorithm to learn from
  2. Unsupervised – No labelled training data is provided
http://nyghtowlblog.files.wordpress.com/2014/04/ml_algorithms.png?w=535&h=311

Classification of algorithms

I will discuss these in detail in a forthcoming post.
ML is used in the field of artificial intelligence to make decisions. ML intersects with other fields such as mathematics, physics and statistics. Some examples of ML applications are:
  1. Face Recognition
  2. Recommendation Systems
  3. Spam Filtering
  4. Character Recognition
  5. Customer Segmentation
  6. Weather Prediction
Based on what is to be achieved through ML, it is divided into two types:
  1. Classification – Categorize an object into one of several types/categories.
    example: Is the mail spam or not?
  2. Regression – Predict a real value.
    example: What will be the stock price tomorrow?
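The difference between the two types shows up in what the code returns. In the sketch below, the regression part fits a straight line by least squares and returns a real number, while the classification part returns a category. The prices, the link threshold and the function names are invented for illustration, and a straight line is of course far too simple to be a real stock model.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Regression: predict a real value (hypothetical closing prices for days 1 to 4)
days = [1, 2, 3, 4]
prices = [10.0, 12.0, 14.0, 16.0]
slope, intercept = fit_line(days, prices)
predicted_price = slope * 5 + intercept
print(predicted_price)  # 18.0


# Classification: predict a category (hypothetical rule based on link count)
def classify_mail(link_count, threshold=3):
    return "spam" if link_count > threshold else "not spam"


print(classify_mail(7))  # spam
print(classify_mail(1))  # not spam
```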
I will discuss these in detail in a forthcoming post.
Over the years, ML has grown to the level of playing games, composing music and imitating other human activities! IBM’s Watson is a good example of this. In the next post, I will discuss the variables in data, which the ML algorithm uses to train itself.