Concepts, Instances and Attributes

September 26, 2014 (updated September 28, 2014) / Hari

Three basic terms to be learnt in machine learning are:

Concept: A concept is what the machine learns in the process. In a classification task, the concept is the rule that decides which class an instance belongs to.

Instances: Each row/record in the training data set is an instance. An instance is a collection of one or more attributes.

Attributes: Each column/field in the data set is an attribute. The algorithm uses the attributes to come up with a hypothesis (its approximation of the concept) from the data set.
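To make the three terms concrete, here is a minimal Python sketch on a small, made-up weather data set. The attribute names, values and the rule are assumptions for illustration only. Each dictionary is an instance, each key is an attribute, and the `hypothesis` function is hand-written here to stand in for the concept an algorithm would learn.

```python
# Hypothetical training data set: each dict is one instance (row) and each
# key is an attribute (column). "play" is the label the concept should explain.
training_data = [
    {"outlook": "sunny", "humidity": 85, "windy": False, "play": "no"},
    {"outlook": "overcast", "humidity": 78, "windy": False, "play": "yes"},
    {"outlook": "rainy", "humidity": 96, "windy": True, "play": "no"},
    {"outlook": "sunny", "humidity": 70, "windy": False, "play": "yes"},
]

# Attributes: every column except the label.
attributes = [name for name in training_data[0] if name != "play"]

def hypothesis(instance):
    """A candidate rule (hand-written here) approximating the concept 'good day to play'."""
    return "yes" if instance["humidity"] < 80 and not instance["windy"] else "no"

# How many training instances does the hypothesis label correctly?
correct = sum(hypothesis(row) == row["play"] for row in training_data)
print(f"{len(training_data)} instances, attributes: {attributes}")
print(f"hypothesis matches {correct} of {len(training_data)} labels")
```

A learning algorithm would search for a rule like `hypothesis` automatically instead of having it written by hand.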

My next post will be about training and test data. In parallel, I will also write about the topics I learn and the exercises I practice in the Coursera Machine Learning course. 🙂 🙂

Unsupervised Learning

September 22, 2014 (updated September 23, 2014) / Hari

Unsupervised learning is the method of finding hidden patterns or groupings within data on its own. Unlike supervised learning, the training data has no labels. In clustering, for example, the algorithm groups the data based on the similarity of the data's features. In most cases, we do not know the reason behind the formation of the clusters unless we analyse the features of the data in each cluster.

Commonly used unsupervised algorithms are:

Self-organizing maps
k-means clustering
Hierarchical clustering
Hidden Markov Models
Gaussian mixture models

A good example would be clustering of fans/followers of a Facebook page or Twitter handle. The features would be the profile details of each user and clusters would have similar users grouped together.
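k-means clustering, from the list above, is the easiest of these to sketch. Below is a minimal plain-Python version of the standard k-means (Lloyd's) iteration applied to the fan/follower example. The two features (age, posts per week) and the six user profiles are made up for illustration; real profile data would usually need its features scaled to comparable ranges first.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Mean of a list of feature tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, k, max_iterations=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [centroid(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical user profiles: (age, posts per week). Made-up numbers.
users = [(18, 30), (21, 25), (19, 28), (45, 2), (50, 3), (48, 1)]
centroids, clusters = k_means(users, k=2)
for c, members in zip(centroids, clusters):
    print(c, members)
```

As the post says, the algorithm only returns the groups; working out that one cluster is "young, very active users" still requires looking at the features inside each cluster.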

Workflow diagram reference for my last two posts: machine-learning-who-s-the-boss

In the next post, I will briefly discuss each of the supervised and unsupervised algorithms.

Supervised Learning

September 21, 2014 (updated September 23, 2014) / Hari

Supervised learning is the method of training an algorithm on labelled training data. Each training example has an input part and its label (the output). The input is usually a vector of features. The algorithm trains itself on these examples, and when a new input is given, it classifies it or predicts its output label.
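To show this train-then-predict loop, the sketch below uses k-nearest neighbors (one of the algorithms listed later in this post), because its "training" is simply storing the labelled examples. The two features (number of links and number of exclamation marks in an email) and the data are hypothetical. `math.dist` needs Python 3.8 or later.

```python
import math

# Hypothetical labelled training data: (input feature vector, output label).
# Features: (number of links, number of exclamation marks) in an email.
training_data = [
    ((0, 1), "not spam"),
    ((1, 0), "not spam"),
    ((8, 6), "spam"),
    ((12, 9), "spam"),
]

def predict(x, data, k=3):
    """Classify x by majority vote among its k nearest training examples."""
    # Euclidean distance from the new input to every training input.
    nearest = sorted(data, key=lambda example: math.dist(x, example[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

print(predict((10, 7), training_data))  # spam
print(predict((1, 1), training_data))   # not spam
```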

The accuracy of the algorithm can be measured on a test data set that is similar to the training data but was not used for training. To improve accuracy, the training control parameters can be adjusted, depending on the algorithm selected. A few points to remember when using supervised learning:

The training data set should not be biased towards a particular output label.
Overfitting: the algorithm fits the training data too closely, so it performs well on the training data but its error on new data is higher.
The type of input vectors: numerical, categorical, etc.
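The sketch below illustrates the test-set idea and two of the points above on a tiny made-up data set (hours studied versus pass/fail). It holds out every fourth example as a test set, counts the training labels to check for bias, and compares a simple learned threshold with a model that just memorises the training data, an extreme case of overfitting. All names and numbers are hypothetical.

```python
from collections import Counter

# Hypothetical labelled data: (hours studied, label). Purely illustrative.
data = [(hours, "pass" if hours >= 5 else "fail") for hours in range(10)]

# Hold out every 4th example as the test set and train on the rest.
test_set = data[::4]
training_set = [ex for i, ex in enumerate(data) if i % 4 != 0]

# Check for label bias in the training set.
label_counts = Counter(label for _, label in training_set)
majority_label = label_counts.most_common(1)[0][0]

def accuracy(model, examples):
    """Fraction of examples whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in examples) / len(examples)

# Model 1: learn a threshold (smallest "pass" input seen in training).
threshold = min(x for x, y in training_set if y == "pass")

def threshold_model(x):
    return "pass" if x >= threshold else "fail"

# Model 2: memorise the training set; guess the majority label otherwise.
lookup = dict(training_set)

def memorising_model(x):
    return lookup.get(x, majority_label)

print("training label counts:", dict(label_counts))
for name, model in [("threshold", threshold_model), ("memorising", memorising_model)]:
    print(f"{name}: train={accuracy(model, training_set):.2f}, "
          f"test={accuracy(model, test_set):.2f}")
```

Both models are perfect on the training set, but only the threshold model stays accurate on the held-out test set, which is why accuracy should be measured on data the model has not seen.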

Some of the most widely used supervised learning algorithms are support vector machines, neural networks, naive Bayes, decision trees, k-nearest neighbors, linear regression and logistic regression.

I will write about unsupervised learning in the next post.

Categorical & Numerical Variables

September 18, 2014 (updated September 23, 2014) / Hari

Variables are the basic building blocks of an ML algorithm. Based on these variables, the algorithm identifies an equation which will be applied to new input data. These variables are mostly of two types:

Categorical Variables
This variable represents a field whose values can be classified into categories or groups.
example: sex, favorite color, age group
Numerical Variables
This variable represents a field which can be measured and sorted.
example: height, weight
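Many algorithms need every input as a number, so categorical variables are usually encoded before training. One common approach, not described in the post, is one-hot encoding. The sketch below assumes made-up records with a categorical `favorite_color` and a numerical `height_cm`.

```python
# Hypothetical records mixing a categorical and a numerical variable.
records = [
    {"favorite_color": "red", "height_cm": 170.0},
    {"favorite_color": "blue", "height_cm": 182.5},
    {"favorite_color": "red", "height_cm": 158.0},
]

# Every category seen in the data, in a fixed order.
categories = sorted({r["favorite_color"] for r in records})

def encode(record):
    # Categorical: one 0/1 slot per category, so no order is implied.
    one_hot = [1.0 if record["favorite_color"] == c else 0.0 for c in categories]
    # Numerical: used as-is, since it can be measured and sorted.
    return one_hot + [record["height_cm"]]

vectors = [encode(r) for r in records]
print(categories)
print(vectors)
```

Encoding colors as 0, 1, 2 instead would wrongly suggest that one color is "greater" than another; one-hot encoding avoids that.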

Categorical variables are visualized using bar charts, frequency tables or pie charts.
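A frequency table is the simplest of these to produce without a plotting library. This minimal sketch uses Python's `collections.Counter` on a made-up list of favorite colors and prints each category's count, its percentage and a crude text bar.

```python
from collections import Counter

# Hypothetical survey answers for a categorical variable.
colors = ["red", "blue", "red", "green", "blue", "red"]

frequency = Counter(colors)
total = sum(frequency.values())

# Frequency table with a simple text bar chart (one '#' per count).
for color, count in frequency.most_common():
    print(f"{color:<6} {count:>2} {count / total:>7.1%} {'#' * count}")
```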

Visualizing categorical data (image source: http://www.saedsayad.com/categorical_variables.htm)

Numerical variables are visualized using scatter plots or line graphs.

Visualizing numerical data (image source: http://www.saedsayad.com/numerical_numerical.htm)

An interesting reference: Shodor, Numerical and Categorical Data

In my next post, I will write about supervised learning.

What is Machine Learning?

September 17, 2014 (updated September 22, 2014) / Hari

Machine Learning (ML) is the process of a computer learning from examples, which are often labelled. The examples are called training data. Based on this training data, the computer comes up with rules. These rules are used later to make decisions or predictions for any new data passed into the algorithm.
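As a small illustration of "coming up with rules" from labelled examples, the sketch below learns a one-level decision rule (a decision stump) of the form "spam if the email has at least t links", choosing the threshold t that makes the fewest mistakes on the training data. The feature and the data are hypothetical; real systems learn far richer rules.

```python
# Hypothetical labelled examples: (number of links in an email, is it spam?).
training_data = [(0, False), (1, False), (2, False), (3, True),
                 (5, True), (7, True), (1, True), (6, False)]

def learn_threshold(examples):
    """Return the t for the rule 'spam if links >= t' with the fewest training errors."""
    def errors(t):
        return sum((links >= t) != is_spam for links, is_spam in examples)
    candidates = sorted({links for links, _ in examples})
    return min(candidates, key=errors)

threshold = learn_threshold(training_data)

def predict(links):
    # Apply the learned rule to new data.
    return links >= threshold

print(f"learned rule: spam if links >= {threshold}")
print([predict(links) for links in [4, 0]])  # new emails
```

The learned rule (t = 3) still misclassifies two training examples, (1, True) and (6, False); real data is rarely perfectly separable by a single rule.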

Basic Machine Learning System Architecture

ML enables computers to teach themselves by identifying patterns and to make decisions on uncertain data. There are two types of ML methods:

Supervised: labelled training data is provided for the algorithm to learn from
Unsupervised: the data is provided without labels

Classification of algorithms (image: http://nyghtowlblog.files.wordpress.com/2014/04/ml_algorithms.png?w=535&h=311)

I will discuss these in detail in a forthcoming post.

ML is used in the field of artificial intelligence to make decisions. ML intersects with other fields such as mathematics, physics and statistics. Some examples of ML applications are:

Face Recognition
Recommendation Systems
Spam Filtering
Character Recognition
Customer Segmentation
Weather Prediction

Based on what is to be achieved, ML tasks are divided into two main types:

Classification: assign an object to one of a set of types/categories.
example: Is the email spam or not?
Regression: predict a real (continuous) value.
example: What will be the stock price tomorrow?
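To contrast with classification, here is a minimal regression sketch: an ordinary least-squares fit of a straight line y = a + b*x, used to predict a real value for the next day. The prices are made up, and a straight line is far too simple to forecast real stock prices; the example only shows that the output is a number rather than a category.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x  # the fitted line passes through the point of means
    return a, b

# Hypothetical closing prices for days 1-5 (made-up numbers).
days = [1, 2, 3, 4, 5]
prices = [100.0, 103.0, 103.0, 107.0, 107.0]

a, b = fit_line(days, prices)
tomorrow = a + b * 6  # regression output: a real value, not a category
print(f"price = {a:.1f} + {b:.1f} * day; day 6 prediction: {tomorrow:.1f}")
```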

I will discuss these in detail in a forthcoming post.

Over the years, ML has grown to the level of playing games, composing music and imitating other human activities! IBM's Watson is a good example of this. In my next post, I will discuss variables in data, which the ML algorithm uses to train itself.

Tada – Data

From Digital Analytics to Agentic Orchestration

Author: Hari