Classification and Clustering in Data Analysis

Clustering is an unsupervised learning method that identifies similarities between objects and groups them into clusters. Classification, by contrast, is a supervised learning method that assigns objects to predefined classes.

What is Cluster Analysis?

Cluster analysis is widely used in applications such as business intelligence, image pattern recognition, web search, biology and security.

In business intelligence, clustering organizes a large number of customers into groups whose members share strongly similar characteristics.

Requirements

Scalability
Arbitrary shape of clusters
Ability to deal with different types of attributes

Types of Clustering

Partitioning methods
Hierarchical methods
Density-based methods
Grid-based methods

Partitioning methods find mutually exclusive clusters of spherical shape and are distance based. Common algorithms of this type include:

K-means
K-medoids

Partitioning methods are effective for small to medium-sized data sets.
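To make the distance-based idea concrete, here is a minimal sketch of k-means in plain Python. The function name kmeans, the sample points and the choice of starting centroids are assumptions made for illustration only; real libraries add smarter initialisation and convergence checks.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical sample data: two well-separated groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

Every point ends up in exactly one cluster, which is what "mutually exclusive" means here.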

Hierarchical methods build a hierarchical decomposition of the data, that is, they cluster it at multiple levels. There are two approaches:

Divisive approach (top down)
Agglomerative approach (bottom up)

Hierarchical methods may incorporate other techniques such as micro-clustering.
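The agglomerative (bottom up) approach can be sketched in a few lines of Python. This is only an illustration: the function name agglomerative, the sample points and the use of single linkage (distance between the closest pair of members) are assumptions, and other linkage rules are equally valid.

```python
import math

def agglomerative(points, target_clusters):
    """Bottom-up clustering with single linkage (closest pair of members)."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # merge the two closest clusters
    return clusters

# Hypothetical sample data: two pairs of close points and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters = agglomerative(points, target_clusters=3)
print(clusters)
```

Each pass of the loop merges one pair of clusters, so stopping at different values of target_clusters gives the different levels of the hierarchy.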

Density-based methods can find clusters of arbitrary shape and may filter out outliers.
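DBSCAN is one well-known density-based algorithm. The sketch below is a simplified version written for illustration; the function name dbscan, the parameter names eps and min_pts, and the sample points are assumptions. Points that do not belong to any dense region receive the label -1, which is how outliers are filtered out.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point, -1 meaning noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbors(i):
        # All points within distance eps of point i (including i itself).
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels

# Hypothetical sample data: two dense groups and one far-away outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # prints [0, 0, 0, 0, 1, 1, 1, -1]
```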

Grid-based methods use a multi-resolution grid data structure and have a fast processing time.
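The core idea of a grid can be illustrated with a single resolution: place every point in a cell, then work with the cells instead of the individual points. The sketch below shows only this first step and is not a full multi-resolution algorithm; the function name dense_cells, its parameters and the sample points are assumptions.

```python
from collections import Counter

def dense_cells(points, cell_size, min_count):
    """Bin 2-D points into square cells and return the cells with enough points."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )
    return {cell: n for cell, n in counts.items() if n >= min_count}

# Hypothetical sample data.
points = [(0.2, 0.3), (0.8, 0.1), (0.5, 0.9), (4.1, 4.2), (4.7, 4.9), (9.5, 0.5)]
cells = dense_cells(points, cell_size=1.0, min_count=2)
print(cells)  # prints {(0, 0): 3, (4, 4): 2}
```

After this single pass over the data, later steps depend on the number of cells rather than the number of points, which is one reason grid methods are fast.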

Types of Classification

Decision Tree
K Nearest Neighbors
Naive Bayes Classification
Logistic Regression

A decision tree builds a tree-shaped model, much like a flowchart, in which each step tests an attribute and the final step gives the class.
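The smallest possible decision tree has a single test and is often called a decision stump. The sketch below learns one threshold on one numeric attribute; the function names best_split and predict, the error-count criterion and the sample data are assumptions for illustration. Full decision tree algorithms repeat this kind of split recursively and usually score splits with measures such as information gain or the Gini index.

```python
def best_split(values, labels):
    """Find the threshold on one numeric attribute that misclassifies the fewest points."""
    best = None
    for threshold in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        if not left or not right:
            continue
        # Each side predicts its majority class; count the disagreements.
        left_class = max(set(left), key=left.count)
        right_class = max(set(right), key=right.count)
        errors = sum(lab != left_class for lab in left)
        errors += sum(lab != right_class for lab in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_class, right_class)
    return best

def predict(value, split):
    """Follow the single test of the stump to a class label."""
    _, threshold, left_class, right_class = split
    return left_class if value <= threshold else right_class

# Hypothetical sample data: customer age and whether the customer bought a product.
ages = [22, 25, 30, 35, 40, 50]
bought = ["no", "no", "no", "yes", "yes", "yes"]
split = best_split(ages, bought)
print(split)              # prints (0, 30, 'no', 'yes')
print(predict(28, split)) # prints no
```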

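K nearest neighbors (KNN) classifies a new object by finding the k most similar objects in the training data set and assigning the class that is most common among them. A separate testing data set is used to evaluate how well this works.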
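A minimal KNN classifier fits in a few lines of Python. The function name knn_predict, the use of Euclidean distance and the sample data are assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data with two classes.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (6.0, 6.0), (6.2, 5.9), (5.8, 6.1)]
train_labels = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(train_points, train_labels, query=(1.1, 1.0)))  # prints small
print(knn_predict(train_points, train_labels, query=(6.0, 6.1)))  # prints large
```

Note that there is no training step: the training data itself is the model, and all the work happens at prediction time.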

Logistic regression uses descriptive variables to estimate the probability that an object belongs to a class.
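As a rough sketch, logistic regression with one descriptive variable can be fitted by gradient descent as follows. The function names, the learning rate, the number of epochs and the sample data are all assumptions chosen for illustration, not tuned values.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit p(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical sample data: hours studied and whether the exam was passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic(hours, passed)

# Probability of passing for two new students.
print(sigmoid(w * 1.0 + b))
print(sigmoid(w * 4.0 + b))
```

An object is usually assigned to the positive class when the estimated probability is above 0.5.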

Applications of Clustering

Market research and pattern recognition
Customer segmentation, to find customer buying patterns
Spam detection in emails
Data analysis

Applications of Classification

Speech recognition
Handwriting recognition
Biometric identification
Document classification

Classification	Clustering
Supervised learning method	Unsupervised learning method
More complex than clustering	Less complex than classification
Examples: logistic regression, decision trees, naive Bayes	Examples: k-means, hierarchical clustering

Summary

In conclusion, we have learnt that clustering is an unsupervised learning method that identifies similarities between objects and groups them into clusters. Classification is a supervised learning method that assigns objects to predefined classes.

We have also seen the main types of clustering techniques: partitioning, hierarchical, density-based and grid-based methods. The classification techniques covered are decision trees, naive Bayes classification, k nearest neighbors and logistic regression.

We have also learnt about some major applications of classification and clustering.