Clustering in data analysis is an unsupervised learning method that identifies similarities between objects and groups them into clusters. Classification, by contrast, is a supervised learning method that uses predefined classes to assign labels to objects.
What is Cluster Analysis?
Cluster analysis is widely used in many applications like business intelligence, image pattern recognition, web search, biology and security.
In business intelligence, clustering is used to organize a large number of customers into groups whose members share strongly similar characteristics.
Requirements of Clustering
- Scalability
- Ability to discover clusters of arbitrary shape
- Ability to deal with different types of attributes
Types of Clustering
- Partitioning methods
- Hierarchical methods
- Density-based methods
- Grid-based methods
Firstly, partitioning methods find mutually exclusive clusters of spherical shape. They are distance based, and common algorithms in this family include:
- K-means
- K-medoids
These methods are effective for small to medium-sized data sets.
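The following is a minimal sketch of k-means in plain Python. It uses Lloyd's algorithm, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The function names `kmeans` and `nearest`, the random initialization with a fixed `seed`, and the toy points are illustrative assumptions and do not come from the text above. K-medoids differs mainly in that each cluster centre must be an actual data point (a medoid) rather than a mean.

```python
import math
import random


def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points, k, iterations=100, seed=0):
    """Minimal k-means (Lloyd's algorithm) for a list of coordinate tuples."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct data points sampled at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids


points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

The result depends on the initial centroids, so the cluster numbers (0 or 1) can swap between seeds. Only the grouping itself is meaningful.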
Secondly, hierarchical methods create a hierarchical decomposition of the data, i.e. they group objects at multiple levels. There are two approaches:
- Divisive approach (top-down)
- Agglomerative approach (bottom-up)
Hierarchical methods may also incorporate other techniques, such as micro-clustering.
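The agglomerative (bottom-up) approach fits in a few lines. Every object starts as its own cluster, and the two closest clusters are merged repeatedly until the requested number remains. The divisive approach runs in the opposite direction, starting from one cluster and splitting it. This sketch assumes single linkage, where the distance between two clusters is the distance between their closest pair of members; complete and average linkage are common alternatives. The function names and data are hypothetical, and the exhaustive pair search is only practical for small inputs.

```python
import math
from itertools import combinations


def single_linkage(a, b):
    """Distance between two clusters: the distance of their closest pair of points."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerative(points, n_clusters):
    """Bottom-up clustering: start with singletons and repeatedly merge the closest pair."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest single-linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        # Merge cluster j into cluster i; since j > i, deleting j does not shift i.
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters = agglomerative(points, n_clusters=3)
print(clusters)  # [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 5.0), (5.0, 6.0)], [(10.0, 0.0)]]
```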
Thirdly, density-based methods can find clusters of arbitrary shape and can filter out outliers as noise.
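DBSCAN is one well-known density-based algorithm. The text above does not name a specific method, so treat this as an illustration only. A point with at least `min_pts` neighbours within radius `eps` is a core point, and clusters grow outward from core points. Any point that cannot be reached from a core point is labelled -1 (noise), which is how outliers are filtered out. The function name and the parameter values for this toy data are assumptions.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: one label per point, NOISE (-1) for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"

    def region(i):
        # Indices of every point within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may still become a border point of a later cluster
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: it joins but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned to this cluster
            labels[j] = cluster_id
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point, so keep growing
        cluster_id += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point at (50, 50) is reported as noise rather than being forced into a cluster.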
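Lastly, grid-based methods quantize the object space into a multi-resolution grid data structure. Their processing time is fast because it typically depends on the number of grid cells rather than the number of data objects.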
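Here is a single-resolution simplification of a grid-based method. Points are counted into square cells in one pass over the data. Cells that hold enough points are treated as dense, and neighbouring dense cells (diagonal neighbours included) are joined into clusters. Real grid-based algorithms such as STING keep statistics at several grid resolutions. The names `grid_cluster`, `cell_size`, and `min_count` are hypothetical and used only for this illustration.

```python
from collections import defaultdict, deque


def grid_cluster(points, cell_size, min_count):
    """Single-resolution grid clustering sketch.

    Returns a dict mapping each dense cell (ix, iy) to a cluster id.
    """
    counts = defaultdict(int)
    for x, y in points:  # the only pass over the raw data
        counts[(int(x // cell_size), int(y // cell_size))] += 1

    dense = {cell for cell, n in counts.items() if n >= min_count}
    cluster_of = {}
    next_id = 0
    # Connected components over dense cells only: this step's cost depends on
    # the number of cells, not the number of points.
    for start in sorted(dense):
        if start in cluster_of:
            continue
        cluster_of[start] = next_id
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in cluster_of:
                        cluster_of[nb] = next_id
                        queue.append(nb)
        next_id += 1
    return cluster_of


points = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.4, 0.9), (5.5, 5.5), (5.6, 5.2), (9.0, 9.0)]
cells = grid_cluster(points, cell_size=1.0, min_count=2)
print(cells)  # {(0, 0): 0, (1, 0): 0, (5, 5): 1}
```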
Types of Classification
- Decision Tree
- K-Nearest Neighbors
- Naive Bayes Classification
- Logistic Regression
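Firstly, a decision tree builds a tree-structured model. Each internal node tests an attribute, each branch represents an outcome of that test, and each leaf holds a class label.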
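Below is a compact decision-tree learner. It assumes numeric features, binary `feature <= threshold` tests, and Gini impurity as the split criterion, which is a CART-style choice; ID3, for example, uses information gain instead. The names `gini`, `best_split`, `build_tree`, and `predict`, along with the toy data, are illustrative assumptions.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces weighted Gini impurity, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a tree: internal nodes test `feature <= threshold`, leaves hold a class label."""
    split = best_split(rows, labels)
    if split is None or depth == max_depth:
        return max(set(labels), key=labels.count)  # leaf: majority class
    f, t = split
    left = [(r, lab) for r, lab in zip(rows, labels) if r[f] <= t]
    right = [(r, lab) for r, lab in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [lab for _, lab in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [lab for _, lab in right], depth + 1, max_depth),
    }


def predict(tree, row):
    # Follow the tests from the root down to a leaf.
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


rows = [(2.0, 1.0), (3.0, 1.5), (1.0, 0.5), (7.0, 3.0), (8.0, 2.5), (9.0, 3.5)]
labels = ["small", "small", "small", "large", "large", "large"]
tree = build_tree(rows, labels)
print(tree)  # {'feature': 0, 'threshold': 3.0, 'left': 'small', 'right': 'large'}
print(predict(tree, (2.5, 1.0)), predict(tree, (8.5, 3.0)))  # small large
```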
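Secondly, k-nearest neighbors (KNN) classifies a new object by finding the k training objects closest to it and assigning the majority class among those neighbours. A separate testing data set is then used to evaluate how well these predictions match the true classes.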
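Here is a minimal KNN sketch. The training set is simply stored, and each test object is classified by majority vote among its k nearest training objects; Euclidean distance is an assumption. Accuracy on a small labelled test set shows how the training and testing data are used together. The function name `knn_predict` and all of the data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query and keep the first k.
    nearest = sorted(range(len(train_points)), key=lambda i: math.dist(train_points[i], query))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 7.0)]
train_labels = ["A", "A", "A", "B", "B", "B"]

# Evaluate on a held-out test set.
test_points = [(1.2, 1.5), (6.8, 6.9)]
test_labels = ["A", "B"]
predictions = [knn_predict(train_points, train_labels, q) for q in test_points]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(predictions, accuracy)  # ['A', 'B'] 1.0
```

KNN does no work at training time. All of the cost comes at prediction time, when distances to every stored training object are computed.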
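Thirdly, Naive Bayes classification applies Bayes' theorem under the assumption that the attributes are conditionally independent given the class.
Lastly, logistic regression uses one or more explanatory (independent) variables to predict the probability that an object belongs to a class.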
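Logistic regression models the probability of the positive class as σ(w·x + b), where σ(z) = 1/(1 + e^(-z)). The sketch below fits `w` and `b` by batch gradient descent on the log-loss. The learning rate, the epoch count, and the toy data (one explanatory variable centred on 0) are assumptions chosen for illustration. In practice a library implementation, such as scikit-learn's `LogisticRegression`, would normally be used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the log-loss.

    xs: list of feature vectors (explanatory variables); ys: 0/1 class labels.
    """
    n_features = len(xs[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(xs, ys):
            # Predicted probability of class 1 minus the true label.
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        # Step against the average gradient.
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b


def predict_proba(w, b, x):
    """Probability that x belongs to class 1 under the fitted model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


xs = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic_regression(xs, ys)
print(w, b)
print(predict_proba(w, b, [-1.0]), predict_proba(w, b, [1.5]))
```

A probability above 0.5 is usually read as class 1. Other thresholds can be chosen when the costs of the two kinds of error differ.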
Applications of Clustering
- Market research and pattern recognition
- Customer segmentation, to find customer buying patterns
- Spam detection in email
- Data analysis
Applications of Classification
- Speech recognition
- Handwriting recognition
- Biometric identification
- Document classification
| Classification | Clustering |
|---|---|
| Supervised learning method | Unsupervised learning method |
| More complex than clustering | Less complex than classification |
| For instance, logistic regression, decision trees, Naive Bayes, etc. | For instance, k-means, hierarchical clustering, etc. |
Summary
In conclusion, we have learnt that clustering is an unsupervised learning method that identifies similarities between objects and groups them into clusters. Classification is a supervised learning method that uses predefined classes to assign labels to objects.
Moreover, we have seen the main types of clustering techniques: partitioning, hierarchical, density-based, and grid-based methods. The classification techniques covered were decision trees, Naive Bayes classification, k-nearest neighbors, and logistic regression.
We have also learnt about some major applications of classification and clustering.