
Classification and Clustering in Data Analysis

Clustering is an unsupervised learning method that identifies similarities between objects and groups them into clusters. Classification, by contrast, is a supervised learning method that assigns objects to predefined classes.


What is Cluster Analysis?

Cluster analysis is widely used in many applications like business intelligence, image pattern recognition, web search, biology and security.

In business intelligence, clustering organizes a large number of customers into groups whose members share strongly similar characteristics.

Requirements

  • Scalability
  • Arbitrary shape of clusters
  • Ability to deal with different types of attributes

Types of Clustering

  • Partitioning methods
  • Hierarchical methods
  • Density-based methods
  • Grid-based methods

Firstly, partitioning methods find mutually exclusive clusters of spherical shape. They are distance based. Two common algorithms of this type are:

  • K-means
  • K-medoids

Partitioning methods are effective for small to medium-sized data sets.
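
As an illustration of a distance-based partitioning method, here is a minimal sketch of K-means in plain Python. The function name `kmeans`, its parameters and the sample points are assumptions made for this example, and the starting centroids are simply chosen at random from the data.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Alternate between assigning points and updating centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct points as starting centroids
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical sample data: two well separated groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Because every point is assigned to its nearest centroid, the resulting clusters tend to be roughly spherical, which matches the description above.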

Secondly, hierarchical methods build a hierarchical decomposition of the data, i.e. clusters at multiple levels. There are two approaches:

  • Divisive approach (top-down)
  • Agglomerative approach (bottom-up)

Hierarchical methods may incorporate other techniques such as micro-clustering.
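
The agglomerative (bottom-up) approach can be sketched as follows. This is a simplified example, assuming single linkage (the distance between two clusters is the distance between their closest members). The function name `agglomerative` and the sample points are hypothetical.

```python
import math


def agglomerative(points, num_clusters):
    """Bottom-up clustering: repeatedly merge the two closest clusters."""
    clusters = [[p] for p in points]  # start with every point in its own cluster

    def linkage(a, b):
        # Single linkage: distance between the two closest members.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > num_clusters:
        # Find the pair of clusters that are closest to each other.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters


# Hypothetical sample data.
points = [(1.0, 1.0), (1.2, 1.1), (5.0, 5.0), (5.1, 5.2), (5.3, 4.9)]
clusters = agglomerative(points, num_clusters=2)
print(clusters)
```

Stopping the loop at different values of `num_clusters` gives the different levels of the hierarchy.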

Thirdly, density-based methods can find clusters of arbitrary shape and may filter out outliers.
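
One well-known density-based algorithm is DBSCAN, which the article does not name; it is used here only as an example. The sketch below is a simplified plain-Python version, and the parameter names `eps` and `min_pts` and the sample points are assumptions for illustration.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or -1 for noise (outliers)."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # All points within distance eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reachable from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
        cluster_id += 1
    return labels


# Hypothetical sample data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The isolated point does not have enough neighbours to join any cluster, so it is labelled as noise. This is how a density-based method filters out outliers.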

Lastly, grid-based methods use a multi-resolution grid data structure. They have a fast processing time, which typically depends on the number of grid cells rather than on the number of objects.
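
The idea behind grid-based clustering can be sketched as below. This is a simplified, single-resolution illustration and not a specific published algorithm: points are placed into cells, sparse cells are dropped, and neighbouring dense cells are merged. The function name `grid_clusters`, its parameters and the sample points are assumptions.

```python
from collections import defaultdict


def grid_clusters(points, cell_size, min_points):
    """Group points by merging neighbouring dense grid cells."""
    # Quantize: map each point to the integer coordinates of its cell.
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))

    # A cell is dense if it holds at least min_points points.
    dense = {cell for cell, members in cells.items() if len(members) >= min_points}

    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        # Flood fill over the 8 neighbouring cells to connect dense cells.
        seen.add(start)
        stack, members = [start], []
        while stack:
            cx, cy = stack.pop()
            members.extend(cells[(cx, cy)])
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nxt = (cx + dx, cy + dy)
                    if nxt in dense and nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
        clusters.append(members)
    return clusters


# Hypothetical sample data; the point (5.0, 5.0) sits alone in a sparse cell.
points = [(0.1, 0.2), (0.4, 0.3), (1.2, 0.5), (1.4, 0.1),
          (9.0, 9.0), (9.5, 9.2), (5.0, 5.0)]
clusters = grid_clusters(points, cell_size=1.0, min_points=2)
print(clusters)
```

After the points are binned, the remaining work is done on cells only, which is why this family of methods is fast.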

Types of Classification

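Firstly, a decision tree builds a flowchart-like model in a systematic way: each internal node tests an attribute, each branch is an outcome of that test, and each leaf assigns a class.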
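
A small decision tree can be sketched in plain Python as follows. This example assumes numeric attributes and uses Gini impurity to choose splits, which is one common choice among several. The function names, the tuple-based tree layout and the sample data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 when all labels agree."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[f] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Internal nodes are (feature, threshold, left, right); leaves are labels."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, threshold = split
    left = [(r, l) for r, l in zip(rows, labels) if r[f] <= threshold]
    right = [(r, l) for r, l in zip(rows, labels) if r[f] > threshold]
    return (
        f,
        threshold,
        build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth),
    )


def predict(tree, row):
    """Walk from the root to a leaf, following the attribute tests."""
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if row[f] <= threshold else right
    return tree


# Hypothetical data: (age, income) rows with a "yes"/"no" class.
rows = [(25, 20), (30, 35), (45, 80), (50, 90)]
labels = ["no", "no", "yes", "yes"]
tree = build_tree(rows, labels)
print(tree)
print(predict(tree, (28, 30)), predict(tree, (48, 70)))
```

Class labels must not be tuples in this sketch, because tuples are used to mark internal nodes.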

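Secondly, KNN (k-nearest neighbors) classifies a new object by finding the k closest objects in the training data set and assigning the most common class among them. A separate testing data set is used to evaluate the result.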
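
A minimal sketch of KNN, assuming Euclidean distance and a simple majority vote. The function name `knn_predict` and the sample data are made up for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data with two classes, "a" and "b".
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (1.5, 1.5)))
print(knn_predict(train_points, train_labels, (8.5, 8.5)))
```

KNN has no separate training step in this form: the training data set itself is the model, and all the work happens when a new object is classified.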

Thirdly, logistic regression uses descriptive (explanatory) variables to estimate the probability that an object belongs to a class.
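
The following sketch shows one way to fit a logistic regression for two classes, assuming batch gradient descent on the log loss. The function names, the learning rate, the number of epochs and the sample data are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, y in zip(rows, labels):
            p = sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)
            error = p - y  # gradient of the log loss with respect to the score
            for i, x in enumerate(row):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict_proba(weights, bias, row):
    """Estimated probability that `row` belongs to class 1."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)


# Hypothetical data: one descriptive variable and a 0/1 class.
rows = [(0.5,), (1.0,), (1.5,), (3.0,), (3.5,), (4.0,)]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = fit_logistic(rows, labels)
print(predict_proba(weights, bias, (0.5,)))
print(predict_proba(weights, bias, (4.0,)))
```

An object is usually assigned to class 1 when the estimated probability is above 0.5, although the threshold can be changed.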

Applications of Clustering

  • Market research and pattern recognition
  • Customer segmentation, to find customer buying patterns
  • Spam detection in emails
  • Data analysis

Applications of Classification

  • Speech recognition
  • Handwriting recognition
  • Biometric identification
  • Document classification

Classification | Clustering
Supervised learning method | Unsupervised learning method
More complex than clustering | Less complex than classification
For instance, logistic regression, decision trees, naive Bayes etc. | For instance, k-means, hierarchical clustering etc.

Summary

In conclusion, we have learnt that clustering is an unsupervised learning method that identifies similarities between objects and groups them into clusters. Classification is a supervised learning method that assigns objects to predefined classes.

Moreover, we have seen the types of clustering techniques: partitioning, hierarchical, density-based and grid-based methods. Classification techniques include decision trees, naive Bayes classification, k-nearest neighbors and logistic regression.

We have also learnt about some major applications of classification and clustering.

About the author

Drishti Patel
