An Unsupervised Learning Problem: Clustering

Clustering is a classic unsupervised learning problem. Its focus is on finding structure in a collection of data that is largely unlabelled. The simplest definition of clustering is the process of organising objects into groups whose members are similar to one another in some way. A cluster, then, is a collection of objects that are similar to each other and dissimilar to the objects belonging to other clusters.

A graphical representation makes this easier to understand:

[Figure 1: data divided into four distance-based clusters]

In the illustration above, the data has been divided into four clusters using distance as the criterion: points that lie close to one another are placed in the same cluster. This is called distance-based clustering. Another type is conceptual clustering, where the criteria for forming clusters are descriptive concepts rather than simple similarity measures.
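To make distance-based clustering concrete, here is a minimal sketch in Python. The article does not name a particular algorithm, so the choice of k-means, the farthest-point initialisation, the function name `kmeans` and the toy data are all assumptions made purely for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Group 2-D points into k clusters by Euclidean distance (plain k-means).

    Illustrative sketch only: k-means is one of many distance-based methods.
    """
    # Assumed initialisation: start at the first point, then repeatedly add
    # the point that lies farthest from all centres chosen so far.
    centres = [points[0]]
    while len(centres) < k:
        centres.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        )

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centre.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster ends up empty
                centres[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centres


# Hypothetical toy data: four well-separated groups of three points each.
points = [
    (0.0, 0.0), (0.5, 0.2), (0.1, 0.6),
    (10.0, 0.0), (10.4, 0.3), (9.8, 0.5),
    (0.0, 10.0), (0.3, 10.5), (0.6, 9.9),
    (10.0, 10.0), (10.2, 10.6), (9.7, 10.3),
]
labels, centres = kmeans(points, k=4)
print(labels)
```

Points that receive the same label belong to the same cluster. The number of clusters, k, has to be supplied by the analyst, which is one reason the result depends on the needs of the study.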

What is the purpose of clustering, and what does it achieve for the researcher? The fundamental goal, as noted above, is to identify the intrinsic grouping in a set of unlabelled data. Although this goal stays the same, no single clustering method can be called the best, because the choice depends largely on the needs of the study.

Clustering is applied in many fields. The main areas where cluster analysis is commonly used are:

Marketing: Clustering is widely used to identify groups of customers on the basis of their buying patterns and behaviour.

Biology: There is an enormous variety of plants and animals, and cluster analysis makes it possible to classify them into groups of similar organisms.

Insurance: Cluster analysis helps identify potential fraud on the basis of high claim costs.

Cluster analysis is not limited to these areas; it is also widely used in fields such as city planning and earthquake studies.

Clustering, like factor analysis, is an incomplete tool on its own, as it does not meet all of a study's requirements within a given time frame. Handling large data sets is also challenging. A further drawback, or rather challenge, is that the results can be interpreted in many ways, so the technique of interpretation cannot be standardised.