Cluster | Definition & Meaning

Definition

A cluster can be defined as a group of data (any number, value, or object) gathered around a specific value.

Figure 1: Illustration of Centroid-based clustering

Clusters are formed when data is sorted into groups based on shared properties or characteristics. The data can be of any type, such as numbers, values, or objects. In mathematics, clustering is frequently used alongside classification when working with data; this process is known as data clustering. When employing data clustering, one takes the values that lie close together and treats them as a single group.

Data clustering can be illustrated by the times, in seconds, that six athletes took to complete a 100-meter race. Let us assume the times were 29, 24, 25, 24, 23, and 17 seconds. In this case, the data clusters around 24 seconds, since four of the six times lie within one second of that value.
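The race example can be sketched in a few lines of Python. This is only an illustration: using the median as the cluster center and a tolerance of 2 seconds are assumptions made for this example, not part of any standard definition.

```python
from statistics import median

# Race times in seconds, taken from the example above.
times = [29, 24, 25, 24, 23, 17]

# Assumption: use the median as the value the data gathers around.
center = median(times)

# Assumption: a time belongs to the cluster if it is within 2 seconds of the center.
tolerance = 2
cluster = [t for t in times if abs(t - center) <= tolerance]
outside = [t for t in times if abs(t - center) > tolerance]

print(center)   # 24.0
print(cluster)  # [24, 25, 24, 23]
print(outside)  # [29, 17]
```

With a different tolerance the cluster would grow or shrink, which is why the choice of threshold matters in practice.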

As mentioned above, a cluster can also be a collection of objects that belong to the same group. Objects with similar characteristics are gathered in one cluster, and objects with different characteristics are gathered in another. For example, a pencil, a ballpoint pen, and an ink pen can be grouped in a single cluster of writing instruments.

Common Clustering Methods

Different clustering methodologies are used depending on the kind of data available and the type of results required. Some of the common ones are discussed below.

Hierarchical Clustering

Hierarchical clustering is a methodology that builds a hierarchy of clusters, usually drawn as a tree arranged from top to bottom. The data is first divided into small clusters of related items. Then the clusters most similar to each other are merged into one cluster, and this step is repeated. The result is a collection of clusters in which each cluster is distinct from the others, and the items inside each cluster are generally comparable to one another.
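The merging idea can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function name `agglomerate` is hypothetical, the data is one-dimensional, and the distance between two clusters is assumed to be the distance between their closest members (often called single linkage).

```python
def agglomerate(points, num_clusters):
    """Repeatedly merge the two closest clusters until num_clusters remain."""
    # Start with every point in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > num_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Assumed cluster distance: gap between the closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i (j > i, so index i stays valid).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Race times from the earlier example, grouped into three clusters.
result = agglomerate([29, 24, 25, 24, 23, 17], 3)
print(sorted(sorted(c) for c in result))  # [[17], [23, 24, 24, 25], [29]]
```

Recording the order of the merges, rather than stopping at a fixed number of clusters, would give the tree shape that hierarchical clustering is usually drawn with.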

Figure 2: Illustration of Hierarchical clustering

Centroid-based Clustering

Centroid-based clustering is a kind of clustering that organizes the data around central points, called centroids, which might or might not be elements of the data set. A common example is the K-Means methodology, in which k is the number of clusters, each cluster is represented by its center, and objects are sorted into clusters according to their proximity to those centers. Figure 1 shows this kind of clustering.
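A minimal sketch of the K-Means idea on one-dimensional data is given below, taking k as the number of clusters. The function name `k_means_1d`, the starting centers, and the fixed number of iterations are assumptions for illustration; practical implementations choose the starting centers more carefully and stop once the centers no longer move.

```python
def k_means_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest center and recentering."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical data with k = 2 and assumed starting centers 1 and 11.
centroids, clusters = k_means_1d([1, 2, 10, 11], [1, 11])
print(centroids)  # [1.5, 10.5]
print(clusters)   # [[1, 2], [10, 11]]
```

Note that the final centers, 1.5 and 10.5, are not members of the data set, which matches the remark that a central element need not belong to the data.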

Distribution-based Clustering

Distribution-based clustering is an approach that assumes the data is composed of distributions, such as Gaussian (normal) distributions, which appear as roughly circular or elliptical groups when plotted. This methodology fits several such distributions to your data and groups the points accordingly. The likelihood that a data point is part of a distribution diminishes as the distance between that data point and the center of the distribution increases. However, this method should not be employed when you are not sure about the kind of distribution your data has.
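The way the likelihood falls off with distance can be sketched as follows, assuming the distributions are Gaussian (normal). The two components, their means and standard deviations, and the function names are hypothetical, and the components are assumed to be equally weighted; this only shows how a point might be matched to a distribution, not how the distributions are fitted.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


# Two hypothetical clusters, each described by (mean, standard deviation).
components = [(0.0, 1.0), (10.0, 2.0)]


def most_likely_component(x):
    """Index of the component under which x has the highest density."""
    densities = [gaussian_pdf(x, mean, std) for mean, std in components]
    return densities.index(max(densities))


# The density shrinks as the point moves away from the center (mean 0).
falling = [gaussian_pdf(d, 0.0, 1.0) for d in (0, 1, 2, 3)]
print(falling == sorted(falling, reverse=True))  # True

print(most_likely_component(3))  # 0
print(most_likely_component(9))  # 1
```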

Figure 3: Illustration of Distribution based clustering

Density Clustering

In this methodology, regions with a high density of data points are linked together into clusters. Density-based clustering permits clusters of arbitrary shape, so long as the dense regions can be connected. However, this methodology has difficulty with high-dimensional data and with data whose density varies from region to region. Additionally, by design, this method does not assign outliers to any cluster.
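A simplified one-dimensional sketch of the density idea is shown below. It is loosely modeled on the well-known DBSCAN approach but is not a full implementation of it; the function name `density_clusters_1d` and the parameters `eps` (neighborhood radius) and `min_points` (how many points make a region dense) are assumptions for illustration.

```python
def density_clusters_1d(points, eps, min_points):
    """Link dense regions into clusters; points in sparse regions become noise."""
    pts = sorted(points)

    def neighbor_count(p):
        # Number of points within eps of p, counting p itself.
        return sum(1 for q in pts if abs(p - q) <= eps)

    # A point is "dense" (a core point) if enough points lie close to it.
    core = [p for p in pts if neighbor_count(p) >= min_points]

    # Chain neighboring core points together (core is already sorted).
    core_clusters = []
    for p in core:
        if core_clusters and p - core_clusters[-1][-1] <= eps:
            core_clusters[-1].append(p)
        else:
            core_clusters.append([p])

    # Non-core points join a cluster only if they sit next to a core point.
    clusters = [list(c) for c in core_clusters]
    noise = []
    for p in pts:
        if p in core:
            continue
        for idx, c in enumerate(core_clusters):
            if any(abs(p - q) <= eps for q in c):
                clusters[idx].append(p)
                break
        else:
            noise.append(p)
    return [sorted(c) for c in clusters], noise


# Race times from the earlier example, with assumed eps = 1 and min_points = 3.
clusters, noise = density_clusters_1d([29, 24, 25, 24, 23, 17], eps=1, min_points=3)
print(clusters)  # [[23, 24, 24, 25]]
print(noise)     # [17, 29]
```

The two isolated times are left out of every cluster, which is the behavior described above for outliers.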

Figure 4: Illustration of Density clustering

Applications of Clusters

Clustering has numerous applications in everyday life. Both kids and adults can benefit from understanding clusters and how they can be used to estimate the sum of several numbers. This technique is usually known as cluster estimation. Many people unknowingly employ cluster estimation when they shop for things at the store.

Consider an example. Let us say you are at a shop and want to purchase three items priced at 599, 546, and 631, respectively. You want to quickly estimate whether they fall under your budget of 2000. Although you could use a calculator to determine the exact sum of those three prices, the cluster estimate approach is faster and takes little effort.

We can see that all three values of 599, 546, and 631 are centered around 600. You can therefore cluster estimate the sum of the three items by multiplying 600 by three or, equivalently, by adding 600 + 600 + 600. The cluster estimate is thus 1800, which falls under the desired budget of 2000.
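The shopping arithmetic can be checked with a short Python snippet. The variable names are chosen only for this illustration; the prices, the center of 600, and the budget come from the example above.

```python
prices = [599, 546, 631]
center = 600   # the value the prices are centered around
budget = 2000

# Cluster estimate: the center multiplied by the number of items.
estimate = center * len(prices)

# Exact sum, for comparison with the estimate.
exact = sum(prices)

print(estimate)            # 1800
print(exact)               # 1776
print(estimate - exact)    # 24
print(estimate <= budget)  # True
```

Here the estimate is off by only 24, so it is accurate enough to answer the budget question.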

Clusters also help us understand the things around us. Conceptually significant groups of things with similar properties are crucial to how individuals interpret and represent the world. Humans are adept at sorting objects into distinct groups and placing similar objects together. For instance, even young kids are able to identify the various classes of items in a picture, whether they are buildings, machines, human beings, flowers, or animals.

Clusters are also used to summarize large data sets. Data can be composed of numerous classes or sub-classes. One can form clusters from those classes or sub-classes and represent each cluster as a single object. This significantly shrinks the data and makes it easier to analyze.

An Example of Data Clustering

James went to a gift shop to purchase presents for his friend’s birthday. He selected four presents with price tags of 8.5 dollars, 7.98 dollars, 7.4 dollars, and 8.2 dollars. Now, James wants to cluster estimate the sum of the price tags to find whether they fall under his budget of 35 dollars.

(a) Find the cluster estimate of the price tags.

(b) Also find whether the cluster estimate of presents falls under his budget.

Solution

(a) By looking at the price tags of the presents, it can be observed that all of them are centered around 8 dollars. The cluster estimate can thus be found as shown below.

Cluster estimate = 8 + 8 + 8 + 8

Cluster estimate = 32 dollars

(b) As calculated above, the cluster estimate of the total price of the presents is 32 dollars. We know that James’ budget for purchasing presents is 35 dollars. Since 32 is less than 35, the cluster estimate of the presents falls under his budget.
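The solution can also be sketched as a small reusable Python function. The name `cluster_estimate` is hypothetical, and picking the center by rounding the average price to the nearest whole dollar is an assumption made for this sketch; in the worked solution the center of 8 dollars was chosen by inspection.

```python
def cluster_estimate(prices):
    """Estimate the sum as (rounded average price) x (number of prices)."""
    # Assumption: the cluster center is the mean rounded to a whole number.
    center = round(sum(prices) / len(prices))
    return center, center * len(prices)


presents = [8.5, 7.98, 7.4, 8.2]
budget = 35

center, estimate = cluster_estimate(presents)
print(center)              # 8
print(estimate)            # 32
print(estimate <= budget)  # True

# The exact total is about 32.08 dollars, so the estimate is very close.
exact = sum(presents)
```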

All images/mathematical drawings were created with GeoGebra.