Cluster Analysis

An example: clustering company profiles

We present here an example of cluster analysis run with MM4XL. The example serves only to describe the technique: although the data were adapted from reliable sources, we make no claim that the results extrapolate to the real situation in the pharmaceutical industry. The goal of the study is to classify pharmaceutical companies in order to highlight relationships between company size, geographic area, and the market segments in which the companies do business.

The raw data below are 1999 estimates that describe 21 large pharmaceutical companies in terms of:

  • Sales volume (thousands of US$)
  • Percentage of each company's sales by geographic region (US, Europe, Japan, and other countries)
  • Percentage of each company's sales by market segment (General Practitioners and Hospitals)

For this example we followed the two-step clustering approach suggested by Punj and Stewart (1983). They recommend first running a hierarchical method, such as Ward's method, to determine an initial number of clusters, and then a partitioning technique, such as K-means, to refine the segmentation.

Step 1: Clustering with Ward's method.

The data above were used as input to Ward's algorithm, with the Termination condition set to Automatically. The software produced the two charts and the table shown below. This output offers enough detail to gain a first understanding of the data and of how they group into clusters.
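
To make the agglomerative step concrete, here is a minimal pure-Python sketch of Ward's method. It is not MM4XL's implementation: it assumes plain Euclidean distance on the raw variables and reports each merge level as the increase in within-cluster sum of squares, which may be scaled differently from the levels MM4XL prints. The names `ward`, `centroid`, and `sq_dist` are hypothetical.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def ward(data):
    """Agglomerative clustering with Ward's criterion.

    Starts with every item in its own cluster and repeatedly merges the
    pair whose merger increases the within-cluster sum of squares least.
    Returns one (sorted member indices, merge level) tuple per merge.
    """
    clusters = [[i] for i in range(len(data))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ca = centroid([data[i] for i in clusters[a]])
            cb = centroid([data[i] for i in clusters[b]])
            na, nb = len(clusters[a]), len(clusters[b])
            # Ward's cost: increase in within-cluster sum of squares
            cost = na * nb / (na + nb) * sq_dist(ca, cb)
            if best is None or cost < best[0]:
                best = (cost, a, b)
        cost, a, b = best
        merged = clusters[a] + clusters[b]
        # delete the higher index first so index a stays valid (a < b)
        del clusters[b]
        del clusters[a]
        clusters.append(merged)
        merges.append((sorted(merged), cost))
    return merges
```

With 21 companies the brute-force search over all pairs at each step is cheap; for large data sets a library routine such as SciPy's `linkage` with `method="ward"` is the practical choice, although its merge heights are on a different scale.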

The Levels histogram helps identify a suitable number of clusters. There is no formal rule for interpreting this chart; a common practice is to start from the bottom and take the number of bars that stand out sharply from the rest. In our example the bottom three bars show this characteristic, which suggests a 3-cluster partition.
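
The "sharper cut" reading of the Levels histogram can be approximated by a largest-gap heuristic: sort the merge levels and cut where the jump between consecutive levels is largest. This is a common rule of thumb assumed here for illustration; it is not necessarily how MM4XL reads the chart, and `suggest_k` is a hypothetical name.

```python
def suggest_k(levels):
    """Suggest a number of clusters from agglomeration levels.

    levels holds one merge level per merge (n - 1 values for n items).
    Performing the i + 1 lowest merges leaves n - (i + 1) clusters, so the
    largest jump between levels[i] and levels[i + 1] suggests stopping there.
    """
    if len(levels) < 2:
        raise ValueError("need at least two merge levels")
    ordered = sorted(levels)
    gaps = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    # index of the largest jump (first one wins on ties)
    i = max(range(len(gaps)), key=gaps.__getitem__)
    # n = len(levels) + 1 items; after i + 1 merges, n - (i + 1) clusters remain
    return len(levels) - i
```

Like the visual reading, the heuristic only proposes a starting point; the dendrogram and the business context still decide the final number.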

Dendrograms are tree-like diagrams that show graphically which items, or groups of items, were merged and at what stage of the procedure. In our example the dendrogram below shows between three and six major partitions.
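
Reading k clusters off a dendrogram amounts to replaying the merges from the bottom and stopping when k groups remain. The sketch below assumes the merge history is a list of `(members, level)` pairs in merge order, where `members` lists every item index in the newly formed cluster; `cut_tree` is a hypothetical helper, not an MM4XL function.

```python
def cut_tree(merges, n, k):
    """Return the k-cluster partition implied by a merge history.

    merges lists (members, level) pairs in merge order for n items;
    replaying the first n - k merges leaves exactly k clusters.
    """
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    label = list(range(n))  # every item starts in its own cluster
    for members, _level in merges[: n - k]:
        root = min(members)
        # merges are nested, so relabelling all members joins the two parts
        for i in members:
            label[i] = root
    groups = {}
    for i, lab in enumerate(label):
        groups.setdefault(lab, []).append(i)
    return sorted(groups.values())
```

Calling it for k = 3, 4, 5, and 6 yields the candidate partitions the dendrogram suggests, which can then be compared.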

Step 2: Clustering with K-Means method.

We can now run a K-means partition requesting at least three clusters, and, following the dendrogram, we might even increase the number of clusters up to six. Before doing so, however, we suggest using Smart Mapping, another tool built into MM4XL, to view the partitions from a different angle. The chart below was drawn using the column labeled Ordinates in the Ward's output, together with a second column of progressive values ranging from 1 to the number of clustered items (21 in our case). Smart Mapping can place labels on scatter charts, a feature not supported in Excel. In the chart below, bubble size is proportional to Sales. See the Smart Mapping chapter of this Help file for more information about drawing Smart Maps.

We opted for K-means clustering with 3 groups and the exact termination condition. The partition is summarized in the table below:
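
K-means refines a partition by alternating two steps: assign each item to its nearest centroid, then move each centroid to the mean of its items. The sketch below is plain Lloyd's algorithm started from given seed centroids (for instance, the centroids of the 3-cluster Ward partition). We assume, without confirmation from MM4XL, that the exact termination condition means iterating until no item changes cluster; `kmeans` and `max_iter` are hypothetical names.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data, seeds, max_iter=100):
    """Lloyd's K-means started from the given seed centroids.

    Stops when no item changes cluster between two passes (our assumed
    reading of "exact" termination) or after max_iter passes.
    Returns (labels, centroids).
    """
    centroids = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # assignment step: nearest centroid for every item
        new_labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(x, centroids[c]))
            for x in data
        ]
        if new_labels == labels:
            break  # no item moved: the partition is stable
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for c in range(len(centroids)):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Seeding from the hierarchical result, rather than from random items, is what makes the procedure a two-step approach: K-means only has to correct the Ward partition locally.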

The same partition can also be displayed as an Item Dispersion chart, which shows which item belongs to which cluster. The length of each arm shows how early or late an item joined its cluster, and the bubble size is proportional to the values in the first column of the input range.

The tool also reports the Between-group, Within-group, and Total inertia values (see the table below). These values are used to compare the accuracy of different partitions.
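
The three inertia values are linked by the decomposition Total = Between + Within. The sketch below computes them as unweighted sums of squared Euclidean distances; MM4XL may divide by the number of items or weight the variables differently, so its figures need not match. `inertia` is a hypothetical helper.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def inertia(data, labels):
    """Return (between, within, total) inertia of a partition."""
    g_all = centroid(data)  # centre of gravity of the whole data set
    total = sum(sq_dist(x, g_all) for x in data)
    between = within = 0.0
    for c in set(labels):
        members = [x for x, lab in zip(data, labels) if lab == c]
        g_c = centroid(members)
        # spread of the items around their own cluster centre
        within += sum(sq_dist(x, g_c) for x in members)
        # spread of the cluster centre around the global centre, weighted by size
        between += len(members) * sq_dist(g_c, g_all)
    return between, within, total
```

For a given data set the Total inertia does not depend on the partition, so a partition with a larger Between share (and therefore a smaller Within share) separates the groups better.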

According to our segmentation procedure, only one phase is left: identifying and describing the clusters. Although MM4XL has partitioned the input data, it is wise to check the quality of the partition before interpreting the results. Two questions should be answered before describing the clusters:

  1. Is the number of clusters appropriate?
  2. How homogeneous is each cluster?

Question 1 can be answered by visual inspection of the dendrogram, as described above. When working with partitioning methods such as K-means, however, a more formal rule is available. Calinski and Harabasz (1974) proposed choosing, as the appropriate number of clusters, the one that maximizes the index C, defined as follows:

$$C = \frac{\mathrm{trace}(B)/(g-1)}{\mathrm{trace}(W)/(n-g)}$$

Here trace(B) is the Between-group inertia, trace(W) is the Within-group inertia, g is the number of clusters, and n is the number of items. Because the rule selects the maximum of C, the index has to be computed for each candidate number of clusters and the values compared. For our example we obtained C = 5.9, which suggests re-running the analysis and partitioning the data set into 6 clusters rather than 3. The final choice is partly a matter of judgment: clusters that are too small may not appeal to management, so a smaller number of clusters might be less accurate in formal terms and yet be more viable for business.
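
The index is straightforward to compute once the two inertias are known. Any common scaling of trace(B) and trace(W), such as dividing both by n, cancels out in the ratio. The function names below are hypothetical.

```python
def calinski_harabasz(trace_b, trace_w, g, n):
    """C = [trace(B) / (g - 1)] / [trace(W) / (n - g)]."""
    if not 1 < g < n:
        raise ValueError("need 1 < g < n")
    return (trace_b / (g - 1)) / (trace_w / (n - g))


def best_g(scores):
    """scores maps a number of clusters g to its C value; return the g with the largest C."""
    return max(scores, key=scores.get)
```

In practice one runs K-means for each candidate g (here, 3 to 6), computes C from the reported inertias, and passes the resulting values to `best_g`.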

The second question, about the homogeneity of the clusters, can be answered by inspecting the Item Dispersion chart. For each cluster found by K-means, this chart shows which items belong to it and when they joined it. The longer an item's whisker, the later it joined the cluster, and therefore the less similar its profile is to those of the other items in the same cluster. In our example, cluster 1 appears to be the least homogeneous of the three.
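
The Item Dispersion chart judges homogeneity by when items joined their cluster. A simple numeric complement, swapped in here and not part of MM4XL's output, is each cluster's mean squared distance from its members to the centroid: the larger the value, the less homogeneous the cluster. `cluster_spread` is a hypothetical name.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_spread(data, labels):
    """Mean squared distance of members to their cluster centroid, per cluster."""
    spread = {}
    for c in sorted(set(labels)):
        members = [x for x, lab in zip(data, labels) if lab == c]
        g_c = [sum(col) / len(members) for col in zip(*members)]
        spread[c] = sum(sq_dist(x, g_c) for x in members) / len(members)
    return spread
```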

Cluster description

The summary report generated by the K-means method helps describe the clusters.

All values in the table are averages. The rows labeled Group 1 to Group 3 show, for each cluster, the average of each input variable over the items in that cluster. The last row shows the average of each variable over the whole data set.
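
The summary table can be reproduced from the input data and the cluster labels: each group row averages every variable over the members of one cluster, and the last row averages over all items. A minimal sketch, with hypothetical names and the labels assumed to be the cluster numbers 1 to 3:

```python
def summary(data, labels):
    """Average of each variable per group, plus the whole-data-set average."""
    rows = {}
    for c in sorted(set(labels)):
        members = [x for x, lab in zip(data, labels) if lab == c]
        rows[f"Group {c}"] = [sum(col) / len(members) for col in zip(*members)]
    # last row: averages computed on the whole data set
    rows["All"] = [sum(col) / len(data) for col in zip(*data)]
    return rows
```

Comparing each group row with the "All" row is what supports labels such as "below average" or "mainly US" in the descriptions that follow.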

For the sake of the exercise we assigned the following names to the clusters of our example:

Group 3: Pachyderms. These are large companies with worldwide coverage, selling through both channels.

Group 2: American practitioners. These are mid-size companies, with business mainly in the US and low penetration in the hospital segment.

Group 1: European explorers. These are companies with below-average sales, made mainly to European GPs, and with a significant share of sales coming from the rest of the world.

Returning to the goal of the analysis, the results show a relationship between sales level, geographic region, and market-segment coverage in our data set.

Excel limits the number of points it can display on a chart, so our Item Dispersion charts cannot display more than 255 items on the same chart. Fortunately, the dendrogram does not suffer from this limitation: when it has more than 255 points, it is converted to an image.
