
Cluster Analysis

What is segmentation?

Although long known, the concept of market segmentation was first treated formally by Wendell Smith (1956). According to Smith, the basic proposition of segmentation is that markets are made up of segments that are relatively homogeneous in terms of needs, wants, and offer.

Segmentation approaches

Myers distinguishes between two basic conceptual frameworks for segmentation:

  • Customer-based versus Product/Service-based segmentation
  • A priori versus Post hoc (a posteriori) segmentation

The former group belongs to dependence techniques, which use one or more independent variables to explain and predict a dependent variable. Among the most common dependence techniques are AID, CHAID, regression, and discriminant analysis.

The latter group belongs to independence techniques, which are typically used for grouping people or items that are found to be similar in terms of one or more descriptive variables. Among the most common independence techniques are hierarchical clustering, partition clustering, and other multivariate analysis methods such as factor analysis, correspondence analysis (see also Brand Mapping in this help file), and principal components analysis.

Segmentation procedure

Most segmentation studies follow a general procedure consisting of at least the following steps:

  • Select the segmentation variables
  • Select and run the segmentation methodology
  • Identify and describe segments

1. Select the segmentation variables

The selection of the appropriate kind and number of variables to be clustered depends on the ultimate study goal. In most cases, however, two important issues arise:

  • How to handle variables measured on different scales?
  • How many variables to include in the model?

The available literature suggests handling the first issue by means of standardization. Cluster Analysis performs complete standardization, as opposed to centering standardization; it is enabled by selecting the Standardization checkbox in the user dialog. See also the chapter Technicalities for more details on standardizing variables.
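The difference between the two kinds of standardization can be sketched as follows, assuming the usual meanings: centering subtracts each variable's mean, while complete standardization (a z-score) also divides by the standard deviation, so that variables measured on different scales end up with comparable spread. Whether MM4XL divides by the sample or the population standard deviation is an assumption here (the sketch uses the sample one); the function names are hypothetical.

```python
from statistics import mean, stdev


def center(values):
    """Centering: subtract the mean, keep the original spread."""
    m = mean(values)
    return [v - m for v in values]


def standardize(values):
    """Complete (z-score) standardization: subtract the mean and divide by
    the sample standard deviation (the divisor is an assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def standardize_columns(data):
    """Standardize every variable (column) separately.
    data maps variable name -> list of observations."""
    return {name: standardize(col) for name, col in data.items()}
```

After `standardize_columns`, a variable recorded in millions and one recorded in percentages both have mean 0 and standard deviation 1, so neither dominates the distance computations used by the clustering methods.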

The number of variables to include in the model can only be determined by the analyst through trial and error. In general, we suggest starting from a data set considered relevant for the cluster analysis and running a correlation analysis to highlight relationships between variables. Excel computes correlations in two ways: with built-in functions (such as =CORREL()) or with the Data Analysis add-in in the Tools menu. (If you do not see the Data Analysis option, you need to install it: press F1, type "add-in programs included with" (without quotes) in the search panel, and follow the installation instructions.) We used the latter option to build the table below; the raw data can be found in the chapter An example: clustering company profiles.
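For readers who want to reproduce the correlation table outside Excel, the following is a minimal sketch of the Pearson correlation coefficient (the quantity =CORREL() returns) and of a full variable-by-variable matrix like the one produced by the Data Analysis add-in. The data layout (a dictionary of columns) and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(data):
    """data maps variable name -> list of observations.
    Returns a dict keyed by (variable_a, variable_b)."""
    names = list(data)
    return {(a, b): pearson(data[a], data[b]) for a in names for b in names}
```

The matrix is symmetric with ones on the diagonal, so only the lower triangle (which is what the Data Analysis tool prints) carries information.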


The relationship here is measured by means of Pearson's correlation coefficient (for more information about correlation coefficients, press F1 in Excel and type Correlation). Variables that do not seem to be correlated with other variables in the data set can be thought of as having less power to differentiate between clusters, so they might be removed from the analysis. Punj and Stewart (1983) warn:

A variable that is not related to the final clustering solution, i.e., does not differentiate among clusters in some manner, causes a serious deterioration of the clustering method.

To make sure that only variables without a relevant contribution have been removed, we suggest first running the cluster analysis without all the apparently non-relevant variables. Then run it again, adding back the first of the removed variables; then again, adding the second removed variable; and so on for all of them. If all partitions look the same, one may reasonably conclude that the removed variables do not differentiate among clusters.

The matrix above suggests that Sales levels are positively correlated with the US and negatively with Europe: large companies earn a larger share of their revenue in the US than in Europe. A similar relation holds for GPs and Hospitals: large companies make larger sales to hospitals than small companies do. Finally, companies with strong penetration in Europe tend to be strong in Japan and Other markets, whereas companies with solid sales in the US tend to do little business abroad. All variables we used appear to be relevant to the analysis.
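The rerun procedure described above asks whether the partitions "look the same". One common way to put a number on that, which is our choice and not something this help file prescribes, is the Rand index: the share of item pairs that both partitions treat alike (either together in one cluster or apart in different clusters). A value of 1.0 means the two runs produced the same partition, even if the cluster numbers differ. The function name is hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of item pairs on which two partitions agree.
    labels_a[i] and labels_b[i] are the cluster labels of item i in each run."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

For example, a run that labels four companies `[0, 0, 1, 1]` and a rerun that labels them `[1, 1, 0, 0]` score 1.0, because only the cluster numbering changed.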

2. Select and run the segmentation methodology

A wide range of methodologies is available, and the most sophisticated ones do not always yield better results than more naive methods. The combined use of more than one method is sometimes recommended, and this is one reason why MM4XL provides two of the most widely used techniques: K-means (a partitioning method) and Ward's method (a hierarchical method).
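To show what each of the two techniques does, here is a minimal plain-Python sketch; it is not MM4XL's implementation. `kmeans` is Lloyd's algorithm seeded with the first k rows (a simplifying assumption made for reproducibility; real tools use smarter or random seeding). `ward` is a naive agglomerative loop that, at every step, merges the pair of clusters whose union increases the within-cluster sum of squares the least, which is Ward's criterion. Both function names are hypothetical, and `math.dist` needs Python 3.8 or later.

```python
from math import dist


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means, seeded with the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def ward(points):
    """Naive Ward agglomeration. Returns the merge history as
    (cluster_a, cluster_b, cost) tuples; clusters are frozensets of indices."""
    clusters = {frozenset([i]): list(p) for i, p in enumerate(points)}  # -> centroid
    history = []
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                a, b = keys[i], keys[j]
                na, nb = len(a), len(b)
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * dist(clusters[a], clusters[b]) ** 2
                if best is None or cost < best[2]:
                    best = (a, b, cost)
        a, b, cost = best
        na, nb = len(a), len(b)
        merged = [(na * x + nb * y) / (na + nb) for x, y in zip(clusters[a], clusters[b])]
        del clusters[a], clusters[b]
        clusters[a | b] = merged
        history.append((a, b, cost))
    return history
```

The `history` list records which clusters merged and at what cost, which is the information a dendrogram draws. Running both functions on the same standardized data is one simple way to apply the two methods side by side.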

We also recommend Brand Mapping as a tool for segmentation. Read the chapter Brand Mapping in this help file for more details.

3. Identify and describe segments

Identifying and describing segments from a segmentation study is more of an art than a scientific practice. The experience, expertise, and intuition of the analyst play a primary role in the identification phase; however, there are tools that can help. We show later how to take full advantage of the visual inspection tools produced by Cluster Analysis.

The description phase is usually supported by contingency tables that describe the cluster configuration in terms of the variables used for the analysis. As descriptive aids, K-means prints a Dispersion chart and a Summary table. The table summarizes the minimum, average, median, and maximum value of each variable for each cluster, which makes it easy to grasp the main profile of each group quickly. The chart shows which item merged with its cluster and when it happened. Ward's method, on the other hand, prints a dendrogram as a descriptive, visual aid.
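The Summary table described above can be sketched as follows: for each cluster, the minimum, average, median, and maximum of every variable. The input layout (one dictionary per item plus a list of cluster labels) and the function name are hypothetical; MM4XL's own table layout may differ.

```python
from statistics import mean, median


def summary_table(rows, labels, variables):
    """For each cluster, report (min, average, median, max) of each variable.
    rows: one dict per item; labels: the cluster label of each item."""
    table = {}
    for cluster in sorted(set(labels)):
        members = [row for row, lab in zip(rows, labels) if lab == cluster]
        table[cluster] = {}
        for var in variables:
            values = [row[var] for row in members]
            table[cluster][var] = (min(values), mean(values), median(values), max(values))
    return table
```

Comparing the average and the median within a cluster is a quick check for skew: a large gap suggests a few extreme items are pulling that cluster's profile.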
