Similarity and Dissimilarity Measures
1 Introduction
In Cluster Analysis, the objective is to partition a set of n objects into groups such that objects within a group
are more similar to each other than to objects in other groups. This requires a formal mathematical definition of
distance (dissimilarity) and proximity (similarity).
2 The Data Matrix
Let X be an n × p data matrix:
\[
X = \begin{pmatrix}
x_{11} & x_{12} & \dots & x_{1p} \\
x_{21} & x_{22} & \dots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \dots & x_{np}
\end{pmatrix}
\]
where xi = (xi1 , xi2 , . . . , xip )⊤ is the p-dimensional vector representing the i-th observation.
3 Dissimilarity Measures for Continuous Variables
3.1 Minkowski Distance (Lr Norm)
The Minkowski distance between observations i and j is defined as:
p
!1/r
X
r
dr (i, j) = |xik − xjk |
k=1
Special cases include:
• Euclidean Distance (r = 2): The most common metric.
\[
d_2(i, j) = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2} = \sqrt{(x_i - x_j)^\top (x_i - x_j)}
\]
• Manhattan/City Block Distance (r = 1): Robust to outliers.
\[
d_1(i, j) = \sum_{k=1}^{p} |x_{ik} - x_{jk}|
\]
• Chebyshev Distance (r → ∞):
\[
d_\infty(i, j) = \max_{k} |x_{ik} - x_{jk}|
\]
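The three special cases above can be checked with a minimal NumPy sketch (the vectors `x` and `y` and the function name are illustrative, not part of the notes):

```python
import numpy as np

def minkowski(x, y, r):
    """Minkowski distance d_r between two p-dimensional vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if np.isinf(r):
        # Chebyshev distance as the r -> infinity limit
        return float(np.max(np.abs(x - y)))
    return float(np.sum(np.abs(x - y) ** r) ** (1.0 / r))

x = [1.0, 4.0, 2.0]
y = [3.0, 1.0, 2.0]
print(minkowski(x, y, 1))        # Manhattan: 2 + 3 + 0 = 5.0
print(minkowski(x, y, 2))        # Euclidean: sqrt(4 + 9)
print(minkowski(x, y, np.inf))   # Chebyshev: max(2, 3, 0) = 3.0
```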
3.2 Mahalanobis Distance
To account for correlations and differing variances between variables, we use the Mahalanobis distance. Let S be
the sample covariance matrix:
\[
d_M(i, j) = \sqrt{(x_i - x_j)^\top S^{-1} (x_i - x_j)}
\]
Derivation: If x is transformed by z = S^{-1/2} x, then z_i − z_j = S^{-1/2}(x_i − x_j), so ∥z_i − z_j∥² = (x_i − x_j)^⊤ S^{-1}(x_i − x_j); hence the Euclidean distance in z-space is identical to the Mahalanobis distance in x-space.
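A minimal NumPy sketch of this computation (the data matrix here is randomly generated for illustration; solving the linear system avoids forming S^{-1} explicitly):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))       # illustrative n = 50, p = 3 data matrix
S = np.cov(X, rowvar=False)        # sample covariance matrix S (rows = observations)

def mahalanobis(xi, xj, S):
    """Mahalanobis distance d_M between observations xi and xj."""
    diff = xi - xj
    # Solve S u = diff instead of computing S^{-1} directly (better conditioned).
    return float(np.sqrt(diff @ np.linalg.solve(S, diff)))

print(mahalanobis(X[0], X[1], S))
```

With S replaced by the identity matrix, the function reduces to the ordinary Euclidean distance.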
4 Similarity Measures for Binary Data
For binary variables (0 or 1), similarity is based on a 2 × 2 contingency table for objects i and j:
Object i \ j    1 (Present)    0 (Absent)    Total
1 (Present)     a              b             a + b
0 (Absent)      c              d             c + d
Total           a + c          b + d         p
4.1 Standard Coefficients
1. Simple Matching Coefficient (SMC):
\[
S_{SMC} = \frac{a + d}{a + b + c + d}
\]
2. Jaccard Coefficient (SJ ): Used when “double-zeros” (d) carry no information.
\[
S_J = \frac{a}{a + b + c}
\]
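Both coefficients follow directly from the counts a, b, c, d. A short NumPy sketch (the example vectors and helper names are illustrative):

```python
import numpy as np

def binary_counts(xi, xj):
    """Counts (a, b, c, d) from the 2x2 contingency table for two binary vectors."""
    xi, xj = np.asarray(xi), np.asarray(xj)
    a = int(np.sum((xi == 1) & (xj == 1)))  # both present
    b = int(np.sum((xi == 1) & (xj == 0)))  # present in i only
    c = int(np.sum((xi == 0) & (xj == 1)))  # present in j only
    d = int(np.sum((xi == 0) & (xj == 0)))  # both absent
    return a, b, c, d

def simple_matching(xi, xj):
    a, b, c, d = binary_counts(xi, xj)
    return (a + d) / (a + b + c + d)

def jaccard(xi, xj):
    a, b, c, d = binary_counts(xi, xj)
    return a / (a + b + c)

xi = [1, 1, 0, 0, 1]
xj = [1, 0, 0, 1, 1]
# Here a = 2, b = 1, c = 1, d = 1, so SMC = 3/5 and S_J = 2/4.
print(simple_matching(xi, xj), jaccard(xi, xj))
```

Note how the single double-zero (d = 1) raises the SMC but leaves the Jaccard coefficient untouched.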
5 Similarity via Correlation
The Pearson Correlation Coefficient measures similarity in the “shape” of profiles:
\[
\rho_{ij} = \frac{\sum_{k=1}^{p} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}
{\sqrt{\sum_{k=1}^{p} (x_{ik} - \bar{x}_i)^2 \sum_{k=1}^{p} (x_{jk} - \bar{x}_j)^2}}
\]
where x̄i is the mean of the p variables for object i.
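Because ρ_ij compares centered profiles, two objects with proportional profiles are perfectly similar even if their levels differ. A minimal sketch (profiles chosen for illustration):

```python
import numpy as np

def profile_correlation(xi, xj):
    """Pearson correlation between two observation profiles (rows of X)."""
    xi = np.asarray(xi, dtype=float) - np.mean(xi)   # center by the row mean
    xj = np.asarray(xj, dtype=float) - np.mean(xj)
    return float(np.sum(xi * xj) / np.sqrt(np.sum(xi ** 2) * np.sum(xj ** 2)))

# Same "shape", different scale: rho = 1 despite different Euclidean distance.
print(profile_correlation([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))
```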
Problem 1: Invariance of Mahalanobis Distance
Task: Prove that the Mahalanobis distance between two vectors xi and xj is invariant under any non-singular
linear transformation y = Ax + b.
Proof. Let Sx be the covariance matrix of the original data X. The squared Mahalanobis distance is defined as:
\[
d_M^2(x_i, x_j) = (x_i - x_j)^\top S_x^{-1} (x_i - x_j)
\]
Consider the linear transformation yi = Axi + b, where A is a non-singular p × p matrix. The difference vector
in the transformed space is:
yi − yj = (Axi + b) − (Axj + b) = A(xi − xj )
The covariance matrix of the transformed variables Y is given by:
Sy = Var(Ax + b) = ASx A⊤
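The invariance claimed above can be verified numerically; in this sketch the data, A, and b are randomly generated (A is non-singular with probability one), and the sample covariances of X and Y satisfy S_y = A S_x A^⊤ exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))   # illustrative data matrix
A = rng.normal(size=(3, 3))     # almost surely non-singular
b = rng.normal(size=3)
Y = X @ A.T + b                 # apply y = Ax + b to every row

def mahal(u, v, S):
    diff = u - v
    return float(np.sqrt(diff @ np.linalg.solve(S, diff)))

Sx = np.cov(X, rowvar=False)
Sy = np.cov(Y, rowvar=False)

dx = mahal(X[0], X[1], Sx)      # distance in the original space
dy = mahal(Y[0], Y[1], Sy)      # distance after the affine transformation
print(dx, dy)                   # equal up to floating-point error
```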