Class notes

Applied Multivariate Analysis: Cluster Analysis

Pages: 11
Uploaded on: 01-03-2026
Written in: 2025/2026

This document, titled Applied Multivariate Analysis: Cluster Analysis, is a comprehensive technical guide focused on the mathematical and statistical foundations of clustering. It provides a detailed exploration of how to measure similarity and dissimilarity between objects, which is the core requirement for partitioning data into groups.

Institution: University
Course: STAT


Applied Multivariate Analysis: Cluster Analysis



Similarity and Dissimilarity Measures


1 Introduction
In Cluster Analysis, the objective is to partition a set of n objects into groups such that objects within a group
are more similar to each other than to objects in other groups. This requires a formal mathematical definition of
distance (dissimilarity) and proximity (similarity).


2 The Data Matrix
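Let X be an n × p data matrix:

$$
\mathbf{X} = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix}
$$

where $\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^\top$ is the p-dimensional vector representing the i-th observation.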


3 Dissimilarity Measures for Continuous Variables
3.1 Minkowski Distance (Lr Norm)
The Minkowski distance between observations i and j is defined, for r ≥ 1, as:

$$
d_r(i, j) = \left( \sum_{k=1}^{p} |x_{ik} - x_{jk}|^r \right)^{1/r}
$$
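A minimal NumPy sketch of $d_r$, assuming each observation is passed as a length-p array; the function name `minkowski` and the example vectors are illustrative, not taken from the notes. It rejects r < 1 because the triangle inequality fails there, so $d_r$ would no longer be a metric.

```python
import numpy as np

def minkowski(x_i, x_j, r):
    """Minkowski (L_r) distance d_r(i, j) between two length-p observations."""
    if r < 1:
        # For r < 1 the triangle inequality fails, so d_r is not a metric
        raise ValueError("r must be >= 1")
    diff = np.abs(np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float))
    return float(np.sum(diff ** r) ** (1.0 / r))

x_i = [1.0, 2.0, 3.0]
x_j = [4.0, 6.0, 3.0]
print(minkowski(x_i, x_j, 1))   # 7.0
print(minkowski(x_i, x_j, 2))   # 5.0
print(minkowski(x_i, x_j, 3))   # (27 + 64)^(1/3), roughly 4.498
```

If SciPy is available, `scipy.spatial.distance.minkowski(u, v, p)` computes the same quantity.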

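Special cases include:

• Euclidean Distance (r = 2): The most common metric.

$$
d_2(i, j) = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2} = \sqrt{(\mathbf{x}_i - \mathbf{x}_j)^\top (\mathbf{x}_i - \mathbf{x}_j)}
$$

• Manhattan/City Block Distance (r = 1): Less sensitive to outliers than the Euclidean distance, because coordinate differences are not squared.

$$
d_1(i, j) = \sum_{k=1}^{p} |x_{ik} - x_{jk}|
$$

• Chebyshev Distance (r → ∞): The largest coordinate-wise difference.

$$
d_\infty(i, j) = \max_{k} |x_{ik} - x_{jk}| = \lim_{r \to \infty} d_r(i, j)
$$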
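The three special cases can be checked against each other on a small hypothetical pair of observations. In this sketch the helper names are our own; it also shows numerically that $d_r$ shrinks towards the Chebyshev distance as r grows.

```python
import numpy as np

def abs_diff(x_i, x_j):
    """Coordinate-wise |x_ik - x_jk| for k = 1..p."""
    return np.abs(np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float))

def manhattan(x_i, x_j):      # r = 1
    return float(np.sum(abs_diff(x_i, x_j)))

def euclidean(x_i, x_j):      # r = 2, i.e. sqrt((x_i - x_j)^T (x_i - x_j))
    d = abs_diff(x_i, x_j)
    return float(np.sqrt(d @ d))

def chebyshev(x_i, x_j):      # r -> infinity
    return float(np.max(abs_diff(x_i, x_j)))

x_i = [1.0, 2.0, 3.0]
x_j = [4.0, 6.0, 3.0]
print(manhattan(x_i, x_j), euclidean(x_i, x_j), chebyshev(x_i, x_j))  # 7.0 5.0 4.0

# d_r is non-increasing in r and approaches the Chebyshev distance (4.0 here)
diff = abs_diff(x_i, x_j)
for r in (1, 2, 4, 16, 64):
    print(r, float(np.sum(diff ** r) ** (1.0 / r)))
```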





3.2 Mahalanobis Distance
To account for correlations and differing variances between variables, we use the Mahalanobis distance. Let S be the sample covariance matrix (assumed non-singular); then

$$
d_M(i, j) = \sqrt{(\mathbf{x}_i - \mathbf{x}_j)^\top \mathbf{S}^{-1} (\mathbf{x}_i - \mathbf{x}_j)}
$$

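Derivation: Let $\mathbf{z} = \mathbf{S}^{-1/2}\mathbf{x}$, where $\mathbf{S}^{-1/2}$ is the symmetric inverse square root of S. Then $\mathbf{z}_i - \mathbf{z}_j = \mathbf{S}^{-1/2}(\mathbf{x}_i - \mathbf{x}_j)$ and, by symmetry of $\mathbf{S}^{-1/2}$,

$$
(\mathbf{z}_i - \mathbf{z}_j)^\top (\mathbf{z}_i - \mathbf{z}_j) = (\mathbf{x}_i - \mathbf{x}_j)^\top \mathbf{S}^{-1/2}\mathbf{S}^{-1/2} (\mathbf{x}_i - \mathbf{x}_j) = (\mathbf{x}_i - \mathbf{x}_j)^\top \mathbf{S}^{-1} (\mathbf{x}_i - \mathbf{x}_j),
$$

so the Euclidean distance in z-space is identical to the Mahalanobis distance in x-space.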
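A sketch of the whitening argument on simulated data, assuming NumPy and a non-singular S; the mixing matrix `mix`, the sample size and the seed are arbitrary choices for illustration. It builds $\mathbf{S}^{-1/2}$ from the eigendecomposition $\mathbf{S} = \mathbf{V}\,\mathrm{diag}(w)\,\mathbf{V}^\top$ and checks that the Euclidean distance between whitened rows matches the Mahalanobis distance.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: n = 200 observations of p = 3 correlated variables
mix = np.array([[2.0, 0.0, 0.0],
                [1.0, 1.0, 0.0],
                [0.5, -0.3, 0.7]])
X = rng.standard_normal((200, 3)) @ mix.T

S = np.cov(X, rowvar=False)              # sample covariance matrix, p x p
S_inv = np.linalg.inv(S)

def mahalanobis(x_i, x_j, S_inv):
    """d_M(i, j) = sqrt((x_i - x_j)^T S^{-1} (x_i - x_j))."""
    d = x_i - x_j
    return float(np.sqrt(d @ S_inv @ d))

# Symmetric inverse square root: S = V diag(w) V^T  =>  S^{-1/2} = V diag(w^{-1/2}) V^T
w, V = np.linalg.eigh(S)
S_inv_half = V @ np.diag(w ** -0.5) @ V.T

# Row i of Z is (S^{-1/2} x_i)^T; S^{-1/2} is symmetric, so X @ S_inv_half does this for all rows
Z = X @ S_inv_half

d_M = mahalanobis(X[0], X[1], S_inv)
d_E = float(np.linalg.norm(Z[0] - Z[1]))
print(d_M, d_E)                          # equal up to floating-point rounding
```

If SciPy is used instead, note that `scipy.spatial.distance.mahalanobis(u, v, VI)` expects the inverse covariance `VI`, not S itself.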


4 Similarity Measures for Binary Data
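For binary variables (0 or 1), similarity between objects i and j is based on a 2 × 2 contingency table that counts, over the p variables, each combination of values:

| Object i \ Object j | 1 (Present) | 0 (Absent) | Total |
|---|---|---|---|
| 1 (Present) | a | b | a + b |
| 0 (Absent) | c | d | c + d |
| Total | a + c | b + d | p |

Thus a counts the variables present in both objects, d those absent from both, and b and c the two kinds of mismatch.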
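The four counts can be obtained from two 0/1 vectors with Boolean masks. A minimal sketch, with `binary_counts` and the example vectors as hypothetical illustrations (object i's values in `u`, object j's in `v`):

```python
import numpy as np

def binary_counts(u, v):
    """Return (a, b, c, d) for 0/1 vectors u (object i) and v (object j) of length p."""
    u = np.asarray(u, dtype=bool)
    v = np.asarray(v, dtype=bool)
    a = int(np.sum(u & v))      # 1 in both objects
    b = int(np.sum(u & ~v))     # 1 in i, 0 in j
    c = int(np.sum(~u & v))     # 0 in i, 1 in j
    d = int(np.sum(~u & ~v))    # 0 in both objects
    return a, b, c, d

x_i = [1, 1, 0, 0, 1, 0, 0, 0]
x_j = [1, 0, 1, 0, 1, 0, 0, 0]
print(binary_counts(x_i, x_j))  # (2, 1, 1, 4)
```

The four counts always sum to p (here 8), matching the bottom-right cell of the table.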

4.1 Standard Coefficients
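1. Simple Matching Coefficient (SMC): Counts both joint presences and joint absences as agreement.

$$
S_{SMC} = \frac{a + d}{a + b + c + d}
$$

2. Jaccard Coefficient ($S_J$): Used when joint absences (“double zeros”, counted by d) carry no information, so d is excluded from both numerator and denominator.

$$
S_J = \frac{a}{a + b + c}
$$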
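Given the counts, both coefficients are one-liners. In this sketch the counts (a, b, c, d) = (2, 1, 1, 4) are hypothetical, and returning NaN when a + b + c = 0 (two objects with no presences at all) is our own convention, since $S_J$ is undefined there.

```python
def simple_matching(a, b, c, d):
    """S_SMC = (a + d) / (a + b + c + d): joint absences count as agreement."""
    return (a + d) / (a + b + c + d)

def jaccard(a, b, c):
    """S_J = a / (a + b + c): joint absences (d) are ignored."""
    denom = a + b + c
    # Assumed convention: S_J is undefined when neither object has any presences
    return a / denom if denom > 0 else float("nan")

# Counts for a hypothetical pair of objects over p = 8 binary variables
a, b, c, d = 2, 1, 1, 4
print(simple_matching(a, b, c, d))  # 0.75
print(jaccard(a, b, c))             # 0.5
```

Raising d (more shared absences) increases $S_{SMC}$ but leaves $S_J$ unchanged, which is why the Jaccard coefficient is preferred when double zeros are uninformative.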

5 Similarity via Correlation
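The Pearson correlation coefficient measures similarity in the “shape” of two profiles, that is, whether their values rise and fall together across the p variables:

$$
\rho_{ij} = \frac{\sum_{k=1}^{p} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{p} (x_{ik} - \bar{x}_i)^2 \, \sum_{k=1}^{p} (x_{jk} - \bar{x}_j)^2}}
$$

where $\bar{x}_i = \frac{1}{p}\sum_{k=1}^{p} x_{ik}$ is the mean of the p variables for object i. Because each profile is centred and scaled, $\rho_{ij}$ is unchanged if either profile is shifted by a constant or multiplied by a positive constant, so it ignores differences in overall level and spread.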
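The following sketch computes $\rho_{ij}$ directly from the formula and compares it with NumPy's `np.corrcoef`. The profiles are hypothetical: `x_j` is a shifted and rescaled copy of `x_i`, and `x_k` is its mirror image.

```python
import numpy as np

def profile_correlation(x_i, x_j):
    """Pearson correlation between the profiles of objects i and j across their p variables."""
    xi = np.asarray(x_i, dtype=float) - np.mean(x_i)   # centre by the mean of object i
    xj = np.asarray(x_j, dtype=float) - np.mean(x_j)   # centre by the mean of object j
    return float(np.sum(xi * xj) / np.sqrt(np.sum(xi ** 2) * np.sum(xj ** 2)))

x_i = np.array([1.0, 3.0, 2.0, 5.0])
x_j = 10.0 + 2.0 * x_i          # same shape, shifted and rescaled
x_k = 6.0 - x_i                 # mirror-image profile

print(profile_correlation(x_i, x_j))   # ≈ 1.0
print(profile_correlation(x_i, x_k))   # ≈ -1.0
print(np.corrcoef(x_i, x_j)[0, 1])     # NumPy's built-in gives the same value
```

Here `x_i` and `x_j` are far apart in Euclidean distance yet have correlation 1; correlation-based similarity deliberately ignores that difference in level and scale.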


Problem 1: Invariance of Mahalanobis Distance
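Task: Prove that the Mahalanobis distance between two vectors $\mathbf{x}_i$ and $\mathbf{x}_j$ is invariant under any non-singular affine transformation $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{b}$, provided the distance in the transformed space is computed with the covariance matrix of the transformed data.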

Proof. Let $\mathbf{S}_x$ be the covariance matrix of the original data X. The squared Mahalanobis distance is defined as:

$$
d_M^2(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i - \mathbf{x}_j)^\top \mathbf{S}_x^{-1} (\mathbf{x}_i - \mathbf{x}_j)
$$

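Consider the transformation $\mathbf{y}_i = \mathbf{A}\mathbf{x}_i + \mathbf{b}$, where A is a non-singular p × p matrix. The difference vector in the transformed space is:

$$
\mathbf{y}_i - \mathbf{y}_j = (\mathbf{A}\mathbf{x}_i + \mathbf{b}) - (\mathbf{A}\mathbf{x}_j + \mathbf{b}) = \mathbf{A}(\mathbf{x}_i - \mathbf{x}_j)
$$

The covariance matrix of the transformed variables Y is given by:

$$
\mathbf{S}_y = \operatorname{Var}(\mathbf{A}\mathbf{x} + \mathbf{b}) = \mathbf{A}\mathbf{S}_x\mathbf{A}^\top
$$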
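One way to finish the argument is to substitute both results into the definition and use $(\mathbf{A}\mathbf{S}_x\mathbf{A}^\top)^{-1} = (\mathbf{A}^\top)^{-1}\mathbf{S}_x^{-1}\mathbf{A}^{-1}$, giving $d_M^2(\mathbf{y}_i, \mathbf{y}_j) = (\mathbf{x}_i - \mathbf{x}_j)^\top \mathbf{A}^\top (\mathbf{A}^\top)^{-1}\mathbf{S}_x^{-1}\mathbf{A}^{-1}\mathbf{A}(\mathbf{x}_i - \mathbf{x}_j) = d_M^2(\mathbf{x}_i, \mathbf{x}_j)$. The sketch below checks this numerically; the data, A (with det A = 5) and b are hypothetical choices, and `sq_mahalanobis` is our own helper.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))             # hypothetical original data, n = 50, p = 3
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, -1.0],
              [1.0, 0.0, 3.0]])              # hypothetical non-singular A (det A = 5)
b = np.array([5.0, -2.0, 0.5])               # hypothetical shift vector

Y = X @ A.T + b                              # row i is (A x_i + b)^T

def sq_mahalanobis(u, v, S):
    """(u - v)^T S^{-1} (u - v), solving S z = u - v instead of inverting S."""
    d = u - v
    return float(d @ np.linalg.solve(S, d))

S_x = np.cov(X, rowvar=False)
S_y = np.cov(Y, rowvar=False)
print(np.allclose(S_y, A @ S_x @ A.T))       # True: S_y = A S_x A^T

print(sq_mahalanobis(X[0], X[1], S_x),
      sq_mahalanobis(Y[0], Y[1], S_y))       # equal up to floating-point rounding
```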




Document information

Professor(s): Xyz
Contains: Ug semester 6

Seller: statsspecialist2026