1 Introduction

Recall that in classification problems, the task is to decide to which of a predefined, finite set of categories an object belongs.

In this lecture, we look at another classification method, known as the k-nearest-neighbours or kNN method.

Just as with the probabilistic methods that we looked at in the previous two lectures, we need a dataset of examples. Each example describes an instance and gives the class to which it belongs. As before, we'll assume instances are described by a set of attribute-value pairs, and there is a finite set of class labels L. So the dataset comprises examples of the form ⟨{A1 = a1, A2 = a2, ..., An = an}, class = cl⟩. For a particular instance x, we will refer to x's value for attribute Ai as x.Ai.

In the probabilistic methods that we looked at, the learning step involved computing probabilities from the dataset. Once this was done, in principle the dataset could be thrown away; classification was done using just the probabilities. In k-nearest-neighbours, the learning step is trivial: we simply store the dataset in the system's memory. Yes, that's it!

In the classification step, we are given an instance q (the query), whose attributes we will refer to as q.Ai, and we wish to know its class. In kNN, the class of q is found as follows:

1. Find the k instances in the dataset that are closest to q.
2. These k instances then vote to determine the class of q.

Ties (whether they arise when finding the closest instances to q or when voting for the class of q) are broken arbitrarily.
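The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not code from the notes: it assumes the dataset is a list of (instance, class) pairs and that a distance function dist is supplied by the caller.

```python
from collections import Counter

def knn_classify(dataset, q, k, dist):
    """Classify query q by majority vote among its k nearest neighbours.

    dataset: list of (instance, class_label) pairs.
    dist: a function taking two instances and returning a number.
    Ties are broken arbitrarily, as in the notes (here: by whichever
    label Counter happens to list first).
    """
    # Step 1: find the k instances in the dataset closest to q.
    neighbours = sorted(dataset, key=lambda ex: dist(ex[0], q))[:k]
    # Step 2: the k instances vote; the majority class wins.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

Note that the "learning step" really is just keeping the dataset around; all the work happens at classification time.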
In the following visualisation of this method, we assume there are only two attributes A1 and A2, and two different classes (where circles with solid fill represent instances in the dataset of one class and circles with hashed fill represent instances in the dataset of the other class). Query q is here being classified by its 3 nearest neighbours:

[Figure: instances of the two classes plotted against attributes A1 and A2; query q and its k = 3 nearest neighbours are highlighted.]

All that remains to do is discuss how distance is measured, and how the voting works.

2.1 Local distance functions, global distance functions and weights

A global distance function, dist, can be defined by combining in some way a number of local distance functions, distA, one per attribute. The easiest way of combining them is to sum them:

    dist(x, q) =def Σ_{i=1}^{n} distAi(x.Ai, q.Ai)

More generally, the global distance can be defined as a weighted sum of the local distances. The weights wi allow different attributes to have different importances in the computation of the overall distance. Weights sometimes lie between zero and one; a weight of zero would indicate a totally irrelevant attribute.

    dist(x, q) =def Σ_{i=1}^{n} wi × distAi(x.Ai, q.Ai)

A weighted average is also common:

    dist(x, q) =def (Σ_{i=1}^{n} wi × distAi(x.Ai, q.Ai)) / (Σ_{i=1}^{n} wi)

The weights can be supplied by the system designer. There are also ways of learning weights from a dataset.

What we haven't discussed is how to define the local distance functions. We will exemplify the different definitions by computing the distance between query q and the instances x1 and x2 below:

                x1       x2       q
    sex         female   male     male
    weight      60       75       70
    amount      4        2        1
    meal        full     full     snack
    duration    90       60       30
    class       over     under    ?

The attributes and their values are: sex (male/female), weight (between 50 and 150 inclusive), amount of alcohol consumed in units (1-16 inc.), last meal consumed today (none, snack or full meal), and duration of drinking session (20-320 minutes inc.). The classes are: over or under the drink driving limit.
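As a concrete illustration of the formulas above, here is a minimal Python sketch of the weighted-sum and weighted-average global distances. The dict-based representation of instances, local distance functions and weights is an assumption made for illustration, not part of the notes:

```python
def weighted_sum_dist(x, q, local_dists, weights):
    """Weighted sum of per-attribute local distances.

    x, q: dicts mapping attribute name -> value.
    local_dists: dict mapping attribute name -> local distance function.
    weights: dict mapping attribute name -> weight w_i.
    """
    return sum(weights[a] * local_dists[a](x[a], q[a]) for a in local_dists)

def weighted_avg_dist(x, q, local_dists, weights):
    """Weighted average: the weighted sum divided by the sum of the weights."""
    total_weight = sum(weights[a] for a in local_dists)
    return weighted_sum_dist(x, q, local_dists, weights) / total_weight
```

Setting every weight to 1 in weighted_sum_dist recovers the plain (unweighted) sum from the first formula.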
2.2 Hamming distance

The easiest local distance function, known as the overlap function, returns 0 if the two values are equal and 1 otherwise:

    distA(x.A, q.A) =def 0 if x.A = q.A
                         1 otherwise.

If the global distance function is defined as the sum of the local distances, then we're simply counting the number of attributes on which the two instances disagree. This is called Hamming distance. Weighted sums and weighted averages are also possible.
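A minimal Python sketch of the overlap function and the resulting Hamming distance; again, representing instances as dicts of attribute-value pairs is an assumption for illustration:

```python
def overlap(v, w):
    """Overlap local distance: 0 if the two values are equal, 1 otherwise."""
    return 0 if v == w else 1

def hamming(x, q):
    """Hamming distance: the number of attributes on which x and q disagree.

    x, q: dicts mapping attribute name -> value, with the same keys.
    """
    return sum(overlap(x[a], q[a]) for a in x)
```

For example, two instances that agree on every attribute but one are at Hamming distance 1.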
Class Exercise. What is the Hamming distance between x1 and q and between x2 and q?