1 Introduction

Recall that in classification problems, the task is to decide to which of a predefined, finite set of categories an object belongs.

In this lecture, we look at another classification method, known as the k-nearest-neighbours or kNN method.

Just as with the probabilistic methods that we looked at in the previous two lectures, we need a dataset of examples. Each example describes an instance and gives the class to which it belongs. As before, we'll assume instances are described by a set of attribute-value pairs, and there is a finite set of class labels L. So the dataset comprises examples of the form ⟨{A1 = a1, A2 = a2, ..., An = an}, class = cl⟩. For a particular instance x, we will refer to x's value for attribute Ai as x.Ai.

In the probabilistic methods that we looked at, the learning step involved computing probabilities from the dataset. Once this was done, in principle the dataset could be thrown away; classification was done using just the probabilities. In k-nearest-neighbours, the learning step is trivial: we simply store the dataset in the system's memory. Yes, that's it!

In the classification step, we are given an instance q (the query), whose attributes we will refer to as q.Ai, and we wish to know its class. In kNN, the class of q is found as follows:

1. Find the k instances in the dataset that are closest to q.
2. These k instances then vote to determine the class of q.

Ties (whether they arise when finding the closest instances to q or when voting for the class of q) are broken arbitrarily.
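The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not code from the notes: it assumes the dataset is a list of (instance, class) pairs and that a distance function dist is supplied by the caller.

```python
from collections import Counter

def knn_classify(dataset, q, k, dist):
    """Classify query q by majority vote among its k nearest neighbours.

    dataset: list of (instance, class_label) pairs.
    dist: a function taking two instances and returning a number.
    Ties are broken arbitrarily, as in the notes (here: by whichever
    label Counter happens to list first).
    """
    # Step 1: find the k instances in the dataset closest to q.
    neighbours = sorted(dataset, key=lambda ex: dist(ex[0], q))[:k]
    # Step 2: the k instances vote; the majority class wins.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

Note that the "learning step" really is just keeping the dataset around; all the work happens at classification time.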
In the following visualisation of this method, we assume there are only two attributes A1 and A2, and two different classes (where circles with solid fill represent instances in the dataset of one class and circles with hashed fill represent instances in the dataset of the other class). Query q is here being classified by its 3 nearest neighbours:

[Figure: instances of the two classes plotted against attributes A1 and A2; query q and its k = 3 nearest neighbours are highlighted.]

All that remains to do is discuss how distance is measured, and how the voting works.

2.1 Local distance functions, global distance functions and weights

A global distance function, dist, can be defined by combining in some way a number of local distance functions, distA, one per attribute. The easiest way of combining them is to sum them:

    dist(x, q) =def Σ_{i=1}^{n} distAi(x.Ai, q.Ai)

More generally, the global distance can be defined as a weighted sum of the local distances. The weights wi allow different attributes to have different importances in the computation of the overall distance. Weights sometimes lie between zero and one; a weight of zero would indicate a totally irrelevant attribute.

    dist(x, q) =def Σ_{i=1}^{n} wi × distAi(x.Ai, q.Ai)

A weighted average is also common:

    dist(x, q) =def (Σ_{i=1}^{n} wi × distAi(x.Ai, q.Ai)) / (Σ_{i=1}^{n} wi)

The weights can be supplied by the system designer. There are also ways of learning weights from a dataset.

What we haven't discussed is how to define the local distance functions. We will exemplify the different definitions by computing the distance between query q and the instances x1 and x2 below:

                x1       x2       q
    sex         female   male     male
    weight      60       75       70
    amount      4        2        1
    meal        full     full     snack
    duration    90       60       30
    class       over     under    ?

The attributes and their values are: sex (male/female), weight (between 50 and 150 inclusive), amount of alcohol consumed in units (1-16 inc.), last meal consumed today (none, snack or full meal), and duration of drinking session (20-320 minutes inc.). The classes are: over or under the drink driving limit.
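As a concrete illustration of the formulas above, here is a minimal Python sketch of the weighted-sum and weighted-average global distances. The dict-based representation of instances, local distance functions and weights is an assumption made for illustration, not part of the notes:

```python
def weighted_sum_dist(x, q, local_dists, weights):
    """Weighted sum of per-attribute local distances.

    x, q: dicts mapping attribute name -> value.
    local_dists: dict mapping attribute name -> local distance function.
    weights: dict mapping attribute name -> weight w_i.
    """
    return sum(weights[a] * local_dists[a](x[a], q[a]) for a in local_dists)

def weighted_avg_dist(x, q, local_dists, weights):
    """Weighted average: the weighted sum divided by the sum of the weights."""
    total_weight = sum(weights[a] for a in local_dists)
    return weighted_sum_dist(x, q, local_dists, weights) / total_weight
```

Setting every weight to 1 in weighted_sum_dist recovers the plain (unweighted) sum from the first formula.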
2.2 Hamming distance

The easiest local distance function, known as the overlap function, returns 0 if the two values are equal and 1 otherwise:

    distA(x.A, q.A) =def 0 if x.A = q.A
                         1 otherwise.

If the global distance function is defined as the sum of the local distances, then we're simply counting the number of attributes on which the two instances disagree. This is called Hamming distance. Weighted sums and weighted averages are also possible.
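A minimal Python sketch of the overlap function and the resulting Hamming distance; again, representing instances as dicts of attribute-value pairs is an assumption for illustration:

```python
def overlap(v, w):
    """Overlap local distance: 0 if the two values are equal, 1 otherwise."""
    return 0 if v == w else 1

def hamming(x, q):
    """Hamming distance: the number of attributes on which x and q disagree.

    x, q: dicts mapping attribute name -> value, with the same keys.
    """
    return sum(overlap(x[a], q[a]) for a in x)
```

For example, two instances that agree on every attribute but one are at Hamming distance 1.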
Class Exercise. What is the Hamming distance between x1 and q and between x2 and q?