STA422
Prove the equivalence between the Bayes rule and the minimum expected loss rule for
classification under the assumption of known class-conditional densities.
Bayesian decision theory is a fundamental statistical approach to the problem of pattern
classification. This approach is based on quantifying the trade-offs between various
classification decisions using probability and the costs that accompany such decisions
(Zanibbi, 2017).
Using the sea bass/salmon example
In the classic Bayesian probability example of predicting whether the next fish caught will be
a sea bass or a salmon, the state of nature and the prior represent different aspects of the
information we have about the situation. The state of nature is a random variable: it refers
to the true but unknown category of the next fish, which can be either w1 = sea bass or w2 =
salmon. At any given time, the next fish is either a sea bass or a salmon, although we may
not know which. If the catch of salmon and sea bass is equiprobable, then P(w1) = P(w2) = 0.5
(uniform priors; a 50% chance of each). In practice, however, there might be some prior
knowledge based on factors such as fishing location, season, or previous catches, which will
influence the prior probabilities accordingly.
We write w = w1 for sea bass and w = w2 for salmon, where P(w1) is the a priori probability
that the next fish is a sea bass, and P(w1) + P(w2) = 1 (exclusivity and exhaustivity).
Decision rule with only the prior information: decide w1 if P(w1) > P(w2); otherwise decide
w2. In most circumstances we are not asked to make decisions with so little information; we
might, for instance, use a lightness measurement x to improve our classifier, as in the sketch
below.
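The prior-only rule can be written in a few lines of Python. This is a minimal illustrative
sketch, not part of the original example; the 0.5 priors are the uniform priors assumed in the
text, and the function name is my own.

    # Decision rule using only the prior probabilities P(w1) and P(w2).
    # With uniform priors the choice is arbitrary and the rule errs half the time.
    def decide_from_priors(p_w1, p_w2):
        """Decide 'sea bass' (w1) if P(w1) > P(w2), otherwise 'salmon' (w2)."""
        return "sea bass" if p_w1 > p_w2 else "salmon"

    print(decide_from_priors(0.5, 0.5))  # ties resolve to 'salmon' here; error rate is 0.5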
Use of the class-conditional information: the probability density function p(x|w1) should
strictly be written as pX(x|w1) to indicate that we are speaking about a particular density
function for the random variable X. Then p(x|w1) and p(x|w2) describe the difference in
lightness between the populations of sea bass and salmon.
Figure: hypothetical class-conditional probability density functions showing the probability
density of measuring a particular feature value x given that the pattern is in category wj. If
x represents the length of a fish, the two curves might describe the difference in length
between populations of two types of fish. Density functions are normalized, so the area under
each curve is 1.
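To make the figure concrete, here is a minimal Python sketch of two hypothetical
class-conditional densities for the lightness feature x. The Gaussian form and the parameters
(means 2 and 4, common standard deviation 0.7) are illustrative assumptions, not values from
the text.

    import math

    def gaussian_pdf(x, mu, sigma):
        # Normal density: a purely illustrative choice of form for p(x|wj)
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    p_x_given_w1 = lambda x: gaussian_pdf(x, 2.0, 0.7)  # sea bass (assumed parameters)
    p_x_given_w2 = lambda x: gaussian_pdf(x, 4.0, 0.7)  # salmon (assumed parameters)

    # Each density is normalized: a crude Riemann sum over x confirms the area is ~1.
    dx = 0.001
    area = sum(p_x_given_w1(i * dx) * dx for i in range(-10000, 20000))
    print(round(area, 3))  # ~1.0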
Posterior, likelihood, evidence: suppose that we know both the prior probabilities P(wj) and
the class-conditional densities p(x|wj), and suppose further that we measure the lightness of
a fish and discover that its value is x. This measurement influences our attitude concerning
the true state of nature, that is, the category of the fish. The (joint) probability density
of finding a pattern that is in category wj and has feature value x can be written in two
ways:

p(wj, x) = P(wj|x) p(x) = p(x|wj) P(wj).

Rearranging these terms yields Bayes' formula:

P(wj|x) = p(x|wj) P(wj) / p(x) = (likelihood × prior) / evidence,

where, in the case of two categories,

p(x) = ∑_{j=1}^{2} p(x|wj) P(wj).

In Bayes' formula it is the product of the likelihood and the prior probability that is most
important in determining the posterior probability; the evidence factor p(x) can be viewed as
merely a scale factor that guarantees that the posterior probabilities sum to one. If we have
an observation x for which P(w1|x) is greater than P(w2|x), we would naturally be inclined to
decide that the true state of nature is w1, as illustrated in the sketch below.
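The following short, self-contained Python sketch computes the posteriors via Bayes' formula.
The Gaussian class-conditionals and the parameter values are the same illustrative assumptions
as in the previous sketch, not quantities given in the text.

    import math

    def gauss(x, mu, sigma=0.7):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    def posteriors(x, priors=(0.5, 0.5), means=(2.0, 4.0)):
        # P(wj|x) = p(x|wj) P(wj) / p(x), with p(x) the two-class evidence
        joint = [gauss(x, mu) * pr for mu, pr in zip(means, priors)]
        evidence = sum(joint)  # p(x) = sum_j p(x|wj) P(wj)
        return [jp / evidence for jp in joint]

    post = posteriors(3.2)
    print(post, sum(post))  # the posteriors sum to 1, as the evidence factor guarantees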
Decision given the posterior probabilities: for an observation x, if P(w1|x) > P(w2|x) decide
w1, and if P(w1|x) < P(w2|x) decide w2. Therefore, whenever we observe a particular x, the
probability of error is P(error|x) = P(w1|x) if we decide w2, and P(error|x) = P(w2|x) if we
decide w1. The Bayes rule thus achieves P(error|x) = min[P(w1|x), P(w2|x)], the smallest error
probability attainable at every x. This is exactly the minimum expected loss rule in the
special case of a zero-one loss: with λ(ai|wj) = 0 when i = j and 1 otherwise, the conditional
risk is R(ai|x) = ∑_j λ(ai|wj) P(wj|x) = 1 − P(wi|x), so minimizing the expected loss is the
same as maximizing the posterior P(wi|x), which is precisely the Bayes rule. Since the
posteriors are computed from the known class-conditional densities p(x|wj) and the priors via
Bayes' formula, the two rules are equivalent.
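The equivalence can also be checked numerically. The sketch below, under the same illustrative
Gaussian assumptions as above, computes for a grid of x values both the Bayes decision
(largest posterior) and the minimum-conditional-risk decision under a zero-one loss, and
confirms that the two always coincide.

    import math

    def gauss(x, mu, sigma=0.7):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    def posteriors(x, priors=(0.5, 0.5), means=(2.0, 4.0)):
        joint = [gauss(x, mu) * pr for mu, pr in zip(means, priors)]
        evidence = sum(joint)
        return [jp / evidence for jp in joint]

    # Zero-one loss: lambda(ai|wj) = 0 if i == j, else 1.
    LOSS = [[0, 1],
            [1, 0]]

    def bayes_decision(post):
        # Decide the category with the largest posterior probability.
        return max(range(2), key=lambda i: post[i])

    def min_risk_decision(post):
        # Decide the action with the smallest conditional risk
        # R(ai|x) = sum_j lambda(ai|wj) P(wj|x) = 1 - P(wi|x) under zero-one loss.
        risks = [sum(LOSS[i][j] * post[j] for j in range(2)) for i in range(2)]
        return min(range(2), key=lambda i: risks[i])

    assert all(bayes_decision(posteriors(x / 10.0)) == min_risk_decision(posteriors(x / 10.0))
               for x in range(-50, 100))
    print('Bayes rule and minimum expected loss rule agree at every tested x.')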