A key concept in the field of pattern recognition is that of uncertainty. It arises both through noise on measurements and through the finite size of data sets. Probability theory provides a consistent framework for the quantification and manipulation of uncertainty and forms one of the central foundations for pattern recognition. When combined with decision theory, discussed in Section 1.5, it allows us to make optimal predictions given all the information available to us, even though that information may be incomplete or ambiguous.
We will introduce the basic concepts of probability theory by considering a simple example. Imagine we have two boxes, one red and one blue, and in the red box we have 2 apples and 6 oranges, and in the blue box we have 3 apples and 1 orange. This is illustrated in Figure 1.9. Now suppose we randomly pick one of the boxes and from that box we randomly select an item of fruit, and, having observed which sort of fruit it is, we replace it in the box from which it came. We could imagine repeating this process many times. Let us suppose that in so doing we pick the red box 40% of the time and the blue box 60% of the time, and that when we remove an item of fruit from a box we are equally likely to select any of the pieces of fruit in the box.
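The sampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the names `BOX_CONTENTS`, `BOX_PROBS` and `draw_fruit` are hypothetical, and a seeded `random.Random` instance is used only so that runs are reproducible.

```python
import random

# Contents of each box, as described in the text: the red box holds
# 2 apples and 6 oranges, the blue box holds 3 apples and 1 orange.
BOX_CONTENTS = {
    "red": ["apple"] * 2 + ["orange"] * 6,
    "blue": ["apple"] * 3 + ["orange"] * 1,
}

# The red box is picked 40% of the time and the blue box 60% of the time.
BOX_PROBS = {"red": 0.4, "blue": 0.6}


def draw_fruit(rng: random.Random) -> tuple[str, str]:
    """Run one trial: pick a box, then pick a fruit uniformly from it.

    The fruit is only observed, never removed from BOX_CONTENTS, which
    models putting it back in the box it came from.
    """
    boxes = list(BOX_PROBS)
    box = rng.choices(boxes, weights=[BOX_PROBS[b] for b in boxes], k=1)[0]
    # Every piece of fruit in the chosen box is equally likely.
    fruit = rng.choice(BOX_CONTENTS[box])
    return box, fruit


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(draw_fruit(rng))
```

Each call returns one `(box, fruit)` pair. Because nothing is ever removed from `BOX_CONTENTS`, repeated calls behave like the repeated trials with replacement described in the text.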
In this example, the identity of the box that will be chosen is a random variable, which we shall denote by B. This random variable can take one of two possible values, namely r (corresponding to the red box) or b (corresponding to the blue box). Similarly, the identity of the fruit is also a random variable and will be denoted by F. It can take either of the values a (for apple) or o (for orange).
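One way to make the two random variables explicit in code is to give each its own type whose members are exactly its possible values. The sketch below is our own illustration, not part of the original text: the enum names `Box` and `Fruit`, the table `CONTENTS` and the helper `sample_bf` are hypothetical. `Box` has the values r and b, `Fruit` has the values a and o, and a single trial yields one value of each.

```python
import random
from enum import Enum


class Box(Enum):
    """Possible values of the random variable B."""
    r = "red"
    b = "blue"


class Fruit(Enum):
    """Possible values of the random variable F."""
    a = "apple"
    o = "orange"


# Box contents from the text, keyed by the value taken by B.
CONTENTS = {
    Box.r: [Fruit.a] * 2 + [Fruit.o] * 6,
    Box.b: [Fruit.a] * 3 + [Fruit.o] * 1,
}


def sample_bf(rng: random.Random) -> tuple[Box, Fruit]:
    """Draw one joint outcome (B, F) of the experiment."""
    # The red box is chosen 40% of the time, the blue box 60% of the time.
    box = rng.choices([Box.r, Box.b], weights=[0.4, 0.6], k=1)[0]
    # Any fruit in the chosen box is equally likely to be selected.
    fruit = rng.choice(CONTENTS[box])
    return box, fruit


if __name__ == "__main__":
    rng = random.Random(42)
    box, fruit = sample_bf(rng)
    print(f"B = {box.name}, F = {fruit.name}")
```

Using enums rather than bare strings means a value outside {r, b} or {a, o} cannot be produced by accident, mirroring the fact that each random variable has a fixed set of possible values.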
To begin with, we shall define the probability of an event to be the fraction of times that event occurs out of the total number of trials, in the limit that the total number of trials goes to infinity. Thus the probability of selecting the red box is $4/10$ and the probability of selecting the blue box is $6/10$. We write these probabilities as $p(B = r) = 4/10$ and $p(B = b) = 6/10$. Note that, by definition, probabilities must lie in the interval $[0, 1]$. Also, if the events are mutually exclusive and if they include all possible outcomes (for instance, in this example the box must be either red or blue), then we see that the probabilities for those events must sum to one.
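The frequency definition of probability can be checked empirically by repeating the box-selection step many times and dividing each count by the number of trials. The sketch below assumes Python's standard `random` module as the source of randomness, and `estimate_box_probs` is a hypothetical helper. For any finite number of trials the estimates only approximate $4/10$ and $6/10$; they approach those values as the number of trials grows.

```python
import random
from collections import Counter


def estimate_box_probs(n_trials: int, seed: int = 0) -> dict[str, float]:
    """Estimate p(B = r) and p(B = b) as relative frequencies over n_trials.

    Each trial picks the red box ("r") with probability 0.4 and the
    blue box ("b") with probability 0.6.
    """
    rng = random.Random(seed)
    picks = rng.choices(["r", "b"], weights=[0.4, 0.6], k=n_trials)
    counts = Counter(picks)
    # Fraction of trials on which each outcome occurred.
    return {box: counts[box] / n_trials for box in ("r", "b")}


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        est = estimate_box_probs(n)
        # Each estimate lies in [0, 1]; because the two outcomes are
        # mutually exclusive and exhaustive, the estimates sum to one
        # (up to floating-point rounding).
        print(n, est, sum(est.values()))
```

With few trials the estimates are noisy; their standard deviation shrinks roughly in proportion to one over the square root of the number of trials, which is why the definition takes the limit of infinitely many trials.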
...