CHAPTER 3. PROBABILITY AND INFORMATION THEORY
robot discretizes space when predicting the future location of these objects,
then the discretization makes the robot immediately become uncertain about
the precise position of objects: each object could be anywhere within the
discrete cell that it was observed to occupy.
In many cases, it is more practical to use a simple but uncertain rule rather
than a complex but certain one, even if the true rule is deterministic and our
modeling system has the fidelity to accommodate a complex rule. For example, the
simple rule “Most birds fly” is cheap to develop and is broadly useful, while a rule
of the form, “Birds fly, except for very young birds that have not yet learned to
fly, sick or injured birds that have lost the ability to fly, flightless species of birds
including the cassowary, ostrich and kiwi. . . ” is expensive to develop, maintain and
communicate, and after all of this effort is still very brittle and prone to failure.
Given that we need a means of representing and reasoning about uncertainty,
it is not immediately obvious that probability theory can provide all of the tools
we want for artificial intelligence applications. Probability theory was originally
developed to analyze the frequencies of events. It is easy to see how probability
theory can be used to study events like drawing a certain hand of cards in a
game of poker. These kinds of events are often repeatable. When we say that
an outcome has a probability
p
of occurring, it means that if we repeated the
experiment (e.g., draw a hand of cards) infinitely many times, then proportion
p
of the repetitions would result in that outcome. This kind of reasoning does not
seem immediately applicable to propositions that are not repeatable. If a doctor
analyzes a patient and says that the patient has a 40% chance of having the flu,
this means something very different—we can not make infinitely many replicas of
the patient, nor is there any reason to believe that different replicas of the patient
would present with the same symptoms yet have varying underlying conditions. In
the case of the doctor diagnosing the patient, we use probability to represent a
degree of belief, with 1 indicating absolute certainty that the patient has the flu
and 0 indicating absolute certainty that the patient does not have the flu. The
former kind of probability, related directly to the rates at which events occur, is
known as frequentist probability, while the latter, related to qualitative levels of
certainty, is known as Bayesian probability.
If we list several properties that we expect common sense reasoning about
uncertainty to have, then the only way to satisfy those properties is to treat
Bayesian probabilities as behaving exactly the same as frequentist probabilities.
For example, if we want to compute the probability that a player will win a poker
game given that she has a certain set of cards, we use exactly the same formulas
as when we compute the probability that a patient has a disease given that she
55