We would like to develop a measure for the rather abstract concept of

information, so that efficiencies of different protocols or physical systems of

communication can be compared, and more efficient systems designed.

We first need a model for the process we are describing, and Shannon's
proposed model has stuck (Figure 11.1). We start with a source of

information, which, like a talking person or a buzzing telephone receiver, gen-

erates messages using some predefined language consisting of symbols we

will call the alphabet.

212 Introduction to Quantum Physics and Information Processing

FIGURE 11.1: Simplified model for communication

The alphabet could, for example, be the set of English
letters for communicating through a written message, or a set of “dit”s and

“da”s for a message in Morse code, or 0’s and 1’s for a computerized message.

Any message emitted by the source is then encoded for transmission using

the physical system involved. For a talking person, the message is encoded in

the vibration of the air molecules. For a telephonic message, the encoding is

in terms of analog electric pulses in the wire. For the now-obsolete telegraph,

encoding was done in binary dit’s and da’s represented as short or long electric

pulses which we have now refined into the 0s and 1s of the modern binary

computer. The encoded message is then transmitted across a channel: the
air carrying the spoken message, the wire carrying the telegraph

signal, or the optical cable transmitting long distance digital messages. In the

context of quantum information, the channel may well just be the environment

that the quantum system finds itself in between computational steps.

The importance of the channel in information theory lies in how much

it costs in terms of its usage, and how it may distort or reduce the quality

of the message being sent. We would need to quantify the compression of

the message in order to more efficiently utilize the resources, as well as the

capacity of the channel to carry information in the presence of noise. Noise

could be literal in the case of a talking person, random electrical signals in

a telephone or telegraph wire or the computer circuitry, or the change in the

state of a quantum system interacting inadvertently with its environment. At

the other end of the communication model is the decoder and finally the

receiver of the message, which may be the ear of the audience, the ear-piece

of the telephone, the printed output of the telegraph, or the monitor of the

computer.

We will now see how we can develop a measure for the information carried

by a message, or a symbol in the message. The appearance of a symbol at the

receiver’s end is an event, whose probability can be predicted if we have some

idea of the properties of the source. We will talk in the language of events and

their probabilities of occurrence.

Characterization of Quantum Information 213

11.1.1 Classical picture: Shannon entropy

According to Shannon, information associated with events is related to

their probability of occurrence. Let’s take a simple example. Suppose a magi-

cian has hidden a ball in one of three boxes labeled 1, 2, and 3. Now he asks

you to choose the box in which the ball is hidden. How would you choose? It de-

pends on the information you have about the ball’s location. In the beginning,

you do not know which box it is in, so you think it is equally probable to be

in any of the three boxes. Thus each box carries 1/3 of the information about

which box the ball is in. You’d alternatively say that the ball is in each box

with equal probability. The situation is thus described by an initial probability
distribution P_in:

P_in(ball is in 1) = 1/3,   P_in(ball is in 2) = 1/3,   P_in(ball is in 3) = 1/3.

An event such as opening any one box now will give you further information

on where the ball is among the three. Suppose you open one box, say box 2, and

the ball is not in it. The probability distribution has now changed, conditioned
on the event of having opened box 2:

P_2(ball is in 1) = 1/2,   P_2(ball is in 2) = 0,   P_2(ball is in 3) = 1/2.
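This conditioning step is easy to mimic in code. The following Python sketch (the function name `open_box` is illustrative, not from the text) zeroes out the opened box and renormalizes the remaining probabilities:

```python
def open_box(dist, box):
    """Condition a distribution over boxes on finding `box` empty."""
    dist = {b: 0.0 if b == box else p for b, p in dist.items()}
    total = sum(dist.values())  # probability mass left after the observation
    return {b: p / total for b, p in dist.items()}

P_in = {1: 1/3, 2: 1/3, 3: 1/3}   # initial uniform distribution
P_2 = open_box(P_in, 2)           # box 2 found empty
print(P_2)  # {1: 0.5, 2: 0.0, 3: 0.5}
```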

Opening a box now gives information only about where the ball is among the remaining two.

On the other hand, what about the information available to the magician in the
same situation? He knows that he has put the ball in box 1, so his distribution
is

P_m(ball is in 1) = 1,   P_m(ball is in 2) = 0,   P_m(ball is in 3) = 0.

In this case opening a box adds no information at all to the magician’s knowl-

edge!

This example teaches us a few things:

The information carried by an event is related inversely to its probability of

occurrence before it has occurred. (After the event, the probability is of course

1!) The more probable the occurrence, the less information the event carries.

The occurrence of an event removes doubts about the possibilities before it

occurs. The information it carries is thus the doubt it removes by occurring.

The information carried by an event is changed if a previous event carries

related information. If an event has happened, it carries no new information

as compared to an event that has not yet happened.

Let’s now try to quantify the information I carried by an event E. If we

are talking about messages encoded in symbols sent across a channel, then

the event is the reading of the symbol by the receiver. The occurrence of a

particular symbol x is the event E(x) with probability p(x), where ∑_x p(x) = 1. Now

the mathematical formulation of information is concerned with the syntactic

form of the message rather than the semantic. This means that we are not

going to worry about the meaning conveyed by a message in a literal sense


(which would depend on how the read symbol is translated by the receiver’s

brain), but rather by the form of the message itself, in terms of the symbols

it carries. In other words, information carried by a symbol does not depend

on which symbol it is, but only on our ignorance, or uncertainty, about its

occurrence, before it is read. Thus the information carried by the symbol x,
on the occurrence of event E(x), must depend inversely on p(x).

There are some properties we expect the information function to have.

Suppose two events E_1 and E_2 both occur. The information carried by this
joint occurrence is I(E_1 + E_2). If the result of one event E_1 is revealed, then
the information carried by the other event must be the difference: I(E_2) =
I(E_1 + E_2) − I(E_1). Thus, the information carried by the joint event is the sum

of the information carried by each. So if many events occur sequentially, then

the information also builds up in the same order.

Remember that if an event is certain to occur, then it carries no informa-

tion. A certainty is represented by probability 1, so that I(1) = 0. (That’s

like a computer that is switched off, so it reveals no information!)

Collecting all these properties together, we require our mathematical
information I(E_i) to be a function that satisfies

1. inverse relation to probability: I(E_i) ~ 1/p_i,

2. a continuous, monotonically decreasing function of p_i,

3. additivity: I(E_1 + E_2) = I(E_1) + I(E_2),

4. identity corresponding to the information of certainty: I(1) = 0.

One function that satisfies all these properties is the logarithm. This led Shannon
to define the self-information of an event E_i as

    I(E_i) = log(1/p_i) = −log p_i.     (11.1)

Here, the base of the logarithm denotes the units in which information is
measured. If we use the natural logarithm, then the unit of information is the
nat. If all events are binary in nature (yes-no answers), then the logarithm is
taken to base 2 and the unit of information is the bit. Note that the difference
between different units of information is just a multiplicative constant, since

    log_b x = (log_b a)(log_a x).
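As a quick numerical check of Eq. (11.1) and of the change-of-base rule, here is a minimal Python sketch (the function name is mine, not the book's):

```python
import math

def self_information(p, base=2):
    """Self-information I = log(1/p); base 2 gives bits, base e gives nats."""
    if not 0 < p <= 1:
        raise ValueError("p must be a probability in (0, 1]")
    return math.log(1 / p, base)

print(self_information(1.0))          # a certain event: 0.0 bits
print(self_information(0.5))          # a fair coin flip: 1.0 bit
print(self_information(0.5, math.e))  # the same event in nats: ln 2 ≈ 0.693
```

The last two lines differ exactly by the multiplicative constant ln 2, as the change-of-base formula predicts.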

Example 11.1.1. Let’s calculate the information in bits carried by the drawing

of a card out of a playing deck of 64 cards. To rephrase the problem, suppose a

magician asks you to pull out a card at random, and then tries to guess which

card you drew. The logical way for the magician to remove his ignorance


about the card would be to ask you questions about the card. (Of course

he cannot ask you which card it is!) Since we want the answer in bits, the

answers to these questions must be binary: “yes” or “no.” The problem then

translates to how many such binary-answer questions the magician must ask

to correctly guess the card. The procedure he adopts is the “binary

search” algorithm of dividing the range of possible answers into two at each

step and asking if the card is in one of the two ranges. Your yes or no will allow

him to select one of the sections and further divide it. Consider the following

sample scenario:

Q 1. Is it between 1 and 32? Ans: No.

Q 2. … between 33 and 48? Ans: Yes.

Q 3. … between 33 and 40? Ans: No.

…

Q n.

What is the total number n of such questions? It's the number of times the
range [1, 64] can be bifurcated, which, if you follow through the above
sequence of questions, will be 6:

    n = 6 = log_2 64 = −log_2 (1/64),

and 1/64 is the probability of choosing one particular card out of the deck.

Clearly this answer is independent of which card it is (the semantic meaning

of the event).
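Example 11.1.1 can be simulated directly. In this sketch (the names are mine, not the book's), the magician's bisection is a standard binary search, and the loop count reproduces n = 6 regardless of which card was drawn:

```python
def questions_needed(n_cards, secret):
    """Count the yes/no questions a binary search needs to pin down one card."""
    lo, hi = 1, n_cards
    questions = 0
    while lo < hi:
        mid = (lo + hi) // 2
        questions += 1
        if secret <= mid:   # "Is it between lo and mid?" -- Ans: yes
            hi = mid
        else:               # Ans: no
            lo = mid + 1
    return questions

# Any card in a 64-card deck takes log2(64) = 6 questions.
print(questions_needed(64, 37))  # 6
```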

A message is the occurrence of a string of events: the appearance of each

symbol constituting the message. Thus the total information carried by a

message is the weighted average of the information carried by its symbols. This is

given by

    H(m) = ∑_i p_i I(E_i) = −∑_i p_i log p_i.     (11.2)

What if a particular symbol doesn’t occur in the message? In that case,
p = 0. Even though log p is then undefined, the symbol cannot contribute
to the information carried by the message; indeed p log p → 0 as p → 0, so
for our purposes we define 0 log 0 ≡ 0.
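Eq. (11.2), together with the convention 0 log 0 = 0, can be sketched as a short Python function (an illustration of mine, not the book's code), applied here to the three-box example from earlier in the section:

```python
import math

def shannon_entropy(probs, base=2):
    """H = sum_i p_i log(1/p_i), with the convention 0 log 0 = 0."""
    # Skipping p = 0 terms implements the 0 log 0 = 0 convention.
    return sum(p * math.log(1 / p, base) for p in probs if p > 0)

print(shannon_entropy([1/3, 1/3, 1/3]))  # before any box is opened: log2(3) ≈ 1.585 bits
print(shannon_entropy([1/2, 0, 1/2]))    # after box 2 is found empty: 1.0 bit
print(shannon_entropy([1, 0, 0]))        # the magician's certainty: 0.0 bits
```

Note how the entropy shrinks as the distribution sharpens, matching the intuition that certainty carries no further information.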

From its similarity to the thermodynamic quantity of the same name, the

information function H(m) has been called the entropy of the message. In sta-

tistical thermodynamics, we seek to relate the macroscopic properties of a system

to the microstates of its constituents. Notably, the energy of a gas is related

to the momenta of its constituent molecules. If the number of microstates

