Mathematical characteristics of the entropy function

We will denote by an ensemble X the collection of events (represented by a random variable) x occurring with probability p(x):

X ≡ {x, p(x)}. (11.4)

Definition 11.1. The entropy function for an ensemble X is given by

H(X) = −k Σ_x p(x) log p(x). (11.5)

The number k is a constant which depends on the units in which H is measured.

The function H(X) satisfies the following properties.

1. H(X) is always non-negative, is continuous as a function of the p(x), and is symmetric under exchange of any two events x_i and x_j.¹

¹ See footnote 1 on page 111.

Characterization of Quantum Information 217

2. It has a minimum value of 0, attained when one event occurs with probability 1 and all the rest have probability 0: that distribution gives H(X) = 0, and since H(X) is non-negative, this must be the minimum.

3. It has a maximum value of k log n when each of the n events occurs with equal probability 1/n. Here is a simple proof of this fact:

H(X) − k log n = k Σ_x p(x) log(1/p(x)) − k Σ_x p(x) log n
               = k Σ_x p(x) log(1/(n p(x)))
               ≤ k Σ_x p(x) [1/(n p(x)) − 1].

This is because log x ≤ x − 1, with equality only if x = 1 (see Figure 11.2), an important result often used in information-theoretic proofs. (As stated, the inequality holds for the natural logarithm; a change of base merely rescales it, and is absorbed into the constant k.)

FIGURE 11.2: Graph of y = x − 1 compared with y = ln x.

So we have

H(X) − k log n ≤ k [Σ_x 1/n − Σ_x p(x)] = k (1 − 1) = 0,

∴ H(X) ≤ k log n. (11.6)
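As a quick numerical sanity check of Definition 11.1 and properties 2 and 3, the entropy can be computed in a few lines of Python (a sketch of our own, not from the text, taking k = 1 and base-2 logarithms so that H is measured in bits; the function name shannon_entropy is ours):

```python
import math

def shannon_entropy(probs, k=1.0, base=2):
    """H(X) = k * sum_x p(x) log(1/p(x)), with the 0 log 0 = 0 convention."""
    return k * sum(p * math.log(1 / p, base) for p in probs if p > 0)

n = 4
print(shannon_entropy([1.0, 0.0, 0.0, 0.0]))   # one certain event: minimum, H = 0
print(shannon_entropy([1 / n] * n))            # uniform: maximum, log2(4) = 2 bits
print(shannon_entropy([0.7, 0.1, 0.1, 0.1]) <= math.log2(n))  # True: H(X) <= k log n
```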

218 Introduction to Quantum Physics and Information Processing

Box 11.1: Binary Entropy

A very useful concept is the entropy function of a probability distribution of a binary random variable, such as the result of the toss of a coin, not necessarily unbiased. Here, one value occurs with probability p and the other with 1 − p. We then have

H_bin(p) = −p log p − (1 − p) log(1 − p). (11.7)

In this simple case all the listed properties of the mathematical entropy function are obvious:

1. Non-negative: H_bin(p) ≥ 0 always;
2. Symmetric: H_bin(p) = H_bin(1 − p);
3. H_bin(p) has a maximum of 1 when p = 1/2, as in a fair coin;
4. H_bin(p) has a minimum of 0 when p = 0 or p = 1, as in a two-headed coin.

FIGURE 11.3: The binary entropy function.
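The binary entropy function and its listed properties are easy to check numerically; here is a small Python sketch (our own, not the book's), measuring entropy in bits:

```python
import math

def h_bin(p):
    """Binary entropy in bits: -p log2 p - (1-p) log2(1-p), with 0 log 0 = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(h_bin(0.5))                            # fair coin: maximum value, 1.0
print(h_bin(1.0))                            # two-headed coin: minimum value, 0.0
print(math.isclose(h_bin(0.3), h_bin(0.7)))  # symmetry under p -> 1 - p: True
```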

This function is a useful tool in deriving properties of entropy, especially when different probability distributions are mixed together. An important property of the entropy function is made evident in this simple case: that of concavity. This property is used very often in establishing various results in classical as well as quantum information theory. The graph in Figure 11.3 shows that the function is literally concave. A mathematical statement of this property is that the graph lies above any chord joining two of its points. Algebraically, for two points 0 ≤ x_1, x_2 ≤ 1 and any weight 0 ≤ p ≤ 1, we have

H_bin(p x_1 + (1 − p) x_2) ≥ p H_bin(x_1) + (1 − p) H_bin(x_2). (11.8)
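Inequality (11.8) can be spot-checked numerically; the following sketch (our own, with arbitrarily chosen points x_1 = 0.1, x_2 = 0.6) verifies concavity for several mixing weights p:

```python
import math

def h_bin(p):
    """Binary entropy in bits, with the 0 log 0 = 0 convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

x1, x2 = 0.1, 0.6   # two arbitrary points in [0, 1]
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    lhs = h_bin(p * x1 + (1 - p) * x2)           # H_bin of the mixture
    rhs = p * h_bin(x1) + (1 - p) * h_bin(x2)    # mixture of the entropies
    assert lhs >= rhs - 1e-12                    # concavity, Eq. (11.8)
print("Eq. (11.8) holds at all sampled points")
```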


11.1.3 Relations between entropies of two sets of events

From the way it is defined, Shannon entropy is closely related to probability

theory. In this book, we do not expect a thorough background in probability

theory, so I will simply draw your attention to some important results, so

that you may be piqued enough to look them up on your own. Consider two

ensembles X = {x, p(x)} and Y = {y, p(y)}. We will define various measures

to compare the probability distributions {p(x)} and {p(y)}.

1. Relative entropy of X and Y measures the difference between the two probability distributions {p(x)} and {p(y)}:

H(X ‖ Y) = −Σ_{x,y} p(x) log p(y) − H(X)
         = Σ_{x,y} p(x) log [p(x)/p(y)]. (11.9)

Here again we use the convention that

−0 log 0 ≡ 0,   −p(x) log 0 ≡ ∞ for p(x) > 0. (11.10)

An important property of the relative entropy is that it is non-negative. The relative entropy is also called the Kullback–Leibler distance. However, it is not symmetric, and so is not a true distance measure; it gives us, for example, the error made in assuming that a certain random variable has probability distribution {p(y)} when the true distribution is {p(x)}. Thus this definition is more useful when we have a single set of events X with two different probability distributions {p(x)} and {q(x)}:

H(p ‖ q) = Σ_x p(x) log [p(x)/q(x)]. (11.11)
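The relative entropy and its stated properties, non-negativity, asymmetry, and the divergent convention (11.10), can be illustrated with a short Python sketch (our own; the distributions are chosen arbitrarily):

```python
import math

def relative_entropy(p, q):
    """H(p || q) = sum_x p(x) log2(p(x)/q(x)); infinite when q(x) = 0 but p(x) > 0."""
    total = 0.0
    for px, qx in zip(p, q):
        if px > 0:
            if qx == 0:
                return math.inf          # convention (11.10)
            total += px * math.log2(px / qx)
    return total

p, q = [0.9, 0.1], [0.5, 0.5]
print(relative_entropy(p, p))                             # identical distributions: 0.0
print(relative_entropy(p, q) > 0)                         # non-negative: True
print(relative_entropy(p, q) == relative_entropy(q, p))   # not symmetric: False
```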

2. Joint entropy of X and Y measures the combined information presented by both distributions. Classically, the joint probability of X and Y, denoted by {p(x, y)}, is defined over the product set X × Y. The joint entropy is then

H(X, Y) = −Σ_{x,y} p(x, y) log p(x, y). (11.12)

If X and Y are independent, then

H(X, Y) = H(X) + H(Y). (11.13)
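Equation (11.13) is easy to confirm for a concrete pair of independent distributions; a minimal sketch (ours), with entropies in bits:

```python
import math

def H(probs):
    """Shannon entropy in bits, with the 0 log 0 = 0 convention."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

px = [0.5, 0.5]      # distribution of X
py = [0.25, 0.75]    # distribution of Y
pxy = [a * b for a in px for b in py]   # independence: p(x, y) = p(x) p(y)

print(math.isclose(H(pxy), H(px) + H(py)))  # additivity (11.13): True
```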

FIGURE 11.4: Relationship between entropic quantities.

3. Conditional entropy measures the information gained by the occurrence of X if Y has already occurred and we know the outcome. The classical conditional probability of an event x given y is defined as p(x|y) = p(x, y)/p(y), and we have

H(X|Y) = −Σ_{x,y} p(x, y) log p(x|y) (11.14)
       = H(X, Y) − H(Y). (11.15)

The second equation is an important relation, a chain rule for entropies:

H(X, Y) = H(X) + H(Y|X). (11.16)
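The chain rule (11.15)-(11.16) can be checked on a small made-up joint distribution (the numbers below are our own illustration, not from the text):

```python
import math

# A made-up joint distribution p(x, y) over x, y in {0, 1}
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def H(probs):
    """Shannon entropy in bits, with the 0 log 0 = 0 convention."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Marginal p(y) = sum_x p(x, y)
py = {y: sum(p for (x, yy), p in pxy.items() if yy == y) for y in (0, 1)}

# H(X|Y) = -sum_{x,y} p(x, y) log2 p(x|y), with p(x|y) = p(x, y)/p(y)
h_x_given_y = sum(p * math.log2(py[y] / p) for (x, y), p in pxy.items())

# Chain rule: H(X|Y) = H(X, Y) - H(Y)
print(math.isclose(h_x_given_y, H(pxy.values()) - H(py.values())))  # True
```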

4. Mutual information measures the correlation between the distributions of X and Y. This is the difference between the information gained by the occurrence of X, and the information gained by the occurrence of X if Y has already occurred. The mutual information is symmetric, so we have

I(X; Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) (11.17)
        = H(X) + H(Y) − H(X, Y). (11.18)

The mutual information is a measure of how much the uncertainty about X is reduced by a knowledge of Y. You can also see that it is the relative entropy of the joint distribution p(x, y) with respect to the product distribution p(x)p(y):

I(X; Y) = Σ_{x,y} p(x, y) log [p(x, y) / (p(x) p(y))]. (11.19)
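Finally, identities (11.17)-(11.19) can be verified on the same kind of toy joint distribution (again, the numbers are our own):

```python
import math

pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}  # made-up joint distribution

def H(probs):
    """Shannon entropy in bits, with the 0 log 0 = 0 convention."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Marginals p(x) and p(y)
px = {x: sum(p for (xx, y), p in pxy.items() if xx == x) for x in (0, 1)}
py = {y: sum(p for (x, yy), p in pxy.items() if yy == y) for y in (0, 1)}

# Eq. (11.18): I(X;Y) = H(X) + H(Y) - H(X,Y)
i_ent = H(px.values()) + H(py.values()) - H(pxy.values())
# Eq. (11.19): relative entropy between p(x,y) and the product p(x)p(y)
i_kl = sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in pxy.items())

print(math.isclose(i_ent, i_kl))  # the two expressions agree: True
print(i_ent > 0)                  # X and Y are correlated here: True
```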

One way to picture the interrelationships between these entropic quantities is the Venn diagram of Figure 11.4. Given these definitions, the Shannon entropies satisfy the following properties, which are easily proved:

