## Wednesday, March 17, 2010

### Quantitative Expression for Information

(Transmission of Information, R. V. L. Hartley)

At each selection there are available three possible symbols. Two successive selections make possible $3^2$, or 9, different permutations or symbol sequences. Similarly $n$ selections make possible $3^n$ different sequences. Suppose that instead of this system, in which three current values are used, one is provided in which any arbitrary number $s$ of different current values can be applied to the line and distinguished from each other at the receiving end. Then the number of symbols available at each selection is $s$ and the number of distinguishable sequences is $s^n$.
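To make the counting concrete, the short Python sketch below enumerates every sequence of $n$ selections from a set of symbols and checks that there are $s^n$ of them. The labels `-1`, `0`, `+1` for the three current values are a hypothetical choice for illustration; the text does not name the values.

```python
from itertools import product

def count_sequences(symbols, n):
    """Enumerate every length-n sequence over `symbols` and return how many there are."""
    return sum(1 for _ in product(symbols, repeat=n))

# Hypothetical labels for the three current values (an assumption for illustration).
three_values = ("-1", "0", "+1")

for n in range(1, 5):
    counted = count_sequences(three_values, n)
    assert counted == len(three_values) ** n  # s**n distinguishable sequences
    print(n, counted)
```

Enumeration is used only to confirm the formula on small cases; for realistic $n$ one would simply compute `s ** n`.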

Consider the case of a printing telegraph system of the Baudot type, in which the operator selects letters or other characters each of which when transmitted consists of a sequence of symbols (usually five in number). We may think of the various current values as primary symbols and the various sequences of these which represent characters as secondary symbols. The selection may then be made at the sending end among either primary or secondary symbols. Let the operator select a sequence of $n_2$ characters each made up of a sequence of $n_1$ primary selections. At each selection he will have available as many different secondary symbols as there are different sequences that can result from making $n_1$ selections from among the $s$ primary symbols. If we call this number of secondary symbols $s_2$, then

$s_2 = s^{n_1}$

For the Baudot System

$s_2 = 2^5 = 32 \text{ characters}$

The number of possible sequences of secondary symbols that can result from $n_2$ secondary selections is

$s_2^{n_2} = s^{n_1 n_2}$

Now $n_1 n_2$ is the number $n$ of selections of primary symbols that would have been necessary to produce the same sequence had there been no mechanism for grouping the primary symbols into secondary symbols. Thus we see that the total number of possible sequences is $s^n$ regardless of whether or not the primary symbols are grouped for purposes of interpretation.
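The claim that grouping does not change the count can be checked directly on small cases. The sketch below, assuming Baudot-style binary current values with five selections per character, builds every message of $n_2$ characters, flattens it back into primary symbols, and compares the result with the ungrouped set of $s^{n_1 n_2}$ sequences. The function names are hypothetical.

```python
from itertools import product

def grouped_sequences(s, n1, n2):
    """All primary-symbol sequences obtained by choosing n2 characters,
    each character being one of the s**n1 sequences of n1 primary selections."""
    characters = list(product(range(s), repeat=n1))  # s2 = s**n1 secondary symbols
    # sum(..., ()) concatenates the chosen characters into one flat tuple
    return {sum(chars, ()) for chars in product(characters, repeat=n2)}

def ungrouped_sequences(s, n):
    """All sequences of n primary selections, with no grouping into characters."""
    return set(product(range(s), repeat=n))

s, n1 = 2, 5  # Baudot-style: two current values, five selections per character
for n2 in (1, 2):
    grouped = grouped_sequences(s, n1, n2)
    assert len(grouped) == (s ** n1) ** n2 == s ** (n1 * n2)
    assert grouped == ungrouped_sequences(s, n1 * n2)
```

Comparing the sets, not just their sizes, shows that grouping is purely a matter of interpretation: exactly the same primary sequences are reachable either way.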

This number $s^n$ is then the number of possible sequences which we set out to find in the hope that it could be used as a measure of the information involved. Let us see how well it meets the requirements of such a measure.

For a particular system and mode of operation $s$ may be assumed to be fixed and the number of selections $n$ increases as the communication proceeds. Hence with this measure the amount of information transmitted would increase exponentially with the number of selections and the contribution of a single selection to the total information transmitted would progressively increase. Doubtless some such increase does often occur in communication as viewed from the psychological standpoint. For example, the single word "yes" or "no", when coming at the end of a protracted discussion, may have an extraordinarily great significance. However, such cases are the exception rather than the rule. The constant changing of the subject of discussion, and even of the individuals involved, has the effect in practice of confining the cumulative action of this exponential relation to comparatively short periods.

Moreover we are setting up a measure which is to be independent of psychological factors. When we consider a physical transmission system we find no such exponential increase in the facilities necessary for transmitting the results of successive selections. The various primary symbols involved are just as distinguishable at the receiving end for one primary selection as for another. A telegraph system finds one ten-word message no more difficult to transmit than the one which preceded it. A telephone system which transmits speech successfully now will continue to do so as long as the system remains unchanged. In order then for a measure of information to be of practical engineering value it should be of such a nature that the information is proportional to the number of selections. The number of possible sequences is therefore not suitable for use directly as a measure of information.
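The contrast between the two candidate measures shows up in how much each additional selection adds. The sketch below compares the raw count $s^n$ with a measure proportional to $n$, taken here as $n \log_2 s$ (the logarithmic form derived later in the text; base 2 is an arbitrary choice for the sketch). The function names are hypothetical.

```python
import math

def count_measure(s, n):
    """The rejected candidate: the raw number of possible sequences, s**n."""
    return s ** n

def log_measure(s, n, base=2):
    """A measure proportional to n: n * log(s)."""
    return n * math.log(s, base)

s = 3
for n in range(1, 6):
    # Contribution of the n-th selection under each measure.
    extra_count = count_measure(s, n) - count_measure(s, n - 1)  # equals s**(n-1) * (s - 1), growing
    extra_log = log_measure(s, n) - log_measure(s, n - 1)        # equals log(s), constant
    print(n, extra_count, round(extra_log, 3))
```

Under the count measure the fifth selection is worth 81 times the first; under the proportional measure every selection contributes the same amount, which matches the behaviour of the physical system described above.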

We may, however, use it as the basis for a derived measure which does meet the practical requirements. To do this we arbitrarily put the amount of information proportional to the number of selections and so choose the factor of proportionality as to make equal amounts of information correspond to equal numbers of possible sequences.
For a particular system let the amount of information associated with $n$ selections be

$H = K n$

where $K$ is a constant which depends on the number $s$ of symbols available at each selection. Take any two systems for which $s$ has the values $s_1$ and $s_2$ and let the corresponding constants be $K_1$ and $K_2$. We then define these constants by the condition that whenever the numbers of selections $n_1$ and $n_2$ for the two systems are such that the number of possible sequences is the same for both systems, then the amount of information is also the same for both; that is to say, when

$s_1^{n_1} = s_2^{n_2}$

$H = K_1 n_1 = K_2 n_2$

from which

$\frac{K_1}{\log s_1} = \frac{K_2}{\log s_2}$
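The step that produces this ratio is left implicit. Assuming $s_1, s_2 > 1$ so that the logarithms are nonzero, take logarithms (to any fixed base) of the condition on the number of sequences:

$n_1 \log s_1 = n_2 \log s_2$

Dividing $K_1 n_1 = K_2 n_2$ by this equation, left side by left side and right side by right side, cancels $n_1$ and $n_2$:

$\frac{K_1 n_1}{n_1 \log s_1} = \frac{K_2 n_2}{n_2 \log s_2} \quad \Rightarrow \quad \frac{K_1}{\log s_1} = \frac{K_2}{\log s_2}$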

This relation will hold for all values of $s$ only if $K$ is connected with $s$ by the relation

$K = K_0 \log s$

where $K_0$ is the same for all systems. Since $K_0$ is arbitrary, we may omit it if we make the logarithmic base arbitrary. The particular base selected fixes the size of the unit of information. Putting this value of $K$ into $H = K n$,

$H = n \log s = \log s^n$
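Because the base only fixes the unit, the same quantity can be expressed in several units. The sketch below is a minimal implementation of $H = n \log s$ with a selectable base; the function name `hartley_information` is hypothetical. Base 2 gives bits, base 10 gives decimal digits (the unit now called the hartley), and base $e$ gives nats.

```python
import math

def hartley_information(s, n, base=2.0):
    """Information H = n * log_base(s) for n selections among s symbols.

    Changing `base` only rescales the result, playing the role of K_0.
    """
    if s < 1 or n < 0:
        raise ValueError("need s >= 1 symbols and n >= 0 selections")
    return n * math.log(s) / math.log(base)

# One Baudot character (one choice among 32 secondary symbols) in three units.
for base, unit in [(2, "bits"), (10, "hartleys"), (math.e, "nats")]:
    print(f"{hartley_information(32, 1, base):.4f} {unit}")

# H = n log s and H = log s**n agree.
assert math.isclose(hartley_information(3, 4), math.log2(3 ** 4))
```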

What we have done then is to take as our practical measure of information the logarithm of the number of possible symbol sequences.

The situation is similar to that involved in measuring the transmission loss due to the insertion of a piece of apparatus in a telephone system. The effect of the insertion is to alter in a certain ratio the power delivered to the receiver. This ratio might be taken as a measure of the loss. It is found more convenient, however, to take the logarithm of the power ratio as a measure of the transmission loss.
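The analogy can be made concrete with the modern decibel convention, $10 \log_{10}$ of the power ratio (an assumption here; the text speaks only of the logarithm of the ratio). The two pieces of apparatus and their loss ratios below are hypothetical. When they are inserted in cascade their power ratios multiply, while their logarithmic losses add, just as numbers of sequences multiply while $n \log s$ adds.

```python
import math

def loss_db(power_in, power_out):
    """Transmission loss in decibels: 10 * log10 of the power ratio."""
    return 10 * math.log10(power_in / power_out)

# Two hypothetical pieces of apparatus inserted in cascade (illustrative values).
ratio_a = 2.0   # halves the delivered power
ratio_b = 10.0  # cuts the delivered power to one tenth

p0 = 1.0
p1 = p0 / ratio_a
p2 = p1 / ratio_b

total = loss_db(p0, p2)
# The overall ratio is 2 * 10 = 20, but the losses simply add.
assert math.isclose(total, loss_db(p0, p1) + loss_db(p1, p2))
print(round(total, 2))  # 13.01
```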

If we put $n$ equal to unity, we see that the information associated with a single selection is the logarithm of the number of symbols available; for example, in the Baudot System referred to above, the number $s$ of primary symbols or current values is 2 and the information content of one selection is $\log 2$; that of a character which involves 5 selections is $5 \log 2$. The same result is obtained if we regard a character as a secondary symbol and take the logarithm of the number of these symbols, that is, $\log 2^5$, or $5 \log 2$. The information associated with 100 characters will be $500 \log 2$. The numerical value of the information will depend upon the system of logarithms used. Increasing the number of current values from 2 to, say, 10, that is, in the ratio 5, would increase the information content of a given number of selections in the ratio $\frac{\log 10}{\log 2}$, or 3.3. Its effect on the rate of transmission will depend upon how the rate of making selections is affected. This will be discussed later.
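The arithmetic of this paragraph can be reproduced in a few lines. The sketch assumes base-2 logarithms, so that $\log 2$ is one bit; the text deliberately leaves the base open. The ratio $\frac{\log 10}{\log 2}$ is the same in any base.

```python
import math

bits_per_selection = math.log2(2)       # log 2 with base-2 logs: 1 bit
bits_per_character = 5 * math.log2(2)   # five primary selections per Baudot character
bits_as_secondary = math.log2(2 ** 5)   # the character viewed as one of 32 secondary symbols
bits_per_100_chars = 100 * bits_per_character

# Going from 2 to 10 current values; the base cancels in the ratio.
ratio_10_vs_2 = math.log(10) / math.log(2)

print(bits_per_selection, bits_per_character, bits_as_secondary, bits_per_100_chars)
print(round(ratio_10_vs_2, 2))  # 3.32, quoted in the text as 3.3
```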

When, as in the case just considered, the secondary symbols all involve the same number of primary selections, the relations are quite simple. When a telegraph system is used which employs a non-uniform code they are rather more complicated. A difficulty, more apparent than real, arises from the fact that a given number of secondary or character selections may necessitate widely different numbers of primary selections, depending on the particular characters chosen. This would seem to indicate that the values of information deduced from the primary and secondary symbols would be different. It may easily be shown, however, that this does not necessarily follow.

If the sender is at all times free to choose any secondary symbol, he may make all of his selections from among those containing the greatest number of primary symbols. The secondary symbols will then all be of equal length, and, just as for the uniform code, the number of primary symbols will be the product of the number of characters by the maximum number of primary selections per character. If the number of primary selections for a given number of characters is to be kept to some smaller value than this, some restriction must be placed on the freedom of selection of the secondary symbols. Such a restriction is imposed when, in computing the average number of dots per character for a non-uniform code, we take account of the average frequency of occurrence of the various characters in telegraph messages. If this allotted number of dots per character is not to be exceeded in sending a message, the operator must, on the average, refrain from selecting the longer characters more often than their average rate of occurrence. In the language of the present discussion we would say that for certain of the $n_2$ secondary selections the value of $s_2$, the number of secondary symbols, is so reduced that a summation of the information content over all the characters gives the same result as the information content of the primary selections involved. This may be written

$\sum_{1}^{n_2} \log s_2 = n \log s$

where $n$ is the total number of primary symbols or dot lengths assigned to $n_2$ characters. This suggests that the primary symbols furnish the most convenient basis for evaluating information.
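One way to see why counting over primary symbols works for a non-uniform code is to count every message an operator could send within a fixed budget of $n$ primary selections. The sketch below assumes a hypothetical three-character binary code with codewords `0`, `10`, `11` (lengths 1, 2, 2); it is an illustrative complete prefix code, not an actual telegraph alphabet. Because distinct messages of the same total length give distinct primary sequences, the count can never exceed $s^n$, and for this code $\log_2$ of the count falls short of $n \log_2 2 = n$ bits by a bounded amount (about 0.585 bits for large $n$).

```python
import math
from functools import lru_cache

# Hypothetical non-uniform code: lengths, in primary selections, of the
# codewords "0", "10", "11" (an illustrative complete binary prefix code).
CODE_LENGTHS = (1, 2, 2)

@lru_cache(maxsize=None)
def messages_of_length(n):
    """Number of character sequences whose codewords use exactly n primary selections."""
    if n == 0:
        return 1  # the empty message
    return sum(messages_of_length(n - k) for k in CODE_LENGTHS if k <= n)

s = 2
for n in (4, 8, 16, 32):
    secondary_info = math.log2(messages_of_length(n))  # information summed over characters
    primary_info = n * math.log2(s)                    # n log s over primary selections
    print(n, messages_of_length(n), round(secondary_info, 3), primary_info)
```

The recursion simply asks which character ends the message; with a less complete code the gap to $n \log s$ would grow with $n$ instead of staying bounded.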

The discussion so far has dealt largely with telegraphy. When we attempt to extend this idea to other forms of communication certain generalizations need to be made. In speech, for example, we might assume the primary selections to represent the choice of successive words. On that basis $s$ would represent the number of available words. For the first word of a conversation this would correspond to the number of words in the language. For subsequent selections the number would ordinarily be reduced because subsequent words would have to combine in intelligible fashion with those preceding. Such limitations, however, are limitations of interpretation only and the system would be just as capable of transmitting a communication in which all possible permutations of the words of the language were intelligible. Moreover, a telephone system may be just as capable of transmitting speech in one language as in another. Each word may be spoken in a variety of ways and sung in a still greater variety. This very large amount of information associated with the selection of a single spoken word suggests that the word may better be regarded as a secondary symbol, or sequence of primary symbols. Let us see where this point of view leads us.

The actual physical embodiment of the word consists of an acoustic or electrical disturbance which may be expressed as a magnitude-time function as in Fig. 2, which shows an oscillographic record of a speech sound. Such functions are also typical of other modes of communication, as will be discussed in more detail later. We have then to examine the ability of such a continuous function to convey information. Obviously over any given time interval the magnitude may vary in accordance with an infinite number of such functions. This would mean an infinite number of possible secondary symbols, and hence an infinite amount of information. In practice, however, the information contained is finite for the reason that the sender is unable to control the form of the function with complete accuracy, and any distortion of its form tends to cause it to be confused with some other function.

A continuous curve may be thought of as the limit approached by a curve made up of successive steps, as shown in Fig. 3, when the interval between the steps is made infinitesimal. An imperfectly defined curve may then be thought of as one in which the interval between the steps is finite. The steps then represent primary selections. The number of selections in a finite time is finite. Also the change made at each step is to be thought of as limited to one of a finite number of values. This means that the number of available symbols is kept finite. If this were not the case, the curve would be defined with complete exactness at each of the steps, which would mean that an observation made at any one step would offer the possibility of distinguishing among an infinite number of possible values. The following illustration may serve to bring out the relation between the discrete selections and the corresponding continuous curve. We may think of a bicycle equipped with a peculiar type of steering device which permits the rider to set the front wheel in only a limited number of fixed positions. On such a machine he attempts to ride in such a manner that the front wheel shall follow an irregularly curved line. The accuracy with which he is able to accomplish this will depend upon how far he goes between adjustments of the steering mechanism and upon the number of positions in which he is able to set it.
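The step picture, and the bicycle's two limits (distance between adjustments, number of steering positions), can be sketched numerically. Below, an arbitrary illustrative curve stands in for a magnitude-time function such as the record of Fig. 2; each step is held at the nearest of `levels` equally spaced values. The rule "nearest value at the start of each step", the curve, and all function names are assumptions for the sketch. Shorter steps or more levels mean more information, $n \log_2 s$ bits, and a closer fit.

```python
import math

def step_approximation(f, duration, step, levels, lo=-1.0, hi=1.0):
    """Approximate f on [0, duration) by a staircase: one step every `step`
    time units, each held at whichever of `levels` equally spaced values
    in [lo, hi] is nearest to f at the start of the step."""
    if levels < 2:
        raise ValueError("need at least two levels")
    n = round(duration / step)  # number of primary selections
    grid = [lo + i * (hi - lo) / (levels - 1) for i in range(levels)]
    return [min(grid, key=lambda v: abs(v - f(k * step))) for k in range(n)]

def max_error(f, values, step, samples_per_step=50):
    """Largest gap between the staircase and f, checked at points inside each step."""
    worst = 0.0
    for k, v in enumerate(values):
        for j in range(samples_per_step):
            t = (k + j / samples_per_step) * step
            worst = max(worst, abs(v - f(t)))
    return worst

def information_bits(n, levels):
    """H = n * log2(s) for n selections among s available values."""
    return n * math.log2(levels)

def signal(t):
    """An arbitrary illustrative curve, not taken from the text."""
    return math.sin(2 * math.pi * t) * math.exp(-t)

for step, levels in [(0.25, 2), (0.05, 16), (0.01, 64)]:
    values = step_approximation(signal, 1.0, step, levels)
    print(step, levels, len(values), information_bits(len(values), levels),
          round(max_error(signal, values, step), 3))
```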

By this more or less artificial device the continuous magnitude-time function as used in telephony is made subject to the same type of treatment as the succession of discrete selections involved in telegraphy.