(Elements of Information Theory, Thomas M. Cover and Joy A. Thomas)
The description of an arbitrary real number requires an infinite number of bits, so a finite representation of a continuous random variable can never be perfect. How well can we do? To frame the question appropriately, it is necessary to define the “goodness” of a representation of a source. This is accomplished by defining a distortion measure which is a measure of distance between the random variable and its representation. The basic problem in rate distortion theory can then be stated as follows: Given a source distribution and a distortion measure, what is the minimum expected distortion achievable at a particular rate? Or, equivalently, what is the minimum rate description required to achieve a particular distortion?
One of the most intriguing aspects of this theory is that joint descriptions are more efficient than individual descriptions. It is simpler to describe an elephant and a chicken with one description than to describe each alone. This is true even for independent random variables. It is simpler to describe X1 and X2 together (at a given distortion for each) than to describe each by itself. Why don’t independent problems have independent solutions? The answer is found in the geometry. Apparently, rectangular grid points (arising from independent descriptions) do not fill up the space efficiently.