The legacy of Zellig Harris: Language and information into the 21st century

(John Goldsmith)

Zellig Harris (1909–1992) cast a long shadow across twentieth-century linguistics. In mid-century, he was a leading figure in American linguistics, serving as president of the Linguistic Society of America in 1955, just a year before Roman Jakobson. It is fair to say that during that decade, the years just before generative grammar came on the scene, Zellig Harris and Charles Hockett were the two leading figures in the development of American linguistic theory. Today, I daresay Harris is remembered by most linguists as the mentor and advisor to Noam Chomsky at the University of Pennsylvania, and as the originator of transformational analysis.


Harris’s work must be situated in terms of the conflict between two visions of linguistic science: the MEDIATIONALIST view, which sees the goal of linguistic research as the discovery of the way in which natural languages link form and meaning, and the DISTRIBUTIONALIST view, which sees the goal as the fully explicit rendering of how the individual pieces of language (phoneme, syllable, morpheme, word, construction, etc.) connect to one another in the ways that define each individual language. The mediationalist view lurks behind most conceptions of language study, formal and nonformal, but it was Harris’s view that each successive improvement in linguistic theory took us a step further AWAY from the mediationalist view, much as advances in biology led scientists to understand that the study of living cells required no new forms of energy, structure, or organization in addition to those which were required to understand nonliving matter. Harris had no use for mediationalist conceptions of linguistics. For linguists in 2005, steeped as we are in an atmosphere of linguistic mediationalism, this makes Harris quite difficult to understand at first.

Harris’s goal was to show that all that was worthwhile in linguistic analysis could best be understood in terms of distribution of components at different hierarchical levels, because he understood—or at least he believed—that there was no other basis on which to establish a coherent and general linguistic theory. His genius lay in the construction of a conception of how such a vision could be put into place concretely.

Harris’s view, from his earliest work through his final statements in the early 1990s, was that the best foundational chances for linguistics were to be found in establishing a science of EXTERNAL LINGUISTIC FACTS (such as corpora, though they would typically be augmented by other external facts, like speaker judgments), rather than a science of internalized speaker knowledge. (...)


Harris did not appear to make a great effort to make his conclusions easily accessible to the reader. And yet once his ideas are understood, it is hard to deny that his way of stating them is direct, elegant, and striking. Let us approach the central idea of all of Harris’s work, as summarized by Harris himself in his introductory paper to this volume:

"The structure of language can be found only from the non-equiprobability of combination of parts. This means that the description of a language is the description of contributory departures from equiprobability, and the least statement of such contributions (constraints) that is adequate to describe the sentences and discourses of the language is the most revealing."

Picking this apart into pieces:

1. Linguistic analysis consists of building a representation out of a finite number of formal objects.

2. The essence of any given language is the restrictions, or constraints, that it places on how the pieces may be put together—these may be phonemes, morphemes, constituents, what have you. If there were no structure, then pieces could be put together any which way; structure MEANS — it is nothing more or less than — restrictions on how pieces can be put together.

3. These restrictions may be absolute (‘no pk clusters are permitted in this language’) or, much more likely, they are statements of distribution, best expressed in the mathematics of probability. A crude reformulation of this would be in the language of markedness, which is arguably an informal way of talking about distributional frequencies. A better way is to use the mathematical vocabulary of distributions, which is to say, probability theory.

4. A formal system can be described formally in a multitude of ways. These are not equivalent: there is a priority among them based on their formal length. In general, one will be significantly shorter than the others, and knowing its length is important.
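As an illustration of points 3 and 4, here is a minimal sketch of what a "departure from equiprobability" can look like numerically. It is not Harris's own procedure. The toy word list, the three-letter alphabet, and the function names `bits_equiprobable` and `bits_bigram` are all assumptions made up for the example. The sketch compares the number of bits needed to encode a small corpus when every symbol is treated as equally likely everywhere with the number needed when each symbol's probability is estimated from the symbol that precedes it.

```python
import math
from collections import Counter

def bits_equiprobable(words, alphabet):
    """Bits needed if every symbol were equally likely in every position."""
    n_symbols = sum(len(w) for w in words)
    return n_symbols * math.log2(len(alphabet))

def bits_bigram(words, boundary="#"):
    """Bits needed when each symbol is predicted from the preceding symbol.

    Probabilities are relative frequencies taken from the same data,
    so this is an illustrative lower bound, not a fair model comparison.
    """
    pair_counts = Counter()     # (previous symbol, symbol) -> count
    context_counts = Counter()  # previous symbol -> count
    for w in words:
        prev = boundary
        for ch in w:
            pair_counts[(prev, ch)] += 1
            context_counts[prev] += 1
            prev = ch
    total = 0.0
    for (prev, ch), count in pair_counts.items():
        p = count / context_counts[prev]
        total += -count * math.log2(p)
    return total

# Hypothetical toy data, chosen only to make the contrast visible.
toy_words = ["pat", "tap", "apt", "tat", "pap"]
toy_alphabet = {"p", "a", "t"}

baseline = bits_equiprobable(toy_words, toy_alphabet)
constrained = bits_bigram(toy_words)
print(f"equiprobable: {baseline:.2f} bits, bigram: {constrained:.2f} bits")
```

The gap between the two figures is one crude measure of how much structure the bigram restrictions capture. A fuller comparison in the spirit of point 4 would also count the length of the statement of the constraints themselves, which this sketch leaves out.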

It is probably impossible to understand the intellectual pull of this research program if one does not appreciate the revolutionary character (and the perceived success) of the phoneme. If the phoneme today seems passé, the discarded error of an earlier generation, then today’s linguist should think of its descendant, which for most of us is the idea of an underlying segment. (...)


It was his view that the important relationship between sounds lay not in their phonetics, but in their DISTRIBUTION (...). What tells us that the flap and the other t’s of English are realizations of a single phoneme /t/ is not the similarity of sound, but the complementarity and predictability of the distribution.

"It is pointless to mix phonetic and distributional contrasts. If phonemes which are phonetically similar are also similar in their distribution, that is a result which must be independently proved. For the crux of the matter is that phonetic and distributional contrasts are methodologically different, and that only distributional contrasts are relevant while phonetic contrasts are irrelevant.

This becomes clear as soon as we consider what is the scientific operation of working out the phonemic pattern. For phonemes are in the first instance determined on the basis of distribution. Two positional variants may be considered one phoneme if they are in complementary distribution; never otherwise. In identical environment (distribution) two sounds are assigned to two phonemes if their difference distinguishes one morpheme from another; in complementary distribution this test cannot be applied. The distributional analysis is simply the unfolding of the criterion used for the original classification. If it yields a patterned arrangement of phonemes, that is an interesting result for linguistic structure."
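The distributional test described in this passage can be sketched in a few lines. This is a simplified illustration and not a reconstruction of Harris's procedure: the transcribed toy words, the ASCII segment labels (`"th"` for an aspirated stop, `"flap"` for the flap, `"ph"` for an aspirated labial), and the function names are assumptions invented for the example. An environment is taken to be just the immediately preceding and following segment, whereas a real analysis would state environments over classes of sounds, stress, and so on.

```python
def environments(words, sound, boundary="#"):
    """Return the set of (previous, next) contexts in which `sound` occurs."""
    envs = set()
    for w in words:
        padded = [boundary] + list(w) + [boundary]
        for i in range(1, len(padded) - 1):
            if padded[i] == sound:
                envs.add((padded[i - 1], padded[i + 1]))
    return envs

def in_complementary_distribution(words, a, b):
    """True if sounds `a` and `b` never occur in the same environment."""
    return environments(words, a).isdisjoint(environments(words, b))

# Hypothetical transcriptions; each word is a tuple of segment labels.
toy_words = [
    ("th", "a", "p"),
    ("ph", "a", "p"),
    ("a", "flap", "a"),
    ("b", "a", "flap", "a"),
]

print(in_complementary_distribution(toy_words, "th", "flap"))
print(in_complementary_distribution(toy_words, "th", "ph"))
```

In this toy data, `"th"` and `"flap"` share no environment, so they are candidates for a single phoneme, while `"th"` and `"ph"` occur in the same environment and would be assigned to two phonemes if the difference distinguishes morphemes. That last condition is not modeled in the sketch.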


Rudolf Carnap had (...), in The logical syntax of language (published in 1934 under the title Logische Syntax der Sprache), argued for a coming together of formal syntax and formal logic: by formal, he meant analysis ignoring meaning and considering only categories and combinations of symbols; by syntax, the rules by which items are combined to form expressions (sentences); and by logic, the rules by which valid inferences from one sentence to another can be made. The contrast between syntax and logic was dubbed by Carnap (in English) as the difference between FORMATION rules and TRANSFORMATION rules, an interesting terminological suggestion and one that may have later influenced Harris.

