(Zellig Harris, Language and Information)
1.1 Problems and Methods
To consider the structure of language, especially its syntax -- that is, how sentences are built from words -- we note first the usual approaches. Scientists coming to the problem from the outside often seek regularities in the sequential relation among the words in a sentence, since the data presents itself sequentially. However, sufficient regularities have not been found, for reasons that will appear later. In contrast, people who work with language analyze it on the basis of what are called grammatical relations, such as the subject and object of a verb, or the relation of an affix to its host word. Here too there are difficulties. Few if any grammatical relations appear in the same way in all languages, so the individual relations cannot be taken as primitives for language as such. The situation is rather that in each language there are some relations that can be called grammatical, but a satisfactory general definition is lacking. Furthermore, grammatical relations are unique to natural language, and if we can describe language only in such terms we will be unable to compare language with anything else in the world, not even with such close relatives as gesture on the one hand and mathematics on the other.

Finally, the elements on which grammatical relations hold are not adequately defined. The one type of element that is precisely established is the set of phonemes, the characteristic sounds of a language; indeed the discovery of phonemes is the beginning of a precise science of language. But as to words, if they are thought of as correlations of sound sequences with meanings, we are left with many problems, such as homonyms (as in see, sea, and the Holy See), or the two pronunciations of economics, not to mention many exceptional situations. And as to sentences, the lack of a general definition is well known.
Hence, while traditional grammars can provide adequate descriptions of a language, they do not supply a framework for considering the structure of language in general.
1.2 Procedures Yielding the Elements
To see how the description of language structure is achieved, we note first how the elements can be established.
First, it is possible to determine the phonemic distinctions in a language by a behavioral test that does not involve the specific meaning of words or the investigator's judgment of phonetic similarity. The test consists of one speaker of a language uttering, in random order, repetitions of two words (e.g., see and sea, or hard and heart) while the hearer (another speaker) judges which pronunciations are repetitions of which. (...) These discriminations create sound types, as against merely a scientist's aggregation of sound tokens, and the discriminations themselves constitute the discrete and definite ("phonemic") elements of which everything else in language is constructed. (...) Phonemes are obtained by the most economical way of collecting into one element phonemic distinctions having different environments (...).
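The pair test lends itself to a simple computational sketch. Assuming we are given a hearer's judgments of which utterance tokens are repetitions (the tokens and judgments below are hypothetical), the phonemic distinctions fall out as the partition of tokens that those judgments induce:

```python
# Sketch of the pair test: tokens judged "same word" are merged into one
# sound type; the resulting partition gives the phonemic distinctions.
# The token names and judgment data are invented for illustration.

def phonemic_types(tokens, same_pairs):
    """Partition utterance tokens into sound types via union-find,
    merging any two tokens the hearer judged to be repetitions."""
    parent = {t: t for t in tokens}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path compression
            t = parent[t]
        return t

    for a, b in same_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for t in tokens:
        groups.setdefault(find(t), []).append(t)
    return list(groups.values())

# Four tokens: the hearer cannot tell see from sea (one sound type),
# but distinguishes both from saw.
tokens = ["see_1", "sea_1", "saw_1", "saw_2"]
judged_same = [("see_1", "sea_1"), ("saw_1", "saw_2")]
print(phonemic_types(tokens, judged_same))
# → [['see_1', 'sea_1'], ['saw_1', 'saw_2']]
```

The partition depends only on the hearer's discriminations, not on any phonetic description of the tokens, which is the point of the test.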
Next, it is possible to locate word boundaries within utterances of a language (and morpheme boundaries, e.g., affixes, within words) by a stochastic process -- that is, a process that checks the (n+1)th item given the first n items. (...) Each point at which the number of different possible next phonemes (or letters) peaks, i.e., at which the number is greater than immediately before or after, is (in most cases) a word or morpheme boundary. Such peaking arises because not all phoneme sequences make words and not all word sequences make sentences.
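The peaking procedure can be sketched directly. The tiny corpus and the letter-level treatment below are invented illustrations, not data from the text; a strict local maximum stands in for the peak condition:

```python
# Sketch of boundary detection by successor counts: at each prefix
# length n, count the distinct (n+1)th letters attested in a corpus;
# a local peak in that count suggests a word or morpheme boundary.

def successor_counts(utterance, corpus):
    """Number of distinct next letters after each prefix of utterance."""
    counts = []
    for n in range(1, len(utterance)):
        prefix = utterance[:n]
        nexts = {u[n] for u in corpus if u.startswith(prefix) and len(u) > n}
        counts.append(len(nexts))
    return counts

def boundaries(utterance, corpus):
    """Cut points where the successor count peaks (interior positions)."""
    c = successor_counts(utterance, corpus)
    return [i + 1 for i in range(1, len(c) - 1)
            if c[i] > c[i - 1] and c[i] > c[i + 1]]

corpus = ["theboy", "thebook", "theman", "thedog"]
print(boundaries("theboy", corpus))  # → [3], i.e. the|boy
```

After "the" three different letters can follow in this corpus, while only one can follow "th" or "theb"; the peak marks the word boundary.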
Finally, it is possible to locate the boundary of sentences within utterances. When sequences of words in utterances are studied, it is found that to a certain extent one can classify the local combinations into required ones and several kinds of possible ones. A complex stochastic process on the word sequence in an utterance, in respect to these types of successor classification, reveals a periodicity: at certain points, the successor possibilities return to the situation at the beginning of the utterance. These recurrent-event points segment the utterance into a succession of structurally independent sentences.
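The return of the successor possibilities to the utterance-initial situation can be sketched with a toy successor table over invented word classes (D = determiner, N = noun, V = verb); a real analysis would of course use the successor classifications found for the language:

```python
# Sketch of sentence segmentation by recurrence: mark a boundary
# wherever the set of possible next word classes returns to the
# utterance-initial set. The class inventory and successor table
# are invented for illustration.

def sentence_boundaries(word_classes, succ, initial):
    cuts = []
    for i, c in enumerate(word_classes):
        if succ[c] == initial:  # recurrence of the initial situation
            cuts.append(i + 1)
    return cuts

succ = {"D": {"N"}, "N": {"V"}, "V": {"D", "N"}}
initial = {"D", "N"}
classes = ["D", "N", "V", "D", "N", "V"]  # e.g. "the dog ran the boy slept"
print(sentence_boundaries(classes, succ, initial))  # → [3, 6]
```

After each verb the possibilities are again those of an utterance beginning, so the six-word utterance segments into two structurally independent sentences.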
The stochastic processes just mentioned are important even for a known language, where we know from experience what are the words and the sentences. First, the processes show that words and sentences exist, not merely by cultural convention or by some semantic properties but by restrictions of combinations, that is, of occurrences relative to each other, in the physical components of speech. Secondly, they show that each type of entity is definable as a relation holding among entities of a smaller (more local) type. The phoneme sequence relation that makes words proves to be of little interest: it does not in general tie up with the other properties of words -- not with their meanings and not with the word sequence relation that makes sentences. However, the word sequence relations are explicit and are of decisive importance for the structure and meaning of sentences. It will be seen that these last relations are primarily a matter of the frequency of words relative to each other in utterances, or sentences, of a language.
1.3 Syntax Procedures
We are now ready to consider what combinations of words occur in the language, in contrast with those that do not. This cannot be done by simply listing them. First, the list would be too vast. Second, the set of sentences is not well defined: there are many marginal sentences about which speakers are not certain or do not agree about whether they are said at all, or are in the language. Third, languages change, and no list would be correct over a sufficient period. Instead of listing, therefore, we try to find what constraints preclude the combinations that are not in the language, what restrictions affect the equiprobability of word occurrences in respect to each other in utterances of the language. It will be seen that there are three types of constraints on word combinations that make sentences, and that each one carries a type of meaning, so that the meaning of a sentence is determined directly from the words and the constraints. The three are: a partial ordering that creates sentencehood, a probability inequality that allows for word meaning, and a reducing of phonemic forms that does not affect the objective meaning.
1.4 The Partial-Order Constraint
The first constraint creates sentence structure. It is a partial order of words, that is (roughly) an ordering in which some words are higher or lower on some scale than others, while some are neither higher nor lower than others. The partial order holds between word occurrences in utterances. It determines all sentences, but it is overt only in a subset from which, however, all other sentences can be derived. Grammatical relations can be defined in terms of it.
The partial order is a constraint on word combinations: it says that in the argument position next to a given operator the frequency (or probability) of certain words -- those not in the argument class for that operator -- is zero. Each satisfaction of the partial order, i.e., each word sequence in which all the source words have their requirements satisfied, is a sentence. (...) The partial-order relation has a meaning: as will be seen later, each operator is being said about its argument, so that the meaning of the partial order is roughly predication.
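As a sketch, suppose a hypothetical mini-lexicon records the argument classes each word requires (an empty list marks a zero-level word). A word sequence then satisfies the partial order, i.e., is a sentence, just when every operator's requirements are met:

```python
# Sketch of the zero-probability constraint: an operator admits only
# arguments of its required classes ("N" = zero-level word, "O" =
# operator). The lexicon and words are invented for illustration.

requires = {
    "John": [],
    "book": [],
    "sleeps": ["N"],
    "reads": ["N", "N"],
    "surprises": ["O", "N"],  # second-level: takes a sentential argument
}

def word_class(w):
    return "N" if not requires[w] else "O"

def satisfied(op, args):
    """True if op's requirements are met; nested tuples (op, arg, ...)
    stand for already-built operator-argument constructions."""
    need = requires[op]
    if len(args) != len(need):
        return False
    for a, cls in zip(args, need):
        head = a if isinstance(a, str) else a[0]
        if word_class(head) != cls:
            return False
        if isinstance(a, tuple) and not satisfied(a[0], list(a[1:])):
            return False
    return True

print(satisfied("sleeps", ["John"]))                         # → True
print(satisfied("sleeps", ["reads"]))                        # → False
print(satisfied("surprises", [("sleeps", "John"), "book"]))  # → True
```

The second check fails because "reads" is outside the argument class of "sleeps": its probability in that position is zero under the constraint.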
While the dependence is based on the word combinations in a particular body of data, it is intended to predict the combinations in any utterance of the language except insofar as transformations (the reductions in the third constraint) alter the shapes (and apparent presence) and positions of words.
Now, this dependence relation has an important property. If we ask what determines for each word which word class it requires as argument, we find that the required words are identified by what they in turn require. (...)
There must be, in the language and in each sentence, at least one zero-level argument that requires nothing, for otherwise one couldn't have any words in a sentence. There must also be at least one first-level operator that requires only words that require null, for nothing else could enter a sentence after the zero-level words (...). And there would have to be second-level operators, at least one of whose requirements is a first-level operator, if we are to have any sentences beyond the elementary ones.
Thus the relation that imposes the partial order is not just the dependence of a word on a stated class of words, but the dependence of a word on the dependence property of words. This is the kind of relation that can define a system without recourse to any externally defined elements; it has the property of a mathematical object. Having come to this, it is worth noting that the language elements involved, namely words, have indeed no inherent property that has to be used for sentence construction. The sounds of words are not related to their meaning or their combinability, and are even dispensed with in writing (especially in pictographic writing). Even the meanings of words, as will be noted in the third lecture, are in part determined from their combinations rather than purely from their identity. From the viewpoint of the partial order, then, the word occurrences form a set of arbitrary elements closed under the dependence-on-dependence relation, with every combination that satisfies this relation being a sentence.
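The self-contained character of the dependence-on-dependence relation can be sketched as follows. Representing a word occurrence together with its arguments as a nested tuple (a hypothetical notation), the level of each occurrence is computed from its arguments alone, with no externally given word classes:

```python
# Sketch: a word occurrence's level is defined only by what its
# arguments themselves require -- 0 if it takes no arguments, else one
# more than its highest argument. The example words are invented.

def occurrence_level(tree):
    """tree is a bare word (string) or a tuple (word, arg_tree, ...)."""
    if isinstance(tree, str):
        return 0                       # zero-level: requires nothing
    _, *args = tree
    return 1 + max(occurrence_level(a) for a in args)

print(occurrence_level("John"))                            # → 0
print(occurrence_level(("sleeps", "John")))                # → 1
print(occurrence_level(("probably", ("sleeps", "John"))))  # → 2
```

Nothing about the words themselves enters the computation; only the recursive dependence structure does, which is what makes the system definable without external elements.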
The importance of this relation will be clearer when it is seen that almost all further classes and operations in language, and almost all language meanings, are formulated on the constructions resulting from this relation. It should be noted that the operator-argument relation produced by this dependence has important similarities to functions in categorial grammar and in logic. The differences arise from the different purposes of a syntax of logic and a syntax of natural language.
1.5 The Likelihood Constraint
We have obtained the gross structure of sentences. It is still necessary to describe how a particular word is chosen for a sentence, how certain combinations are more likely than others. This is done by a second constraint, on word likelihood. Whereas the first constraint creates sentence structure, the second specifies word meanings. It does not necessarily create meaning, since many words with their meanings must have been in use singly before being used in sentences, but it specifies meaning to any detail desired, and it enables a word to extend its meaning, and to have different meanings in different operator-argument environments. The first constraint sets probability = 0 for words outside the required class in argument position; this leaves room for any probability > 0 for words of the required class. Nothing says that all words must have equal frequency in respect to their operator or argument, or that the frequency must be random, or must fluctuate. In fact we find in language that each word has a particular and roughly stable likelihood of occurring as argument, or operator, with a given other word, though there are many cases of uncertainty, disagreement among speakers, and change through time. These roughly stable likelihoods, and especially the selection frequency, which will be mentioned in a moment, conform to and fix the meanings of words.
We speak here of likelihood under an operator (or over an argument), in the sense of estimated frequency or probability per fixed number of occurrences of that operator (or argument); no one has actually counted the frequencies of various words in argument position under another word. Nevertheless it should be noted that counting such frequencies over a small sample of the language is not as impossibly vast a task as it might seem to be, and this because we are not speaking of frequency in respect to other words in arbitrary sentences but only in the word pairs or triples in operator-argument relation, which is the elementary sentential structure and the sentential component of all sentences, and which constitutes the great bulk of meaning-characterizing, roughly stable relative frequencies.
Each word has a somewhat fuzzy selection of other words that are more likely than average to occur in the position for its argument -- that is, more likely than would be expected if the occurrences were random or equal in frequency. (...)
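Such counting can be sketched on invented operator-argument pairs: each operator's selection is taken here, as one simple reading of "more likely than average", to be those arguments occurring more often than the mean count for that operator:

```python
# Sketch: estimate per-operator argument likelihoods from counted
# operator-argument pairs, and take as the operator's "selection"
# the arguments above the average count. The pair data is invented.

from collections import Counter, defaultdict

pairs = ([("eat", "bread")] * 5 +
         [("eat", "apple")] * 4 +
         [("eat", "idea")]  * 1)   # low-likelihood, odd combination

def selection(pairs):
    by_op = defaultdict(Counter)
    for op, arg in pairs:
        by_op[op][arg] += 1
    sel = {}
    for op, counts in by_op.items():
        avg = sum(counts.values()) / len(counts)   # mean count per argument
        sel[op] = {a for a, c in counts.items() if c > avg}
    return sel

print(selection(pairs)["eat"])  # bread and apple, but not idea
```

Only the word pairs in operator-argument relation are counted, which is what keeps the task tractable on a small sample.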
In addition there exist words with exceptionally high likelihood, and this on several different grounds. A word may have high likelihood as a total of many ordinary likelihoods, if it is in the selection of exceptionally many operators. (...)
Some high-likelihood word occurrences are recognized to be such by the very fact that they have been reduced. (...)
There are also words with exceptionally low likelihood in particular situations. (...)
1.6 The Reduction Constraint
The third constraint makes existing sentences more compact. It consists, for each language, of a few specifiable types of reduction, even to zero, in the phonemic shape of particular word occurrences. First, the domain of reduction: what is reducible is the high-likelihood (or otherwise favored) material. Certain words that have exceptionally high likelihood or special status in a given position are reducible. (...) The words that have highest likelihood, i.e., are expectable in a given environment, contribute little or no information when they enter there, in the information-theoretic sense. It is relevant that reduction takes place in several different high-likelihood and special-status situations. This suggests that what determines reducibility is not simply high frequency but low information, which is the common property of all of these situations. Note that the ability of the hearer to supply the zeroed word shows that the to-be-zeroed occurrence of the word carried no further information that had to be given by the speaker.
This suggestion is supported by the fact that words that have exceptionally low likelihood in a particular operator environment can, when they do occur there, block reductions that would otherwise take place there.
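The low-information criterion can be sketched numerically. The conditional probabilities below are invented for illustration; the point is only that reducibility tracks the information -log2 p of a word in its operator environment, not raw frequency as such:

```python
# Sketch: a word occurrence is reducible (even to zero) when its
# information in its operator environment falls below some threshold,
# so the hearer can supply it. Probabilities and threshold are invented.

import math

p_arg_given_op = {
    ("expect", "come"): 0.5,      # expectable argument under "expect"
    ("expect", "explode"): 0.01,  # exceptionally low likelihood
}

def information(op, arg):
    """Information carried by arg in op's environment, in bits."""
    return -math.log2(p_arg_given_op[(op, arg)])

def reducible(op, arg, threshold=2.0):
    return information(op, arg) < threshold

print(reducible("expect", "come"))     # → True  (1 bit)
print(reducible("expect", "explode"))  # → False (about 6.6 bits)
```

The low-likelihood argument carries too much information to be zeroed, matching the observation that such words block reductions.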
1.8 Properties of the Base
We have seen the constraints on word combination: the partial order of word dependence that creates sentence structure, the likelihood inequalities that fit word meanings, the reduction of high-likelihood word occurrences, and finally the linearizations. Each acts on the resultants of its predecessor. The constraints partition the set of sentences into two major sets. Without reduction, they create a base set from which all other sentences are derived. What is important here is that neither the base set nor the other set, the derived (reduced) set, is merely a residue of the other. On one hand, the structure of the base set is not just a description of all those sentences that could not be derived from something. On the other hand, the derivations are not just any change needed to obtain the remaining sentences from the base set. Rather, the base set and the reductions each have simple and understandable structures on their own terms, and it is a nontrivial result that the whole set of sentences is characterized by just these two structures.