A Usage-Based Account of Constituency and Reanalysis

Constituent structure is widely regarded as the very foundation of linguistic competence and is often assumed to be innate, yet we show here that it is derivable from the domain-general processes of chunking and categorization. Using modern and diachronic corpus data, we show that the facts support a view of constituent structure as gradient (as would follow from its source in chunking and categorization) and subject to gradual changes over time. Usage factors (i.e., repetition) and semantic factors both influence chunking and categorization and, therefore, influence constituent structure.
In this article we propose to derive constituent structure from the domain-general processes of chunking and categorization within the storage network for language. Because language is a dynamic system, an important part of our argument will rest on the idea that constituent structure, like all of grammar, is constantly undergoing gradual change. Thus, structural reanalysis, as often discussed in the context of grammaticalization, will be pivotal to our argument and exposition.
We mean by 'structural reanalysis' a change in constituent structure, as when 'to' as an earlier allative or infinitive marker with a verb as its complement fuses with 'going' in the future expression 'be going to' ('going [to see]' > '[going to] see'). Indicators of reanalysis include changes in distribution and phonological changes ('going to' > 'gonna').
Are such changes abrupt or gradual? In generative models of syntax (see, e.g., Lightfoot, 1979; Roberts & Roussou, 2003), structural reanalysis is necessarily abrupt, because it is held that a sequence of words has a unique, discrete constituent analysis. In this view, constituents are clearly defined and do not overlap; in a sequence such as going to VERB, to must be grouped either with the following verb, or with going, with no intermediate stages.
However, because most linguistic change appears to be quite gradual, with slowly changing meanings and distributions and overlapping stages, a problem arises for a theory with discrete constituent structure. Evidence from the gradualness of change has led some researchers to doubt discrete categories and structures (Haspelmath, 1998; Hoffmann, 2005; Quirk, Greenbaum, Leech, & Svartvik, 1985).
Continuing from Bybee and Scheibman (1999), we join these researchers in proposing that constituent structure can change gradually. We take the view that it is altogether common even for an individual speaker to have nondiscrete syntactic representations for the same word sequence. Taking a complex systems-based perspective, we hold that syntactic structure is in fact much richer than the discrete constituency view would indicate. There are multiple overlapping and, at times, competing influences on the shape of units in the grammar, and these multiple factors have an ongoing effect on each speaker’s synchronic representations of syntactic structure. Specifically, syntactic constituents are subject to ongoing influence from general, abstract patterns in language, in addition to more localized, item-specific usage patterns. The foregoing perspective makes it possible that the same word sequence may be characterized by multiple constituent structures and that these structures have gradient strengths rather than discrete boundaries. Our position in this article is thus that constituency may change in a gradual fashion via usage, rather than via acquisition, and that structural reanalysis need not be abrupt.
(...) constituent structure is gradient, mutable, and emergent from domain-general processes (...) 
Chunking and categorization together provide constituency analyses of phrases and utterances for speakers.

Constituent Structure as Emergent From Chunking, Categorization, and Generalization
Bybee (2002, in press) discusses the nature of sequential learning and chunking as it applies to the formation of constituents. Because members of the same constituent appear in a linear sequence with some frequency, these items are subject to chunking, by which sequences of repeated behavior come to be stored and processed as a single unit.

A chunk is a unit of memory organization, formed by bringing together a set of already formed chunks in memory and welding them together into a larger unit. Chunking implies the ability to build up such structures recursively, thus leading to a hierarchical organization of memory. Chunking appears to be a ubiquitous feature of human memory.
Chunking occurs automatically as behaviors are repeated in the same order, whether they are motor activities such as driving a car or cognitive tasks such as memorizing a list. Repetition is the factor that leads to chunking, and chunking is the response that allows repeated behaviors to be accessed more quickly and produced more efficiently (Haiman, 1994). Chunking has been shown to be subject to the Power Law of Practice (Anderson, 1993), which states that performance improves with practice, but the amount of improvement decreases as a function of increasing practice or frequency. Thus, once chunking occurs after several repetitions, further benefits or effects of repetition accrue much more slowly.
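The diminishing returns described above can be made concrete with a small numerical sketch. The article does not give a formula; the code below assumes one common formulation of the Power Law of Practice, in which the time needed to produce a sequence on its n-th repetition is a * n**(-b). The function names and the values of a and b are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: the power-function form and the parameter
# values a and b are assumptions, not taken from the article.

def practice_time(n: int, a: float = 1.0, b: float = 0.5) -> float:
    """Hypothetical time to produce a sequence on its n-th repetition."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return a * n ** (-b)


def marginal_gain(n: int, a: float = 1.0, b: float = 0.5) -> float:
    """Speed-up obtained by going from repetition n to repetition n + 1."""
    return practice_time(n, a, b) - practice_time(n + 1, a, b)


if __name__ == "__main__":
    # Each row: repetition number, time at that repetition, gain from one more.
    for n in (1, 10, 100):
        print(n, round(practice_time(n), 3), round(marginal_gain(n), 4))
```

Under these assumptions the gain from one additional repetition is largest at the start and shrinks steadily, which is the pattern the paragraph describes: early repetitions do most of the work of forming a chunk, and later ones add progressively less.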

Changes in Constituent Structure
In grammaticalization it often happens that grammaticalizing expressions change their constituent structure. Thus, it is often said that grammaticalization is the reanalysis of a lexical item as a grammatical item. As Haspelmath (1998) pointed out, often the change can be thought of as a simple change in a category status. Thus, a verb becomes an auxiliary; a serial verb becomes an adposition or a complementizer; a noun becomes a conjunction or adposition. In some cases, however, shifts in constituent boundaries do occur; in particular, it is common to lose some internal constituent boundaries. A prominent example of such a change involves complex prepositions. Many complex prepositions start out as a sequence of two prepositional phrases (e.g., on top of NP) but evolve into a kind of intermediate structure in some analyses (the complex preposition), and eventually they can even develop further into simple prepositions, as has occurred with beside, behind, and among (Hopper & Traugott, 2003; König & Kortmann, 1991; Svorou, 1994).
Given the above proposed model, Hay (2001) reasoned that if the complex unit is more frequent than its parts, it is more likely to be accessed as a unit, leading to the loss of analyzability that comes about through categorization. Applied to 'in spite of', we would predict that as the complex phrase becomes more frequent than the simple noun 'spite', it would also become more autonomous and less analyzable.
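The comparison underlying this prediction can be sketched as a simple ratio of corpus counts. The code below is a minimal illustration, not a reconstruction of Hay's (2001) procedure: the function names are hypothetical, the counts are invented placeholders rather than corpus results, and how the component word is counted (for example, whether tokens inside the phrase are included) is left to the analyst.

```python
# Illustrative sketch only: function names are hypothetical and the
# counts below are made-up placeholders, not data from any corpus.

def relative_frequency(phrase_count: int, part_count: int) -> float:
    """Ratio of the complex unit's frequency to that of one of its parts."""
    if phrase_count < 0 or part_count <= 0:
        raise ValueError("phrase_count must be >= 0 and part_count > 0")
    return phrase_count / part_count


def more_frequent_than_part(phrase_count: int, part_count: int) -> bool:
    """True when the complex unit outnumbers the part it is compared with."""
    return relative_frequency(phrase_count, part_count) > 1.0


if __name__ == "__main__":
    # Two invented stages for a phrase such as 'in spite of' versus 'spite'.
    stages = {"earlier stage": (20, 100), "later stage": (300, 100)}
    for label, (phrase, part) in stages.items():
        print(label, relative_frequency(phrase, part),
              more_frequent_than_part(phrase, part))
```

The boolean cutoff at 1.0 is a simplification. On the gradient view argued for here, the ratio itself is the more informative quantity: the higher it rises, the more autonomous and less analyzable the phrase is predicted to be.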


We have taken stock here of the traditional discrete constituency view that holds that a word sequence either has a holistic structure or a unique, nested hierarchical structure. The accounts we have examined ultimately reject usage as an indicator of constituent structure—discarding evidence from semantics and any usage data that might be countered by partial evidence from introspective syntactic tests. Such a conservative approach rejects even the possibility of finding evidence that particular sequences may have reached an intermediate stage of constituency. Moreover, the discrete constituency view would seem to hold that grammar is driven only by abstract syntactic generalizations and is immune to any gradual effects from item-specific usage patterns.
In contrast, as we do not take constituent structure as given innately, we do not give priority to syntactic tests. Rather we consider data from usage, semantics, and language change. Indeed, we have shown that chunking and categorization have semantic effects and change incrementally over time.
Moreover, in keeping with the theory of complex adaptive systems, we consider constituent structure to be emergent from the domain-general processes of chunking and categorization. Human minds track multiple factors related to constituency, and this complex processing correlates with a rich and dynamic structural representation for word sequences. In our model, constituency is the result of interacting influences that are both local and global in nature. The global influences that help shape constituents correspond to general patterns in usage. On the other hand, constituency may also be shaped locally by item-specific forces over time. If a sequence is consistently used in a particular context (with complex prepositions like in spite of as a case in point), that sequence will gradually form into a unit, overriding general patterns elsewhere in usage. In this regard, we embrace Bolinger’s early complex systems view of language as a “jerry-built” and heterogeneous structure that is also intricate and tightly organized (1976, p. 1). Rather than assuming that structure is given a priori via top-down blueprints, we agree with Bolinger (1976) and Hopper (1987) that structure emerges locally and is subject to ongoing revision, even while general patterns exhibit apparent stability.

