quinta-feira, 4 de setembro de 2014

Hexagon Binning

Hexagon binning is a bivariate histogram useful for visualizing the structure of data that depend on two random variables. A simpler model considering only one variable may leave correlated errors unaddressed, making the data look simpler than they are. This is problematic because it may suggest spurious regularity. This error is typical of fitting algorithms that assume $x$ is known perfectly and only $y$ is measured with uncertainty.

The concept of hexagon binning is to tessellate the $xy$ plane over a certain range with a regular grid of hexagons, count the number of data points falling in each bin, and plot the hexagons with color or radius varying in proportion to the observed count in each bin. A hexagonal tessellation is preferred over the square counterpart because hexagons have a symmetry of nearest neighbors that square bins lack. Moreover, the hexagon is the polygon with the largest number of sides that can still tile the plane regularly. In terms of packing, a hexagonal tessellation is about 13% more efficient at covering the plane than squares. Hexagons are therefore less biased for displaying densities than other regular tessellations.

The counts observed are a result of the underlying statistical characteristics of the data, the tiling used to divide the domain and the limited sample taken from the population. Therefore, ragged patterns might appear where a continuous transition should take place. It is then usual to apply a smoothing over the binning counts to avoid this.
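The binning step itself is simple to sketch. A hexagonal lattice is the union of two rectangular lattices offset by half a cell, so each point can be assigned to the nearer of two candidate centers. Below is a minimal Python sketch of this idea (the `width` parameter and lattice layout are my choices, not taken from any particular library):

```python
import math
from collections import Counter

def hexbin(points, width):
    """Count points in a regular hexagonal tessellation.

    Hexagon centers form the union of two rectangular lattices with
    spacing (width, sqrt(3)*width), the second offset by half a cell
    in each direction; each point goes to the nearer candidate center.
    """
    dx = width                   # horizontal spacing of centers
    dy = width * math.sqrt(3.0)  # vertical spacing of each sub-lattice
    counts = Counter()
    for x, y in points:
        # nearest center on lattice A
        ax, ay = round(x / dx) * dx, round(y / dy) * dy
        # nearest center on lattice B (offset by half a cell)
        bx = (round(x / dx - 0.5) + 0.5) * dx
        by = (round(y / dy - 0.5) + 0.5) * dy
        if (x - ax) ** 2 + (y - ay) ** 2 <= (x - bx) ** 2 + (y - by) ** 2:
            counts[(ax, ay)] += 1
        else:
            counts[(bx, by)] += 1
    return counts
```

The resulting counts can then be drawn as hexagons shaded or scaled by count, which is what the R package below does.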

hexbin: Hexagonal Binning Routines in R

Hexagon Binning of Word Frequency

Analyzing the relation between word frequency and rank has been a key object of study in quantitative linguistics for almost 80 years. It is well known that words occur according to a famously systematic frequency distribution known as Zipf's or the Zipf-Mandelbrot law. The generalization proposed by Mandelbrot states that the relation between rank ($r$) and frequency ($f$) is given by

$ f(r) = \frac{C}{(r + \beta)^\alpha} $

where $C$, $\beta$ and $\alpha$ are constants.

The standard method to compute the word frequency distribution is to count the number of occurrences of each word and then sort the words by decreasing frequency. The frequency $f(r)$ of the $r$-th most frequent word is plotted against its rank $r$, yielding a roughly linear curve in a log-log plot. Frequency and rank are both estimated from the very same corpus, which can lead to correlated errors between them.
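This counting-and-sorting procedure can be sketched in a few lines (a toy corpus stands in for a real one):

```python
from collections import Counter

text = "the cat sat on the mat and the cat ran"
counts = Counter(text.split())

# sort words by decreasing frequency; rank r starts at 1
ranked = counts.most_common()
for r, (word, f) in enumerate(ranked, start=1):
    print(r, word, f)   # rank 1 is 'the' with frequency 3
```

Note that both the rank and the frequency of each word come from the same `counts` table, which is exactly the source of the correlated errors discussed next.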

Analyzing the example proposed by Wentian Li (1992), and earlier by George A. Miller (1957), we can see a problem with the method described above for counting and ranking words. Words that are equally probable will, by chance, appear with different frequency counts, and will therefore show up as a strikingly decreasing curve, suggesting an interesting relation between frequency and rank. The effect is worst for low-frequency words, whose frequencies are measured with the least precision. The result can be a spurious association between an observed pattern and an underlying structure.

This unwelcome situation might be mitigated by using an extremely large corpus or by using two independent corpora to estimate both variables: frequency and rank.

Steven T. Piantadosi proposes:
Fortunately, the problem is easily fixed: We may use two independent corpora to estimate the frequency and frequency rank. In the above case where all words are equally probable, use of independent corpora will lead to no apparent structure -- just a roughly flat frequency-rank relationship. In general, we need not have two independent corpora from the start; we can imagine splitting our initial corpus into two subcorpora before any text processing takes place. This creates two corpora that are independent bodies of text (conditioned on the general properties of the starting corpus) and, so, from which we can independently estimate r and f(r). A convenient technique to perform this split is to perform a binomial split on observed frequency of each word: If we observe a word, say, 100 times, we may sample from a binomial (N = 100, p = .5) and arrive at a frequency of, say, 62 used to estimate its true frequency and a frequency of N - 62 = 38 to estimate its true frequency rank. This exactly mirrors randomly putting tokens of each word into two independent corpora, before any text processing began. The choice of p = .5 is not necessary but yields two corpora of approximately the same size. With this method, the deviations from a fit are interpretable, and our plotting method no longer introduces any erroneous structure.
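The binomial split described in this excerpt can be sketched as follows (toy counts and function names are my own):

```python
import random
from collections import Counter

def split_counts(counts, p=0.5, seed=0):
    """Binomially split observed word counts into two sub-corpora.

    For a word seen n times, draw k ~ Binomial(n, p): k tokens go to
    one corpus (used to estimate frequency f(r)), and n - k to the
    other (used only to determine the frequency rank r).
    """
    rng = random.Random(seed)
    freq_half, rank_half = {}, {}
    for word, n in counts.items():
        k = sum(rng.random() < p for _ in range(n))  # k ~ Binomial(n, p)
        freq_half[word] = k
        rank_half[word] = n - k
    return freq_half, rank_half

counts = Counter({'the': 100, 'cat': 40, 'mat': 40, 'ran': 7})
f_est, r_src = split_counts(counts)
# rank from one half, frequency from the other -> independent errors
ranking = sorted(r_src, key=r_src.get, reverse=True)
```

Since each token lands in one half or the other, the two halves always sum back to the observed counts, mirroring a random split of the corpus before any text processing.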

Figure 1a shows such a plot, giving the frequency/frequency-rank relationship from the American National Corpus (ANC; Reppen & Ide, 2004), a freely available collection of written American English. All figures in this paper follow this plotting procedure unless otherwise noted. The plot shows a two-dimensional histogram of where words fall in frequency/frequency-rank space. The shading of the histogram is done logarithmically with the number of words falling into each hexagonal bin and is white for zero-count bins. Because the plot has a logarithmic y-axis, words with zero frequency after the split are not shown. The fit of Eq. 2 using a maximum-likelihood method on the separate frequency and frequency rank portions of the corpus is shown in the red solid line. Additionally, a locally smoothed regression line (LOESS) (Cleveland, Grosse, & Shyu, 1992) is shown in gray. This line corresponds to a local estimate of the mean value of the data and is presented as a comparison point to see how well the fit of Eq. 2 matches the expected value of the points for each frequency rank (x-value). In the corner, several key values are reported: the fit α and β, an R² measure giving the amount of variance explained by the red line fit, and an adjusted R²_adj capturing the proportion of explainable variance captured by the fit, taking the smoothed regression as an estimate of the maximum amount of variance explainable. For simplicity, statistics are computed only on the original R², and its significance is shown with standard star notation (three stars means p < .001).
This plot makes explicit several important properties of the distribution. First, it is approximately linear on a log-log plot, meaning that the word frequency distribution is approximately a power law, and moreover, is fit very well by Eq. 2 according to the correlation measures. This plot shows higher variability toward the low-frequency end, (accurately) indicating that we cannot estimate the curve reliably for low-frequency words. While the scatter of points is no longer monotonic, note that the true plot relating frequency to frequency rank must be monotonic by definition. Thus, one might imagine estimating the true curve by drawing any monotonic curve through these data. At the low-frequency end, we have more noise and, so, greater uncertainty about the shape of that curve. This plot also shows that Eq. 2 provides a fairly accurate fit (red) to the overall structure of the frequency-rank relationship across both corpora.
Importantly, because we have estimated r and f(r) in a statistically independent way, deviations from the curve can be interpreted. Figure 1b shows a plot of these deviations, corresponding to the residuals of frequency once Eq. 2 is fit to the data. Note that if the true generating process were something like Eq. 2, the residuals should be only noise, meaning that those that are above and below the fit line (y = 0 in the residual plot) should be determined entirely by chance. There should be no observable structure to the residual plot. Instead, what Fig. 1b reveals is that there is considerable structure to the word frequency distribution beyond the fit of the Zipf–Mandelbrot equation, including numerous minima and maxima in the error of this fit. This is most apparent in the “scoop” on the right-hand side of the plot, corresponding to misestimation of higher ranked (lower-frequency) words. This type of deviation has been observed previously with other plotting methods and modeled as a distinct power law exponent by Ferrer i Cancho and Solé (2001), among others.
However, what is more striking is the systematic deviation observed in the left half of this plot, corresponding to low-rank (high-frequency) words. Even the most frequent words do not exactly follow Zipf’s law. Instead, there is a substantial auto-correlation, corresponding to the many local minima and maxima (“wiggles”) in the left half of this plot. This indicates that there are further statistical regularities -- apparently quite complex -- that are not captured by Eq. 2. These autocorrelations in the errors are statistically significant using the Ljung-Box Q-test (Ljung & Box, 1978) for residual autocorrelation (Q = 126,810.1, p < .001), even for the most highly ranked 25 (Q = 5.7, p = .02), 50 (Q = 16.7, p < .001), or 100 (Q = 39.8, p < .001) words examined. 
Such a complex structure should have been expected: Of course, the numerous influences on language production result in a distribution that is complex and structured. However, the complexity is not apparent in standard ways of plotting power laws. Such complexity is probably incompatible with attempts to characterize the distribution with a simple parametric law, since it is unlikely that a simple equation could fit all of the minima and maxima observed in this plot. At the same time, almost all of the variance in frequencies is fit very well by a simple law like Zipf’s power law or its close relatives. A simple relationship captures a considerable amount about word frequencies but clearly will not explain everything. The distribution in language is only near-Zipfian.
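The Ljung-Box Q statistic used in the excerpt to test for residual autocorrelation is straightforward to compute directly. A minimal sketch (my own implementation, with the residual series `x` hypothetical):

```python
def ljung_box_Q(x, h):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n-k),
    where r_k is the lag-k sample autocorrelation of the series x.
    Under the null hypothesis that x is white noise, Q ~ chi^2(h)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)  # lag-0 sum of squares
    q = 0.0
    for k in range(1, h + 1):
        r_k = sum(dev[i] * dev[i + k] for i in range(n - k)) / denom
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q

# a strongly autocorrelated series yields a large Q, e.g.
# ljung_box_Q([1.0, -1.0] * 50, 1) is about 101, far beyond the
# chi-square(1) critical value of 3.84 at p = .05
```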

Steven T. Piantadosi (2014). Zipf’s word frequency law in natural language: A critical review and future directions

sábado, 23 de março de 2013

20 steps to unlock and install Cyanogenmod on Xperia Mini Pro


1. Download the Android SDK and extract it, for example into /tmp/.

2. Open the Phone application on the Xperia Mini Pro and enter *#06# to obtain the device's IMEI. Save this for later use.

3. Put the device into fastboot mode:
   a) Turn off your Sony device
   b) Press and hold the Volume Up button while plugging in the micro USB cable already connected to the PC.
   c) You should see the blue LED light up.
   d) You are now in fastboot mode.

4. cd /tmp/adt-bundle-linux-x86_64-20130219/sdk/platform-tools

5. sudo ./fastboot -i 0x0fce getvar version
version: 0.3
finished. total time: 0.001s

6. http://unlockbootloader.sonyericsson.com/instructions
   a) Click the 'continue' button at the bottom of the page.
   b) Agree to the 'Are You Sure' and 'Legal Terms' prompts to continue.
   c) Enter the first 14 digits of your IMEI.
   d) You will receive your bootloader unlock key by email.

7. In the PC's terminal, enter the following command:
   sudo ./fastboot -i 0x0fce oem unlock 0xKEY
   where KEY corresponds to the unlock code you were given.

$ sudo ./fastboot -i 0x0fce oem unlock 0xKEYKEYKEY
(bootloader) Unlock phone requested
(bootloader) Erasing block 0x00001300
(bootloader) Erasing block 0x00001400
(bootloader) Erasing block 0x00001500
(bootloader) Erasing block 0x00001600
(bootloader) Erasing block 0x00001700
(bootloader) Erasing block 0x00001800
(bootloader) Erasing block 0x00001900
(bootloader) Erasing block 0x00001a00
(bootloader) Erasing block 0x00001b00
(bootloader) Erasing block 0x00001c00
(bootloader) Erasing block 0x00001d00
(bootloader) Erasing block 0x00001e00
(bootloader) Erasing block 0x00001f00
OKAY [ 10.465s]
finished. total time: 10.465s

8. The Xperia Mini Pro's bootloader should now be unlocked.

9. Download Google Apps

10. Place the CyanogenMod rom .zip file on the root of the SD card.
    And also the supplemental Google App package

11. Extract the boot.img from the zip, you will need this file for fastboot.

12. Put the phone into fastboot mode again.

13. Open a terminal and enter the following:
    sudo ./fastboot -i 0x0fce flash boot boot.img
    sudo ./fastboot -i 0x0fce reboot
    While the device reboots, press the Volume rockers a few times to load recovery.

14. Select backup and restore to create a backup of the current installation on the Xperia Mini Pro.

15. Select the option to wipe data/factory reset.

16. Select Install zip from sdcard.

17. Select Choose zip from sdcard.

18. Select the CyanogenMod file you placed on the sdcard.

19. Install the Google Apps using the same method.

20. Once the installation has finished, return back to the main menu, and select the reboot system now option. The Xperia Mini Pro should now boot into CyanogenMod.

terça-feira, 19 de março de 2013

Percentage of hapax legomena in English

Computing the percentage of hapax legomena using the Project Gutenberg database.

Below follows a Python script that computes the number of hapax legomena (words occurring exactly once) and the total lexicon size for 1000 randomly chosen Project Gutenberg books. The printed result is the percentage of hapax legomena in each text.

#!/usr/bin/env python
# Sample random Project Gutenberg books and print, for each, the
# fraction of its lexicon that are hapax legomena.

import os
import random

numMinGuttenberg = 10001
numMaxGuttenberg = 42370
numRand = 1000

ftpurl = "ftp://ftp.ibiblio.org/pub/docs/books/gutenberg/"

for x in range(numRand):
    try:
        rndint = random.randint(numMinGuttenberg, numMaxGuttenberg)
        # book N is stored under its leading digits: 12345 -> 1/2/3/4/12345/12345.txt
        txturl = ftpurl + str(rndint)[0] + '/' + str(rndint)[1] + '/' + str(rndint)[2] + '/' + str(rndint)[3] + '/' + str(rndint) + '/' + str(rndint) + '.txt'
        os.system('wget -nv -q -U firefox -O /tmp/txt ' + txturl)
        os.system('./wordcount.sh /tmp/txt > /tmp/wcount')
        a = os.popen("grep -c ': 1' /tmp/wcount").read()  # words seen exactly once
        b = os.popen("sed -n '$=' /tmp/wcount").read()    # total distinct words
        print(float(a) / float(b))
    except Exception as e:
        print(e)

The script above uses a bash helper script called wordcount.sh:

# lowercase the text, split it into one word per line, count identical
# words, sort by decreasing count, and print each entry as "word : count"
tr 'A-Z' 'a-z' < $1 | tr -sc 'A-Za-z' '\n' | sort | uniq -c | sort -n -r | sed 's/[[:space:]]*\([0-9]*\) \([a-z]*\)/\2 : \1/'

Run the script above and save the result to a text file, remove the lines where there was an error in retrieving a book, and finally compute the statistics.

./hapaxlegomenon.py > hapaxlegomenon_results.txt

# remove lines with "could not blablabla"
sed -i '/could/d' hapaxlegomenon_results.txt

# compute average, min and max values
awk '{if(min==""){min=max=$1}; if($1>max) {max=$1}; if($1< min) {min=$1}; total+=$1; count+=1} END {print total/count, min, max}' hapaxlegomenon_results.txt

Results (from 788 texts):
min = 0.37550 max = 0.69534 avg = 0.54535 std = 0.045773
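The awk one-liner above does not report the standard deviation quoted in these results; a small Python sketch that computes all four statistics from the list of per-book ratios:

```python
import math

def summarize(vals):
    """Return (min, max, mean, population standard deviation)
    for a list of hapax ratios, one value per book."""
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return min(vals), max(vals), mean, std

# usage, reading one ratio per line:
# with open('hapaxlegomenon_results.txt') as f:
#     print(summarize([float(s) for s in f if s.strip()]))
```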

Intuitively, we expect a lower percentage of hapax legomena in the lexicon of less formal texts. To test this, we computed the percentage over 18828 messages from Usenet newsgroups (the 20 Newsgroups data set). The percentage of hapax legomena found in the lexicon was 0.49674. The code used follows below.

wget http://qwone.com/~jason/20Newsgroups/20news-18828.tar.gz
tar -C /tmp/ -xvzf 20news-18828.tar.gz
for file in $(find /tmp/20news-18828/ -type f ); do cat $file >> /tmp/20news-18828.txt; done
./wordcount.sh /tmp/20news-18828.txt > /tmp/20news-18828count.txt
# number of Hapax legomenon
grep -c ': 1' /tmp/20news-18828count.txt
# total number of lexical entries
sed -n '$=' /tmp/20news-18828count.txt

If we use an even less formal data set, one closer to natural spoken language, we would expect a still lower percentage of hapax legomena in the lexicon. To that end we used IRC logs. Some are archived as a record of communications concerning major historical events; logs were made during the Gulf War and the Oklahoma City bombing, for example. These and other events are kept in the ibiblio archive. The script below was used, and the surprising result is that the percentage found was 0.45714, which is not the large drop one would expect.

wget -r -P /tmp/irc http://www.ibiblio.org/pub/academic/communications/logs/
rm /tmp/irc.txt
for file in $( ./findbymime.sh /tmp/irc/ "application/octet-stream" ); do cat $file >> /tmp/irc.txt; done
for file in $( ./findbymime.sh /tmp/irc/ "text/plain" ); do cat $file >> /tmp/irc.txt; done
./wordcount.sh /tmp/irc.txt > /tmp/irccount.txt
grep -c ': 1' /tmp/irccount.txt
sed -n '$=' /tmp/irccount.txt

segunda-feira, 4 de março de 2013

Word Frequency and Context of Use in the Lexical Diffusion of Phonetically Conditioned Sound Change


Joan Bybee

Lexical diffusion refers to the way that a sound change affects the lexicon. If sound change is lexically abrupt, all the words of a language are affected by the sound change at the same rate. If a sound change is lexically gradual, individual words undergo the change at different rates or different times. (...) One early contribution to this debate by Schuchardt (1885) is the observation that high-frequency words are affected by sound change earlier and to a greater extent than low-frequency words. (...) phonetically conditioned changes that affect high-frequency words before low-frequency words are best accounted for in an exemplar model of phonological representation that allows for change to be both phonetically and lexically gradual. (...) a word’s contexts of use also affect the rate of change. Words that occur more often in the context for change, change more rapidly than those that occur less often in that context. (...) sound changes can also progress more rapidly in high-frequency morphemes. (...) the contexts of use determine the rate at which a word or morpheme undergoes a sound change.

1. Regular sound change or lexical diffusion?

The hypothesis that sound change is lexically regular seems well supported by the facts of change. When we observe that two languages or dialects exhibit a phonological difference, it is very likely that this difference is regular across all the words that have the appropriate phonetic environment. This observation is fundamental to the comparative method; the establishment of genetic relations and the reconstruction of protolanguages are based on the premise that sound change affects all words equally. Schuchardt (1885) was one of the detractors from this position. When he observed sound change in progress, he noted that all words did not change at the same rate and that the differences were not due to “dialect mixture,” as was often claimed by the Neogrammarians, who supported the regularity position.

Labov (1981, 1994) proposed two types of sound change: “regular sound change” is gradual, phonetically motivated, without lexical or grammatical conditioning, and not influenced by social awareness, whereas “lexical diffusion” change, such as the phenomena studied by Wang, is “the result of the abrupt substitution of one phoneme for another in words that contain that phoneme” (Labov 1994:542). According to Labov, this type of change occurs most often “in the late stages of internal change that has been differentiated by lexical and grammatical conditioning” (ibid.). Labov went so far as to propose that certain changes, such as the deletion of glides and schwa, would be regular changes, while the deletion of obstruents would show lexical diffusion.

(...) even gradual, phonetically conditioned change exhibits gradual lexical diffusion (...)

Hooper (1976) identified a lexical diffusion paradox. Reductive sound change tends to affect high-frequency words before low-frequency words, but analogical leveling or regularization tends to affect low-frequency words before high-frequency words.

2. Frequency effects on regular sound change

Sound changes that are complete can be identified as regular or not, depending upon whether they affected all lexical items existing at the time of the change. Ongoing changes cannot be designated as regular or not, since they are not complete. However, one can reference the typical characteristics of a change to project whether it will be regular or not. That is, a phonetically gradual change with clear phonetic conditioning falls into Labov’s first type, and thus we can project its regularity.

2.1. American English t/d deletion

Consider the deletion of final /t/ and /d/ in American English, which occurs most commonly in words ending in a consonant plus /t/ or /d/, such as just, perfect, child, or grand. This much-studied variable process has been shown to be affected by the preceding and following consonant, with more deletion in a consonant environment; by grammatical status, with less deletion if the /t/ or /d/ is the regular past tense; and by social and age factors, with more deletion among younger, lower socioeconomic class speakers (Labov 1972; Neu 1980).

(...) I found that deletion occurred more in high-frequency words. (...)

3. Changes that affect low-frequency words first

As previously mentioned, Hooper (1976) noted a lexical diffusion paradox: sound change seems to affect high-frequency words first, but analogical change affects low-frequency words first. The first tendency has already been documented. The second tendency is evident in the fact that low-frequency verbs, such as weep/wept, leap/leapt, creep/crept, are regularizing, while high-frequency verbs with the same pattern show no such tendency: that is, keep/kept, sleep/slept, leave/left show no evidence of regularizing. Hooper (1976) argued that changes affecting high-frequency words first have their source in the automation of production, whereas changes affecting low-frequency words first are due to imperfect learning. In the latter category are changes that affect words that do not conform to the general patterns of the language. Such exceptional words can be learned and maintained in their exceptional form if they are of high frequency in the input and in general use. However, if their frequency of use is low, they may not be sufficiently available in experience to be acquired and entrenched. Thus they may be subject to changes based on the general patterns of the language.

4. Modeling phonetic and lexical gradualness

The view of lexical diffusion espoused by both Wang and Labov assumes that a change that diffuses gradually through the lexicon must be phonetically abrupt. This is a necessary assumption if one is to accept a synchronic phonological theory that has phonemic underlying representations. Words can change one by one only if the change is a substitution of phonemes in such a theory. The discovery that sound change can be both phonetically gradual and lexically gradual forces a different view of the mental representation of the phonology of words (Hooper 1981; Bybee 2000b). If subphonemic detail or ranges of variation can be associated with particular words, an accurate model of phonological representation must allow phonetic detail in the cognitive representation of words. A recent proposal is that the cognitive representation of a word can be made up of the set of exemplars that have been experienced by the speaker/hearer. Thus all phonetic variants of a word are stored in memory and organized into a cluster: exemplars that are more similar are closer to one another than to ones that are dissimilar, and exemplars that occur frequently are stronger than less frequent ones (Johnson 1997; Bybee 2000a, 2001; Pierrehumbert 2001). These exemplar clusters, which represent autonomous words, change as experience with language changes. Repeated exemplars within the cluster grow stronger, and less frequently used ones may fade over time, as other memories do.

Changes in the phonetic range of the exemplar cluster may also take place as language is used and new tokens of words are experienced. Thus the range of phonetic variation of a word can gradually change over time, allowing a phonetically gradual sound change to affect different words at different rates. Given a tendency for reduction during production, the phonetic representation of a word will gradually accrue more exemplars that are reduced, and these exemplars will become more likely to be chosen for production, where they may undergo further reduction, gradually moving the words of the language in a consistent direction. The more frequent words will have more chances to undergo online reduction and thus will change more rapidly. The more predictable words (which are usually also the more frequent ones) will have a greater chance of having their reduced version chosen, given the context, and thus will advance the reductive change more rapidly.
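As a toy illustration of this dynamic (my own construction, not a model from the chapter), one can simulate exemplar clusters in which each use of a word may store a slightly reduced token; words chosen for use in proportion to their frequency then drift faster:

```python
import random

def simulate(freqs, steps=10000, p_reduce=0.1, seed=1):
    """Each word stores a mean token 'duration' (1.0 = unreduced).
    On each use, with probability p_reduce the stored exemplar is
    slightly shorter, pulling the cluster mean down. Words are chosen
    for use in proportion to their frequency, so frequent words
    accumulate reduced exemplars faster."""
    random.seed(seed)
    words = list(freqs)
    weights = [freqs[w] for w in words]
    duration = {w: 1.0 for w in words}
    for _ in range(steps):
        w = random.choices(words, weights)[0]  # frequent words used more often
        if random.random() < p_reduce:
            duration[w] *= 0.999               # store a slightly reduced exemplar
    return duration

d = simulate({'and': 100, 'ant': 1})
# after many uses, the high-frequency word is the more reduced one
```

This reproduces only the direction of the effect, not any quantitative claim: reduction is cumulative per use, so usage frequency alone separates the two words.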

The exemplar clusters are embedded in a network of associations among words that map relations of similarity at all levels. Distinct words with similar phonetic properties are associated, as are words with shared semantic features. I have shown in (Bybee 1985, 1988) that morphemes and morphological relations in such a network emerge from parallel phonetic and semantic associations and that schemas or abstractions over relations of similarity can be formulated to account for the regularities and patterns evident in language use.


An important property of the exemplar model is the emphasis on words as storage units. Various models have proposed that even multi-morphemic words have lexical listing. Vennemann (1974) argued that appropriate constraints on syllable structure can only be applied to whole words, not to morphemes. The common objection to this proposal made in the 1970s was that the human brain does not have sufficient storage capacity for all the words of a language, especially a language with large morphological paradigms. This argument has now been dismissed with the discovery of the huge amount of detail that the brain is capable of recording. Moreover, newer conceptions of the lexicon, not as a list but as a network with tight interconnections, provide the insight that listing two related words, such as start, started, does not take up as much cognitive space as listing two unrelated words, such as start, flower (Bybee 1985). Thus connectionist models (Rumelhart and McClelland 1986) and analogical models (Skousen 1989, 1992; Eddington 2000) have storage of whole words with morphological relations emergent from the categorization involved in storage. In addition, the lexical diffusion data provide evidence that multi-morphemic words can have lexical storage. As we saw in table 11.3, high-frequency regular past-tense English verbs are more likely to have their final /t/ or /d/ deleted than are low-frequency regular verbs. In order for a frequency effect to accrue to a word, that word must exist in memory storage. Since multi-morphemic words evince frequency effects, they must be stored in the lexicon. (...)

5. The effect of frequency of use in context

Given that the exemplar model tracks tokens of use and the exemplar cluster changes according to the phonetic shape of these tokens, it follows that if the context of use affects the phonetic shape of a word, the exemplar cluster will change accordingly. The effect of context can be best exemplified in changes that take place around word or morpheme boundaries, where the segment affected by the change is sometimes in the context for the change and sometimes not. Timberlake (1978) called this an alternating environment. Since the exemplar model registers phonetic tokens, the probabilities inherent in the alternating environment affect the shape of the exemplar cluster. (...) The exemplar cluster, then, appears to reorganize itself, with the stronger exemplars being more frequently chosen for use than the less frequent ones despite the context.

Thus along with the general measure of frequency of use, the relative frequency of the immediate linguistic context of use can also affect the lexical diffusion of a sound change. Even holding frequency constant, a word that occurs more often in the right context for a change will undergo the change more rapidly than a word that occurs less often in the conditioning context.


8. Consequences for a usage-based theory 

The study of the diffusion of sound change in the lexicon contributes to a better understanding of the nature and causes of sound change. Changes that affect high-frequency words first are a result of the automation of production, the normal overlap and reduction of articulatory gestures that comes with fluency (Browman and Goldstein 1992; Mowrey and Pagliuca 1995). The strong directionality of such changes indicates that they are not the result of random variation, but that they stem from reduction processes resulting from repetition and the normal automation of motor activity. If a sound change does not proceed from the most frequent to the least frequent words, then we should seek its explanation in some other mechanisms of change.

Moreover, I have proposed a model in which variation and change are not external to the lexicon and grammar but inherent to it (Pierrehumbert 1994). Sound change is not rule addition—something that happens at a superficial level without any effect on the deeper reaches of grammar. Rather, lexical representations are affected from the very beginnings of the change. Indeed, they supply an ongoing record of the change since they track the details of the phonetic tokens experienced. Further evidence for sound change having an immediate impact on representation is the fact that sound changes are never reversed or undone (Cole and Hualde 1998; Bybee 2001). The morphological structure of words also plays a role from the initial stages of a change, but less because morphemes have some special status with respect to change and more because of the contexts in which they appear. Alternating contexts retard change, while uniform ones allow change to hurry ahead.

Effects of frequency and context demonstrate that language use has an effect on mental representations. In this view, representations and the grammatical structure that emerges from them are based on experience with language. New linguistic experiences are categorized in terms of already-stored representations, adding to the exemplar clusters already present and, at times, changing them gradually. Various levels of abstraction emerge as exemplars are categorized by phonological and semantic similarity—morphemes, words, phrases, and constructions can all be seen as the result of the categorization of linguistic experiences.

Complex Adaptive Systems and the Origins of Adaptive Structure: What Experiments Can Tell Us


Language is a product of both biological and cultural evolution. Clues to the origins of key structural properties of language can be found in the process of cultural transmission between learners. Recent experiments have shown that iterated learning by human participants in the laboratory transforms an initially unstructured artificial language into one containing regularities that make the system more learnable and stable over time. Here, we explore the process of iterated learning in more detail by demonstrating exactly how one type of structure—compositionality—emerges over the course of these experiments. We introduce a method to precisely quantify the increasing ability of a language to systematically encode associations between individual components of meanings and signals over time and we examine how the system as a whole evolves to avoid ambiguity in these associations and generate adaptive structure.
(...) the very fact that language persists through multiple repeated instances of usage can explain the origins of key structural properties that are universally present in language. Because of this, taking a complex adaptive systems perspective on language lifts the burden of explanation for these properties from a putative richly structured domain-specific substrate, of the sort assumed by much of generative linguistics (e.g., Chomsky, 1965).

Much of the work over the past 20 years or so in modeling the evolution of language has taken this complex adaptive systems perspective (see, e.g., Brighton, Smith, & Kirby, 2005; Kirby, 2002b; Steels, 2003, for review). One particular strand of work has focused on the adaptation of language through a repeated cycle of learning and use within and across generations, where adaptation is taken to mean a process of optimization or fitting of the structure of language to the mechanisms of transmission (Kirby, 1999).

(...) One of the ways in which a language can evolve to become more learnable is by becoming structured.

"alien" language... chain learning

First, by looking at the learning errors made between adjacent generations, it was shown that the languages in both conditions were being acquired significantly more faithfully toward the end of the chains than they were at the beginning. Second, this increase in learnability over time occurred as a result of the languages becoming more structured over time.


Kirby et al. (2008) found that the languages that emerge through a repeated cycle of learning and production in a laboratory setting show evidence of adaptation to the bottleneck placed on their transmission. Making even minor changes to the way in which language is culturally transmitted can produce radically different types of structures. Given only a bottleneck on transmission preventing a proportion of the language from being seen by the next generation, language can adapt in such a way that ensures that it is stably transmitted to future generations. However, this occurs at the expense of being able to uniquely refer to every meaning. When they introduced the additional pressure of having to use a unique signal for each meaning, the language once again adapted to cope with these new transmission constraints, this time by becoming compositional. Having a compositional system ensures that both signals and meanings survive the bottleneck.
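The dynamic in the first condition (stable transmission purchased at the cost of unique reference) can be sketched with a deliberately crude iterated-learning simulation. This is not Kirby et al.'s experimental design: the 3x3 meaning space, the syllable inventory, the bottleneck of 4 items, and the naive "memorize, then fall back on the most frequent signal" learner are all invented for illustration.

```python
import random

random.seed(1)
meanings = [(color, shape) for color in range(3) for shape in range(3)]  # 9 meanings
syllables = ['ki', 'la', 'mo', 'ne', 'pu', 'ra', 'te', 'wu', 'zo']

# Generation 0: a holistic language, one arbitrary word per meaning.
language = {m: random.choice(syllables) + random.choice(syllables) for m in meanings}

def learn(training_pairs):
    """A naive learner: memorize what was seen; for unseen meanings,
    fall back on the most frequent signal in the training data."""
    seen = dict(training_pairs)
    signals = list(seen.values())
    fallback = max(set(signals), key=signals.count)
    return {m: seen.get(m, fallback) for m in meanings}

for generation in range(10):
    bottleneck = random.sample(list(language.items()), 4)  # only 4 of 9 meanings seen
    new_language = learn(bottleneck)
    errors = sum(language[m] != new_language[m] for m in meanings)
    print(generation, errors, len(set(new_language.values())))
    language = new_language
```

In typical runs the number of distinct signals collapses across generations and transmission error falls: the toy language becomes stably learnable through the bottleneck precisely because it gives up uniquely referring to every meaning.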

Because the participants could not know which condition they were in, it is impossible that the resulting languages were intentionally designed as adaptive solutions to the transmission bottleneck. Rather, the best explanation for the result is that in these experiments, just as in the computational models, linguistic adaptation is an inevitable consequence of the transmission of linguistic variants under particular constraints on replication. The result is apparent design, but without an intentional designer.
It seems clear from all of this that, first, cultural transmission alone is capable of explaining the emergence of languages that exhibit the appearance of design and, second, experimental studies of the iterated learning of artificial languages are a potentially useful methodological tool for those interested in studying cultural evolution.

This article has extended previous work on iterated language learning experiments by showing, using data obtained from an earlier study, exactly how compositional structure emerges over time as a result of cultural transmission. Using a recently developed analytical technique that calculates the regularity of mapping between signal and meaning elements (Tamariz & Smith, 2008), we were able to precisely quantify changes in the language’s ability to systematically encode such associations between meaning and signal components. From this we were able to explain the amplification effect the bottleneck seems to
have on systematicity in language, arguing that the sampling of smaller subsets of the language for training input to the next generation tends to make weaker patterns that are not visible at the level of the entire language appear stronger locally.
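The analytical idea, quantifying how systematically a language encodes associations between meaning components and signal components, can be sketched as a Mantel-style correlation: compute all pairwise distances between meanings and between the corresponding signals, and ask whether similar meanings receive similar signals. The sketch below is in that spirit rather than a reimplementation of Tamariz and Smith's (2008) actual measure, and the two toy languages are invented.

```python
from itertools import combinations

def hamming(a, b):
    # Distance between two equal-length meaning tuples: count of differing features.
    return sum(x != y for x, y in zip(a, b))

def levenshtein(s, t):
    # Standard edit distance between two signal strings.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def systematicity(language):
    # language: dict mapping meaning tuples to signal strings.
    md, sd = [], []
    for m1, m2 in combinations(list(language), 2):
        md.append(hamming(m1, m2))
        sd.append(levenshtein(language[m1], language[m2]))
    return pearson(md, sd)

compositional = {  # signal parts line up with meaning parts
    ('black', 'circle'): 'neki', ('black', 'square'): 'nehu',
    ('white', 'circle'): 'laki', ('white', 'square'): 'lahu',
}
holistic = {  # arbitrary whole-word signals
    ('black', 'circle'): 'po', ('black', 'square'): 'tuvim',
    ('white', 'circle'): 'abra', ('white', 'square'): 'ke',
}
print(round(systematicity(compositional), 2))  # 1.0: similar meanings get similar signals
print(round(systematicity(holistic), 2))       # well below 1: no meaning-signal alignment
```

A measure of this kind also makes the bottleneck amplification argument testable: it can be computed both over the whole language and over the sampled training subset, so locally strengthened patterns become visible.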

Evolution of Brain and Language

Thomas Schoenemann

The evolution of language and the evolution of the brain are tightly interlinked. Language evolution represents a special kind of adaptation, in part because language is a complex behavior (as opposed to a physical feature) but also because changes are adaptive only to the extent that they increase either one's understanding of others, or one's ability to be understood by others. Evolutionary changes in the human brain that are thought to be relevant to language are reviewed. The extent to which these changes are a cause or consequence of language evolution remains an open question, but it is argued that the process may best be viewed as a complex adaptive system, in which cultural learning interacts with biology iteratively over time to produce language.

A full accounting of the evolution of language requires an understanding of the brain changes that made it possible. Although our closest relatives, the apes, have the ability to learn at least some critical aspects of language (Parker & Gibson, 1990), they never learn language as completely or as effortlessly as do human children. This means that there must be some important differences between the brains of human and nonhuman apes. A fair amount is known about the ways in which human brains differ from the other apes, and we know much about specific functions of different parts of the brain. These two fields of study, combined with an understanding of general evolutionary processes, allow us to draw at least the broad outlines of the evolutionary history of brain and language.

There is a complex interplay between language evolution and brain evolution. The existence of language presupposes a brain that allows it. Languages must, by definition, be learnable by the brains of children in each generation. Thus, language change (a form of cultural evolution) is constrained by the existing abilities of brains in each generation. However, because language is critical to an individual’s adaptive fitness, language also likely had a fundamental influence on brain evolution. Humans are particularly socially interactive creatures, which makes communication central to our existence. Two interrelated evolutionary processes therefore occurred simultaneously: Language adapted to the human brain (cultural evolution), while the human brain adapted to better subserve language (biological evolution). This coevolutionary process resulted in language and brain evolving to suit each other (Christiansen, 1994; Christiansen & Chater, 2008; Deacon, 1992).

The coevolution of language and brain can be understood as the result of a complex adaptive system. Complex adaptive systems are characterized by interacting sets of agents (which can be individuals, neurons, etc.), where each agent behaves in an individually adaptive way to local conditions, often following very simple rules. The sum total of these interactions nevertheless leads to various kinds of emergent, systemwide orders. Biological evolution is a prime example of a complex adaptive system: Individuals within a species (a “system”) act as best they can in their environment to survive, leading through differential reproduction ultimately to genetic changes that increase the overall fitness of the species. In fact, “evolution” can be understood as the name we give to the emergent results of complex adaptive systems over time. One can also view the brain itself as a complex adaptive system. This is because brain circuits are not independent of each other. Processing in one area affects processing in connected areas; therefore, processing changes in one area—whether due to biological evolution or learning—influence (and select for over evolutionary time) changes in other areas.

A number of neural systems relevant specifically to language interact with and influence each other in important ways. Syntax depends fundamentally on the structure of semantics, because the function of syntax is to code higher level semantic information (e.g., who did what to whom). Semantics in turn depends on the structure of conceptual understanding, which—as will be reviewed later—is a function of brain structure. These structures are in turn the result of biological adaptation: Circuits that result in conceptual understanding that is relevant and useful to a given individual’s (ever-changing) environmental realities will be selected for and will spread over evolutionary time.


Therefore, language evolution itself will be strongly constrained by pre-existing cognitive abilities within each generation. Changes affecting the perception of linguistically relevant signals would have been favored only to the extent that they increase the individual’s ability to perceive and rapidly process the acoustic signals already used by others for language. Changes affecting the production of linguistically relevant signals would be favored only to the extent that they could be understood by the preexisting perceptual abilities of others. Signals too complicated or subtle for others to process would not be adopted and, hence, mutations influencing them would not likely spread.


Classical Language Areas
Broca’s and Wernicke’s areas were the first cortical regions to be associated with specific linguistic abilities. Broca’s aphasics display nonfluent, effortful, and agrammatical speech, whereas Wernicke’s aphasics display grammatical but meaningless speech in which the wrong words (or parts of words) are used (Bear, Connors, & Paradiso, 2007; Damasio et al., 1993). Broca’s area is located in the posterior-inferior frontal convexity of the neocortex, whereas Wernicke’s area is localized to the general area where parietal, occipital, and temporal lobes meet. For most people, these areas are functional for language primarily in the left hemisphere.

Additional areas, adjacent to, but outside these classic language areas, appear to be important for these aspects of language processing as well. Broca’s and Wernicke’s aphasias (i.e., the specific types of language deficits themselves) are not exclusively associated with damage to Broca’s and Wernicke’s cortical areas (Dronkers, 2000). Damage to the caudate nucleus, putamen, and internal capsule (structures of the cerebral hemispheres that are deep to the cortex) also appear to play a role in Broca’s aphasia, including aspects of syntactic processing (Lieberman, 2000).
The evolutionary histories of these areas are quite curious, as homologues to both Broca’s and Wernicke’s areas have been identified in nonhuman primate brains (Striedter, 2005). Exactly what function they play in other species is not currently known, but an evolutionary perspective would predict that they likely process information in ways that would be useful to language (Schoenemann, 2005), consistent with the view of language adapting to the human brain by taking advantage of circuits that already existed. The presence of these areas in nonlinguistic animals is a glaring anomaly for models that emphasize the evolution of completely new language-specific circuits in the human lineage (e.g., Bickerton, 1990; Pinker, 1995). In any case, although detailed quantitative data on these areas in nonhuman primates have not been reported, it does appear that they are significantly larger both in absolute and relative terms in humans as compared to macaque monkeys (Petrides & Pandya, 2002; Striedter, 2005).
Given that Broca’s and Wernicke’s areas mediate different but complementary aspects of language processing, they must be able to interact. A tract of nerve fibers known as the arcuate fasciculus directly connects these areas (Geschwind, 1974). The arcuate fasciculus in humans tends to be larger on the left side than on the right side, consistent with the lateralization of expressive language processing to the left hemisphere for most people (Nucifora, Verma, Melhem, Gur, & Gur, 2005).
The arcuate fasciculus appears to have been elaborated in human evolution. The homologue of Wernicke’s area in macaque monkeys does project to prefrontal regions that are close to their homologue of Broca’s area, but apparently not directly to it (Aboitiz & Garcia, 1997). Instead, projections directly to their homologue of Broca’s area originate from a region just adjacent to their homologue of Wernicke’s area (Aboitiz & Garcia, 1997). Thus, there appears to have been an elaboration and/or extension of projections to more directly connect Broca’s and Wernicke’s areas over the course of human (or ape) evolution. Recent work using diffusion tensor imaging (which delineates approximate white matter axonal connective tracts in vivo) suggest that both macaques and chimpanzees have tracts connecting areas in the vicinity of Wernicke’s area to regions in the vicinity of Broca’s area (Rilling et al., 2007). However, connections between Broca’s area and the middle temporal regions (important to semantic processing; see below) are only obvious in chimpanzees and humans and appear to be most extensive in humans (Rilling et al., 2007). Presumably these connections were elaborated during human evolution specifically for language (Rilling et al., 2007).

Prefrontal Cortex

Areas in the prefrontal cortex (in addition to Broca’s area) appear to be involved in a variety of linguistic tasks, including various semantic aspects of language (Gabrieli, Poldrack, & Desmond, 1998; Kerns, Cohen, Stenger, & Carter, 2004;
Luke, Liu, Wai, Wan, & Tan, 2002; Maguire & Frith, 2004; Noppeney & Price, 2004; Thompson-Schill et al., 1998), syntax (Indefrey, Hellwig, Herzog, Seitz, & Hagoort, 2004; Novoa & Ardila, 1987), and higher level linguistic processing, such as understanding the reasoning underlying a conversation (Caplan & Dapretto, 2001).

Right Hemisphere

Although the cortical language areas discussed so far are localized to the left hemisphere in most people, there is substantial evidence that the right hemisphere also contributes importantly to language. The right hemisphere understands short words (Gazzaniga, 1970) and entertains alternative possible meanings for particular words (Beeman & Chiarello, 1998), suggesting that it is better able to interpret multiple intended meanings of a given linguistic communication. The right hemisphere also plays a greater role in a variety of types of spatial processing in most people (Tzeng & Wang, 1984; Vallar, 2007), thus presumably grounding the semantics of spatial terms. The right frontal lobe mediates aspects of prosody (Alexander, Benson, & Stuss, 1989; Novoa & Ardila, 1987), which is critically important to understanding intended meaning (consider sarcasm, in which the intended meaning is directly opposite the literal meaning).


Cerebellum

The primary function of the cerebellum was long thought to be monitoring and modulating motor signals from the cortex (Carpenter & Sutin, 1983). However, more recent work has implicated the cerebellum in a whole range of higher cognitive functions, including goal organization and planning, aspects of memory and learning, attention, visuo-spatial processing, modulating emotional responses, and language (Baillieux, De Smet, Paquier, De Deyn, & Marien, 2008). The cerebellum appears to play a role in speech production and perception, as well as both semantic and grammatical processing (Ackermann, Mathiak, & Riecker, 2007; Baillieux et al., 2008; De Smet, Baillieux, De Deyn, Marien, & Paquier, 2007). The cerebellum also seems to play a role in timing mechanisms generally (Ivry & Spencer, 2004), which may explain its functional relevance to language (given the importance temporal information plays in language production and perception).


Many evolutionary changes in the brain appear to have relevance to language evolution. The increase in overall brain size paved the way for language both by encouraging localized cortical specialization and by making possible increasingly complicated social interactions. Increasing sociality provided the central usefulness for language in the first place and drove its evolution. Specific areas of the brain directly relevant to language appear to have been particularly elaborated, especially the prefrontal cortex (areas relevant to semantics and syntax) and the temporal lobe (particularly areas relevant to connecting words to meanings and concepts). Broca’s and Wernicke’s areas are not unique to human brains, but they do appear to have been elaborated, along with the arcuate fasciculus connecting these areas. Other areas of the brain that participate in language processing, such as the basal ganglia and cerebellum, are larger than predicted based on overall body weight, although they have not increased as much as a number of language-relevant areas of the cortex. Finally, little evidence suggests that significant elaboration of the auditory processing pathways up to the cortex has occurred, but direct pathways down to the tongue and respiratory muscles have been strengthened, with new direct pathways created to the larynx, presumably specifically for speech.

These findings are consistent with the view that language and brain adapted to each other. In each generation, language made use of (adapted to) abilities that already existed. This is consistent with the fact that the peripheral neural circuits directly responsible for perceptual and productive aspects of language have shown the least change. It makes sense that languages would evolve specifically to take advantage of sound contrasts that were already (prelinguistically) relatively easy to distinguish. This perspective is also consistent with the fact that Broca’s and Wernicke’s areas are not unique to humans. Differences in language circuits seem mostly to be quantitative elaborations, rather than completely new circuitry.
Three major factors seem to have conspired to drive the evolution of language: first, the general elaboration of—and increasing focus on—the importance of learned behavior; second, a significant increase in the complexity, subtlety, and range of conceptual understanding that was possible; and third, an increasingly complex, socially interactive existence. Each of these is reflected by a variety of changes in the brain during human evolution. Because language itself facilitates thinking and conceptual awareness, language evolution would have been a mutually reinforcing process: Increasingly complicated brains led to increasingly rich and varied thoughts, driving the evolution of increasingly complicated language, which itself facilitated even more complex conceptual worlds that these brains would then want to communicate (Savage-Rumbaugh & Rumbaugh, 1993; Schoenemann, 2009). The interplay between internal (conceptual) and external (social) aspects of human existence that drove this coevolutionary process highlights the usefulness of thinking about language evolution as a complex adaptive system. The extent to which increasing conceptual complexity itself might have driven language evolution represents an intriguing research question for the future.

A Usage-Based Approach to Recursion in Sentence Processing 

Morten H. Christiansen - Cornell University
Maryellen C. MacDonald - University of Wisconsin-Madison

Most current approaches to linguistic structure suggest that language is recursive, that recursion is a fundamental property of grammar, and that independent performance constraints limit recursive abilities that would otherwise be infinite. (...) recursion is construed as an acquired skill and in which limitations on the processing of recursive constructions stem from interactions between linguistic experience and intrinsic constraints on learning and processing.


Ever since Humboldt (1836/1999), researchers have hypothesized that language makes “infinite use of finite means.” Yet the study of language had to wait nearly a century before the technical devices for adequately expressing the unboundedness of language became available through the development of recursion theory in the foundations of mathematics (cf. Chomsky, 1965). Recursion has subsequently become a fundamental property of grammar, permitting a finite set of rules and principles to process and produce an infinite number of expressions.


This article presents an alternative, usage-based view of recursive sentence structure, suggesting that recursion is not an innate property of grammar or an a priori computational property of the neural systems subserving language. Instead, we suggest that the ability to process recursive structure is acquired gradually, in an item-based fashion given experience with specific recursive constructions. In contrast to generative approaches, constraints on recursive regularities do not follow from extrinsic limitations on memory or processing; rather they arise from interactions between linguistic experience and architectural constraints on learning and processing (see also Engelmann & Vasishth, 2009; MacDonald & Christiansen, 2002), intrinsic to the system in which the
knowledge of grammatical regularities is embedded. Constraints specific to particular recursive constructions are acquired as part of the knowledge of the recursive regularities themselves and therefore form an integrated part of the representation of those regularities.

A Connectionist Model of Recursive Sentence Processing

Our usage-based approach to recursion builds on a previously developed Simple Recurrent Network (SRN; Elman, 1990) model of recursive sentence processing (Christiansen, 1994; Christiansen & Chater, 1994). The SRN, as illustrated in Figure 1, is essentially a standard feed-forward network equipped with an extra layer of so-called context units. The hidden unit activations from the previous time step are copied back to these context units and paired with the current input. This means that the current state of the hidden units can influence the processing of subsequent inputs, providing the SRN with an ability to deal with integrated sequences of input presented successively.
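The architecture just described is simple enough to write down directly. The sketch below is a generic Elman-style SRN, not the specifics of Christiansen and Chater's model: the layer sizes, learning rate, toy next-symbol task, and one-step-truncated weight updates are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

class SRN:
    """Elman-style Simple Recurrent Network: a feed-forward net whose
    hidden activations are copied into context units and fed back in
    alongside the next input."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.2):
        self.W_ih = rng.normal(0.0, 0.5, (n_in, n_hidden))      # input -> hidden
        self.W_ch = rng.normal(0.0, 0.5, (n_hidden, n_hidden))  # context -> hidden
        self.W_ho = rng.normal(0.0, 0.5, (n_hidden, n_out))     # hidden -> output
        self.context = np.zeros(n_hidden)
        self.lr = lr

    def step(self, x):
        # The hidden state depends on the current input AND the previous hidden state.
        self.hidden = np.tanh(x @ self.W_ih + self.context @ self.W_ch)
        out = 1.0 / (1.0 + np.exp(-(self.hidden @ self.W_ho)))
        self.context = self.hidden.copy()  # copy-back for the next time step
        return out

    def train_step(self, x, target):
        ctx = self.context.copy()
        out = self.step(x)
        # One-step truncated backpropagation (squared error through a sigmoid output).
        d_out = out * (1 - out) * (out - target)
        d_hid = (d_out @ self.W_ho.T) * (1 - self.hidden ** 2)
        self.W_ho -= self.lr * np.outer(self.hidden, d_out)
        self.W_ih -= self.lr * np.outer(x, d_hid)
        self.W_ch -= self.lr * np.outer(ctx, d_hid)

# Toy task: predict the next symbol of the repeating sequence a b c a b c ...
eye = np.eye(3)
seq = [0, 1, 2] * 400
net = SRN(3, 8, 3)
for cur, nxt in zip(seq, seq[1:]):
    net.train_step(eye[cur], eye[nxt])
```

After training, `net.step(eye[c]).argmax()` names the successor of symbol `c`. The distinctive SRN behavior, of course, only shows up on tasks where the correct prediction depends on the accumulated context rather than on the current input alone, which is exactly what the recursive constructions discussed below demand.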

Usage-Based Constituents

A key question for connectionist models of language is whether they are able to acquire knowledge of grammatical regularities going beyond simple co-occurrence statistics from the training corpus. Indeed, Hadley (1994) suggested that connectionist models could not afford the kind of generalization abilities necessary to account for human language processing (see Marcus, 1998, for a similar critique). Christiansen and Chater (1994) addressed this challenge using the SRN from Christiansen (1994).

Deriving Novel Predictions

Simple Recurrent Networks have been employed successfully to model many aspects of psycholinguistic behavior, ranging from speech segmentation (e.g., Christiansen, Allen, & Seidenberg, 1998; Elman, 1990) and word learning (e.g., Sibley, Kello, Plaut, & Elman, 2008) to syntactic processing (e.g., Christiansen, Dale, & Reali, in press; Elman 1993; Rohde, 2002; see also Ellis & Larsen-Freeman, this issue) and reading (e.g., Plaut, 1999). Moreover, SRNs have also been shown to provide good models of nonlinguistic sequence learning (e.g., Botvinick & Plaut, 2004, 2006; Servan-Schreiber, Cleeremans, & McClelland, 1991). The human-like performance of the SRN can be attributed to an interaction between intrinsic architectural constraints (Christiansen & Chater, 1999) and the statistical properties of its input experience (MacDonald & Christiansen, 2002). By analyzing the internal states of SRNs before and after training with right-branching and center-embedded materials, Christiansen and Chater found that this type of network has a basic architectural bias toward locally bounded dependencies similar to those typically found in iterative recursion. However, in order for the SRN to process multiple instances of iterative recursion, exposure to specific recursive constructions is required. Such exposure is even more crucial for the processing of center-embeddings because the network in this case also has to overcome its architectural bias toward local dependencies. Hence, the SRN does not have a built-in ability for recursion, but instead it develops its human-like processing of different recursive constructions through exposure to repeated instances of such constructions in the input.