leolca's blog: 2011

Thursday, 31 March 2011

Segment Inventories and Syllable Inventories

Another hypothesis is that the size of the segment inventory is related to the phonotactics of the language in such a way as to limit the total number of possible syllables that can be constructed from the segments and suprasegmental properties that it has. Languages might then have approximately equal numbers of syllables even though they differ substantially in the number of segments. Rough maintenance of syllable inventory size is envisaged as the function of cyclic historical processes by, for example, Matisoff (1973). He outlines an imaginary language in which, at some arbitrary starting point, "the number of possible syllables is very large since there is a rich system of syllable-initial and -final consonants". At a later stage of the language these initial and final consonantal systems are found to have simplified, but "the number of vowels has increased and lexically contrastive tones have arisen", maintaining contrasting syllabic possibilities. If tone or vowel contrasts are lost, consonant clustering will increase at the syllable margins again.

A brief investigation of the relationship between segmental inventory size and syllable inventory size was carried out by calculating the number of possible syllables in 9 languages. The languages are Tsou (418), Quechua (819), Thai (400), Rotokas (625), Gã (117), Hawaiian (424), Vietnamese (303), Cantonese, Higi, and Yoruba (the last three are not in UPSID but detailed data on the phonotactics are available in convenient form for these languages). The 9 languages range from those with small segment inventories (Rotokas, Hawaiian) to those with relatively large inventories (Vietnamese, Higi, Quechua) and from those with relatively simple suprasegmental properties (Tsou, Hawaiian, Quechua) to those with complex suprasegmental phenomena (Yoruba, Thai, Cantonese, Vietnamese). In calculating the number of possible syllables, general co-occurrence restrictions were taken into account, but the failure of a particular combination of elements to be attested, if parallel combinations were permitted, was taken only as evidence of an accidental gap, and such a combination was counted as a possible syllable. The calculations reveal very different numbers of possible syllables in these languages. The totals are given in Table 1.5.

Table 1.5 Syllable inventory size of 9 selected languages
Language	Total possible syllables
Hawaiian	162
Rotokas	350
Yoruba	582
Tsou	968
Gã	2,331
Cantonese	3,456
Quechua	4,068
Vietnamese	14,430
Thai	23,638

Even with the uncertainties involved in this kind of counting, the numbers differ markedly enough for the conclusion to be drawn that languages are not strikingly similar in terms of the size of their syllable inventories.
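The counting procedure described above amounts to multiplying out, for each permitted syllable structure, the number of segments that can fill each slot, and then combining the result with the suprasegmental contrasts. A minimal Octave sketch for a purely hypothetical toy language (the inventory sizes below are invented for illustration and are not Maddieson's data), assuming every tone can occur on every syllable and ignoring co-occurrence restrictions:

```octave
% Hypothetical toy language, invented for illustration (not UPSID data).
nOnset = 12;   % consonants allowed in onset position
nVowel = 5;    % vowels
nCoda  = 2;    % consonants allowed in coda position
nTone  = 3;    % contrastive tones, assumed free to combine with any syllable

% Number of segmental combinations for each permitted syllable structure.
V   = nVowel;
CV  = nOnset * nVowel;
CVC = nOnset * nVowel * nCoda;

% Every segmental combination can carry any of the tones.
total = (V + CV + CVC) * nTone;
printf('V: %d, CV: %d, CVC: %d, total with tones: %d\n', V, CV, CVC, total);
```

In a real count the product is reduced wherever a language systematically forbids particular combinations; as the text explains, isolated unattested combinations were treated as accidental gaps and still counted.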

In following up this study, several tests were done to see which of a number of possible predictors correlated best with syllable inventory size. The predictors used were the number of segments, the number of vowels, the number of consonants, the number of permitted syllable structures (CV, CVC, CCVC, etc.), the number of suprasegmental contrasts (e.g. number of stress levels times number of tones), and a number representing a maximal count of segmental differences in which the number of vowels was multiplied by the number of suprasegmentals. Of these, the best predictor is the number of permitted syllable types (r = .69), an indication that the phonotactic possibilities of the language are the most important factor contributing to the number of syllables. The next best predictor is the number of suprasegmentals (r = .59), with the correlations with the various segmental counts all being somewhat lower. Although all the predictors tested show a positive simple correlation with the number of syllables, in a multiple regression analysis only the number of vowels contributes a worthwhile improvement to the analysis (r^2 change = .19) beyond the number of syllable types. Thus we can say that syllable inventory size does not depend heavily on segment inventory size. Nonetheless, because the predictors do have positive correlations with syllable inventory size, the picture is once again of a tendency for complexity of different types to go together.

(Patterns of Sounds, Ian Maddieson)

Segments and Suprasegmentals

Despite the failure to find any confirmation of a compensation hypothesis in several tests involving segmental subinventories, it is possible that the compensation exists at another level. One possibility was evidently in the minds of Firchow and Firchow (1969). In their paper on Rotokas (625), which has an inventory of only 11 segments, they remark that "as the Rotokas segmental phonemes are simple, the suprasegmentals are complicated". A similar view of a compensatory relationship between segmental and suprasegmental complexity seems implicit in much of the literature on the historical development of tone. For example, Hombert, Ohala and Ewan (1979) refer to "the development of contrastive tones on vowels because of the loss of a voicing distinction on obstruents". If this phenomenon is part of a pervasive relationship of compensation, we would expect that, in general, languages with smaller segmental inventories would tend to have more complex suprasegmental characteristics.

In order to test this prediction, the languages in UPSID which have less than 20 or more than 45 segments were examined to determine if the first group had obviously more complex patterns of stress and tone than the second. Both groups contain 28 languages. The findings on the suprasegmental properties of these languages, as far as they can be ascertained, are summarized in Table 1.4.

Despite some considerable uncertainty of interpretation and the incompleteness of the data, the indications are quite clear that these suprasegmental properties are not more elaborate in the languages with simpler segmental inventories. If anything, they tend to be more elaborate in the languages with larger inventories.

There are more "large" languages with contrastive stress and with complex tone systems (more than 2 tones) than "small" languages. There are more "small" languages lacking stress and tone. The overall tendency appears once again to be more for complexity of different kinds to go hand in hand, rather than for complexity of one sort to be balanced by simplicity elsewhere.

(Patterns of Sounds, Ian Maddieson)

Wednesday, 30 March 2011

Relationship between Size and Structure

"The data in UPSID have been used to address the question of the relationship between the size of an inventory and its membership. The total number of consonants in an inventory varies between 6 and 95 with a mean of 22.8. The total number of vowels varies between 3 and 46 with a mean of 8.7. The balance between consonants and vowels within an inventory was calculated by dividing the number of vowels by the number of consonants. The resulting ratio varies between 0.065 and 1.308 with a mean of 0.402. The median value of this vowel ratio is about 0.36; in other words, the typical language has less than half as many vowels as it has consonants. There are two important trends to observe; larger inventories tend to be more consonant-dominated, but there is also a tendency for the absolute number of vowels to be larger in the languages with larger inventories. The first is shown by the fact that the vowel ratio is inversely correlated with the number of consonants in an inventory (r=-0.4, p=0.0001) and the second by the fact that the total of vowels is positively correlated with the consonant total (r=0.38, p=0.0001). However, a large consonant inventory with a small vowel inventory is certainly possible, as, for example, in Haida (700: 46C, 3V), Jaqaru (820: 38C, 3V) or Burushaski (915: 38C, 5V). Small consonant inventories with a large number of vowels seem the least likely to occur (cf. the findings of Hockett 1955), although there is something of an areal/genetic tendency in this direction in New Guinea languages such as Pawaian (612: 10C, 12V), Daribi (616: 13C, 10V) and Fasu (617: 11C, 10V). In these cases a small number of consonants is combined with a contrast of vowel nasality. Despite some aberrant cases, however, there is a general though weak association between overall inventory size and consonant/vowel balance: larger inventories tend to have a greater proportion of consonants."
(Patterns of Sounds, Ian Maddieson)
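The vowel ratio used in the passage is simply the number of vowels divided by the number of consonants. A short Octave sketch applying it to the languages cited there, using only the consonant and vowel counts quoted in the passage. Haida's 3/46 comes out at about 0.065, which matches the minimum ratio reported, and the three New Guinea languages all sit well above the median of about 0.36:

```octave
% Consonant (C) and vowel (V) counts as quoted in the passage above.
names = {'Haida', 'Jaqaru', 'Burushaski', 'Pawaian', 'Daribi', 'Fasu'};
C = [46 38 38 10 13 11];
V = [ 3  3  5 12 10 10];

% Vowel ratio = number of vowels / number of consonants.
ratio = V ./ C;

for k = 1:numel(names)
  printf('%-11s %2dC %2dV  ratio = %.3f\n', names{k}, C(k), V(k), ratio(k));
end
```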

I made a plot showing the relation between the numbers of vowels and consonants in each language's segment inventory. Each language in UPSID is represented by a cross, and the gray shading shows the density of languages in the vowel-consonant plane.

Speech Inventories

"Such an association suggests that inventory size and structure may be related in other ways as well. A simple form of such a hypothesis would propose that segment inventories are structured so that the smallest inventories contain the most frequent segments, and as the size of the inventory increases, segments are added in descending order of their overall frequency of occurrence. If this were so, all segments could be arranged in a single hierarchy. Such an extreme formulation is not correct, since no single segment is found in all languages. But if we add a corollary, that larger inventories tend to exclude some of the most common segments, then there is an interesting set of predictions to investigate. We may formulate these more cautiously in the following way: a smaller inventory has a greater probability of including a given common segment than a larger one, and a larger inventory has a greater probability of including an unusual segment type than a smaller one."
(Patterns of Sounds, Ian Maddieson)

I would say that there is a convergence towards an inventory size of between 20 and 40 segments, drawn from a restricted set of segments that tend to be common across languages. The further a language lies from this zone, the less we can say about what its inventory will contain.

Tuesday, 29 March 2011

Universality of Language

Edward Sapir wrote in 1921: "There is no more striking general fact about language than its universality. One may argue as to whether a particular tribe engages in activities that are worthy of the name of religion or of art, but we know of no people that is not possessed of a fully developed language. The lowliest South African Bushman speaks in the forms of a rich symbolic system that is in essence perfectly comparable to the speech of the cultivated Frenchman."

Wednesday, 16 March 2011

Shepard Tone

Roger Shepard created an amusing auditory paradox, named after him: the Shepard tone. It is based on a self-similar sequence of notes, each consisting of a superposition of 12 tones, each tone an octave higher than its lower neighbor. The 12 tones extend from the lower frequency limit of auditory perception to the upper frequency limit of hearing. In the present approach they go from 10 Hz to 20,480 Hz, with all octave tones interleaved. The set of frequencies composing our Shepard tone is 10, 20, 40, 80, 160, 320, 640, 1280, 2560, 5120, 10,240, and 20,480 Hz. To create the paradoxical impression of a continuously rising pitch, we sweep each component tone with an exponential chirp, from its initial frequency up to the octave above.

The instantaneous frequency of each tone is given by
$f(t) = f_0 k^t$.
As we want the final frequency after $T$ seconds to be $2 f_0$,
$f(T) = 2 f_0 = f_0 k^T$
That leads to
$k = e^{\frac{\ln 2}{T}}$
and the instantaneous frequency may be written as
$f(t) = f_0 e^{\frac{\ln 2}{T} t}$
An exponential chirp tone $x(t)$ is given by
$x(t) = \sin \left( 2 \pi \int_{0}^{t} f(\tau) d\tau \right)$
$x(t) = \sin \left( 2 \pi f_0 \frac{e^{\frac{\ln 2}{T}t}-1}{\ln 2 / T} \right)$
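The closed-form phase can be checked numerically. A minimal Octave sketch, with the illustrative choices $f_0 = 10$ Hz, $T = 10$ s and an 8 kHz evaluation grid (assumptions made only for this check): it integrates $f(t)$ with the trapezoidal rule via `cumtrapz` and compares the result with $2 \pi f_0 (e^{\frac{\ln 2}{T}t}-1)/(\ln 2 / T)$.

```octave
f0 = 10;            % starting frequency (Hz), illustrative
T  = 10;            % sweep duration (s)
fs = 8000;          % evaluation grid for the check (Hz)
t  = (0:1/fs:T)';
a  = log(2) / T;    % so that k = exp(a) = 2^(1/T)

f = f0 * exp(a * t);                      % instantaneous frequency f(t)
phase_numeric = 2*pi*cumtrapz(t, f);      % 2*pi times the integral of f from 0 to t
phase_closed  = 2*pi*f0*(exp(a*t) - 1)/a; % closed-form phase derived above

printf('f(T) = %.6f Hz (expected %g Hz)\n', f(end), 2*f0);
printf('max phase error = %.3g rad\n', max(abs(phase_numeric - phase_closed)));
```

The frequency reaches exactly one octave above $f_0$ at $t = T$, and the numerical and closed-form phases agree to within the trapezoidal-rule error.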

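To achieve a better result, I weighted the tones with a Gaussian window: as each tone rises by an octave, its amplitude moves linearly from the weight of its starting position to the weight of the next position, so the end of the sweep lines up with its beginning. I also synthesized the chirps at twice the desired sampling frequency and resampled the result at the end, to attenuate the aliasing caused by the highest chirping tones.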


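pkg load signal   % gausswin and resample are provided by the Octave signal package
fs  = 44100;      % output sampling frequency (Hz)
fs2 = 2*fs;       % synthesize at twice the rate to limit aliasing
T   = 10;         % duration of one octave sweep (s)
F   = [10 20 40 80 160 320 640 1280 2560 5120 10240 20480];  % starting frequencies (Hz)
t   = (0:1/fs2:T-1/fs2)';
w   = gausswin(length(F));  % Gaussian weight for each octave position
w(end+1) = 0;               % the top tone fades out as it leaves the highest position
x   = zeros(size(t));
for i = 1:length(F)
    % exponential chirp from F(i) to 2*F(i); its amplitude moves linearly
    % from the weight of position i to the weight of position i+1
    x += linspace(w(i), w(i+1), length(t))' .* sin(2*pi*F(i)*(exp(log(2)/T*t)-1)/(log(2)/T));
end
x = resample(x, 1, 2);      % back to fs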

I used the function below (in Octave it can be saved as concatenate.m) to join several chirping tones with a 0.25 s linear cross-fade:


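function y = concatenate(varargin)
    % Join the input column vectors end to end, cross-fading
    % linearly over n samples at each junction.
    n = 11025;                      % 0.25 s at 44.1 kHz
    fade_out = linspace(1, 0, n)';
    fade_in  = linspace(0, 1, n)';
    y = varargin{1};
    for i = 2:nargin
        x = varargin{i};
        y = [y(1:end-n); fade_out.*y(end-n+1:end) + fade_in.*x(1:n); x(n+1:end)];
    end
end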

The result is shown in a spectrogram (see the figure below) and can also be listened to:


y = concatenate(x,x,x);
specgram(y,[],fs,[],[]);
soundsc(y,fs);

Tuesday, 1 February 2011

Power Law Exponent

One main property of these fractals (or another way to express their main property, scalability) is that the ratio of two exceedances is going to be the ratio of the two numbers to the negative power of the power exponent.

Let us illustrate this. Say that you "think" that only 96 books a year will sell more than 250,000 copies (which is what happened last year), and that you "think" that the exponent is around 1.5. You can extrapolate to estimate that around 34 books will sell more than 500,000 copies -- simply 96 times (500,000/250,000)^(-1.5). We can continue, and note that around 8 books should sell more than a million copies, here 96 times (1,000,000/250,000)^(-1.5).
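The scaling rule in this passage says that the number of exceedances above a level $x$ falls as $N(>x) = N(>x_0)\,(x/x_0)^{-\alpha}$. A short Octave sketch of that arithmetic, taking the passage's figures (96 books above 250,000 copies, an assumed exponent of 1.5). Evaluated literally, the second expression gives 12 books rather than the 8 quoted, since $4^{-1.5} = 1/8$:

```octave
n0    = 96;       % books selling more than x0 copies
x0    = 250000;   % reference sales level
alpha = 1.5;      % assumed power-law exponent

% Number of books expected to sell more than x copies.
exceedances = @(x) n0 * (x / x0) .^ (-alpha);

printf('> 500,000 copies:   %.1f books\n', exceedances(500000));
printf('> 1,000,000 copies: %.1f books\n', exceedances(1000000));
```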

(...)

Table 2 illustrates the impact of the highly improbable. It shows the contributions of the top 1 percent and 20 percent to the total. The lower the exponent, the higher those contributions. But look how sensitive the process is: between 1.1 and 1.3 you go from 66 percent of the total to 34 percent. Just a 0.2 difference in the exponent changes the result dramatically -- and such a difference can come from a simple measurement error. This difference is not trivial: just consider that we have no precise idea what the exponent is because we cannot measure it directly. All we do is estimate from past data or rely on theories that allow for the building of some model that would give us some idea -- but these models may have hidden weaknesses that prevent us from blindly applying them to reality.
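Assuming the variable follows a pure Pareto (power-law) distribution with tail exponent $\alpha > 1$, the share of the total held by the top fraction $p$ is $p^{1 - 1/\alpha}$. That this is how such a table is built is my assumption, not something stated in the passage, but the formula gives about 66 percent for the top 1 percent at $\alpha = 1.1$ and about 35 percent at $\alpha = 1.3$, close to the figures quoted above. A minimal Octave sketch:

```octave
% Share of the total held by the top fraction p of a Pareto population
% with tail exponent alpha (valid for alpha > 1).
top_share = @(p, alpha) p .^ (1 - 1 ./ alpha);

alphas = [1.1 1.2 1.3 1.5 2.0 2.5];
top1   = top_share(0.01, alphas);   % top 1 percent
top20  = top_share(0.20, alphas);   % top 20 percent

printf('exponent  top 1%%  top 20%%\n');
for k = 1:numel(alphas)
  printf('%8.1f  %5.1f%%  %6.1f%%\n', alphas(k), 100*top1(k), 100*top20(k));
end
```

The steep drop between the first and third rows is the sensitivity the passage warns about: a 0.2 change in the exponent roughly halves the share of the top 1 percent.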

(The Black Swan, Nassim Nicholas Taleb)

Scale Invariance

Emily Dickinson

A word is dead

A word is dead
When it is said,
Some say.

I say it just
Begins to live
That day.

I'm Nobody! Who are you?

I'm Nobody! Who are you?
Are you -- Nobody -- Too?
Then there's a pair of us!
Don't tell! they'd advertise -- you know!

How dreary -- to be -- Somebody!
How public -- like a Frog --
To tell one's name -- the livelong June --
To an admiring Bog!

There is another sky

There is another sky,
Ever serene and fair,
And there is another sunshine,
Though it be darkness there;
Never mind faded forests, Austin,
Never mind silent fields -
Here is a little forest,
Whose leaf is ever green;
Here is a brighter garden,
Where not a frost has been;
In its unfading flowers
I hear the bright bee hum:
Prithee, my brother,
Into my garden come!

The Geometry of Nature

Triangles, squares, circles, and the other geometric concepts that made many of us yawn in the classroom may be beautiful and pure notions, but they seem more present in the minds of architects, design artists, modern art buildings, and schoolteachers than in nature itself. That's fine, except that most of us aren't aware of this. Mountains are not triangles or pyramids; trees are not circles; straight lines are almost never seen anywhere. Mother Nature did not attend high school geometry courses or read the books of Euclid of Alexandria. Her geometry is jagged, but with a logic of its own and one that is easy to understand.

(The Black Swan, Nassim Nicholas Taleb)

Friday, 28 January 2011

The Great Asymmetry

(...)

Indeed, the notion of asymmetric outcomes is the central idea of this book: I will never get to know the unknown since, by definition, it is unknown. However, I can always guess how it might affect me, and I should base my decisions around that.

This idea is often erroneously called Pascal's wager, after the philosopher and (thinking) mathematician Blaise Pascal. He presented it something like this: I do not know whether God exists, but I know that I have nothing to gain from being an atheist if he does not exist, whereas I have plenty to lose if he does. Hence, this justifies my belief in God.

Pascal's argument is severely flawed theologically: one has to be naïve enough to believe that God would not penalize us for false belief. Unless, of course, one is taking the quite restrictive view of a naive God. (Bertrand Russell was reported to have claimed that God would need to have created fools for Pascal's argument to work.)

But the idea behind Pascal's wager has fundamental applications outside of theology. It stands the entire notion of knowledge on its head. It eliminates the need for us to understand the probabilities of a rare event (there are fundamental limits to our knowledge of these); rather, we can focus on the payoff and benefits of an event if it takes place. The probabilities of very rare events are not computable; the effect of an event on us is considerably easier to ascertain (the rarer the event, the fuzzier the odds). We can have a clear idea of the consequences of an event, even if we do not know how likely it is to occur. I don't know the odds of an earthquake, but I can imagine how San Francisco might be affected by one. This idea that in order to make a decision you need to focus on the consequences (which you can know) rather than the probability (which you can't know) is the central idea of uncertainty. Much of my life is based on it.

(...)

(The Black Swan, Nassim Nicholas Taleb)

Backwards Narrative

Philosophers since Aristotle have taught us that we are deep-thinking animals, and that we can learn by reasoning. It took a while to discover that we do effectively think, but that we more readily narrate backward in order to give ourselves the illusion of understanding, and give a cover to our past actions. The minute we forgot about this point, the "Enlightenment" came to drill it into our heads for a second time.

(The Black Swan, Nassim Nicholas Taleb)

Tuesday, 25 January 2011

We Just Can't Predict

When I ask people to name three recently implemented technologies that most impact our world today, they usually propose the computer, the Internet, and the laser. All three were unplanned, unpredicted, and unappreciated upon their discovery, and remained unappreciated well after their initial use. They were consequential. They were Black Swans. Of course, we have this retrospective illusion of their partaking in some master plan. You can create your own lists with similar results, whether you use political events, wars, or intellectual epidemics.

You would expect our record of prediction to be horrible: the world is far, far more complicated than we think, which is not a problem, except when most of us don't know it. We tend to "tunnel" while looking into the future, making it business as usual, Black Swan-free, when in fact there is nothing usual about the future. It is not a Platonic category!

We have seen how good we are at narrating backward, at inventing stories that convince us that we understand the past. For many people, knowledge has the remarkable power of producing confidence instead of measurable aptitude. Another problem: the focus on the (inconsequential) regular, the Platonification that makes the forecasting "inside the box."

I find it scandalous that in spite of the empirical record we continue to project into the future as if we were good at it, using tools and methods that exclude rare events. Prediction is firmly institutionalized in our world. We are suckers for those who help us navigate uncertainty, whether the fortune-teller or the "well-published" (dull) academics or civil servants using phony mathematics.

(The Black Swan, Nassim Nicholas Taleb)

How Not to Be a Nerd

Think of a bookworm picking up a new language. He will learn, say, Serbo-Croatian or !Kung by reading a grammar book cover to cover, and memorizing the rules. He will have the impression that some higher grammatical authority set the linguistic regulations so that nonlearned ordinary people could subsequently speak the language. In reality, languages grow organically; grammar is something people without anything more exciting to do in their lives codify into a book. While the scholastic-minded will memorize declensions, the a-Platonic nonnerd will acquire, say, Serbo-Croatian by picking up potential girlfriends in bars on the outskirts of Sarajevo, or talking to cabdrivers, then fitting (if needed) grammatical rules to the knowledge he already possesses.

Consider again the central planner. As with language, there is no grammatical authority codifying social and economic events; but try to convince a bureaucrat or social scientist that the world might not want to follow his "scientific" equations. In fact, thinkers of the Austrian school, to which Hayek belonged, used the designations tacit or implicit precisely for that part of knowledge that cannot be written down, but that we should avoid repressing. They made the distinction we saw earlier between "know-how" and "know-what"—the latter being more elusive and more prone to nerdification.

To clarify, Platonic is top-down, formulaic, closed-minded, self-serving, and commoditized; a-Platonic is bottom-up, open-minded, skeptical, and empirical.

(The Black Swan, Nassim Nicholas Taleb)

Wednesday, 19 January 2011

Phony Philanthropy

Frédéric Bastiat was a nineteenth-century humanist of a strange variety, one of those rare independent thinkers—independent to the point of being unknown in his own country, France, since his ideas ran counter to French political orthodoxy (he joins another of my favorite thinkers, Pierre Bayle, in being unknown at home and in his own language). But he has a large number of followers in America.

In his essay "What We See and What We Don't See," Bastiat offered the following idea: we can see what governments do, and therefore sing their praises—but we do not see the alternative. But there is an alternative; it is less obvious and remains unseen.

Recall the confirmation fallacy: governments are great at telling you what they did, but not what they did not do. In fact, they engage in what could be labeled as phony "philanthropy," the activity of helping people in a visible and sensational way without taking into account the unseen cemetery of invisible consequences. Bastiat inspired libertarians by attacking the usual arguments that showed the benefits of governments. But his ideas can be generalized to apply to both the Right and the Left.

Bastiat goes a bit deeper. If both the positive and the negative consequences of an action fell on its author, our learning would be fast. But often an action's positive consequences benefit only its author, since they are visible, while the negative consequences, being invisible, apply to others, with a net cost to society. Consider job-protection measures: you notice those whose jobs are made safe and ascribe social benefits to such protections. You do not notice the effect on those who cannot find a job as a result, since the measure will reduce job openings. In some cases, as with the cancer patients who may be punished by Katrina, the positive consequences of an action will immediately benefit the politicians and phony humanitarians, while the negative ones take a long time to appear — they may never become noticeable. One can even blame the press for directing charitable contributions toward those who may need them the least.

(The Black Swan, Nassim Nicholas Taleb)

Tuesday, 18 January 2011

Patterns vs. Randomness

There is another, even deeper reason for our inclination to narrate, and it is not psychological. It has to do with the effect of order on information storage and retrieval in any system, and it's worth explaining here because of what I consider the central problems of probability and information theory.

The first problem is that information is costly to obtain.

The second problem is that information is also costly to store—like real estate in New York. The more orderly, less random, patterned, and narratized a series of words or symbols, the easier it is to store that series in one's mind or jot it down in a book so your grandchildren can read it someday.

Finally, information is costly to manipulate and retrieve.

(...)

Consider a collection of words glued together to constitute a 500-page book. If the words are purely random, picked up from the dictionary in a totally unpredictable way, you will not be able to summarize, transfer, or reduce the dimensions of that book without losing something significant from it. You need 100,000 words to carry the exact message of a random 100,000 words with you on your next trip to Siberia. Now consider the opposite: a book filled with the repetition of the following sentence: "The chairman of [insert here your company name] is a lucky fellow who happened to be in the right place at the right time and claims credit for the company's success, without making a single allowance for luck," running ten times per page for 500 pages. The entire book can be accurately compressed, as I have just done, into 34 words (out of 100,000); you could reproduce it with total fidelity out of such a kernel. By finding the pattern, the logic of the series, you no longer need to memorize it all. You just store the pattern. And, as we can see here, a pattern is obviously more compact than raw information. You looked into the book and found a rule. It is along these lines that the great probabilist Andrey Nikolayevich Kolmogorov defined the degree of randomness; it is called "Kolmogorov complexity."
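Kolmogorov complexity itself cannot be computed, so the Octave sketch below swaps in a different, computable stand-in: it counts the phrases in an LZ78 (Lempel-Ziv) parse of a patterned string and of a pseudo-random string of the same length. Both strings are invented for illustration. A string that can be described by its pattern needs far fewer new phrases than one whose letters are drawn at random:

```octave
rand('state', 1);                                               % reproducible pseudo-random letters
patterned  = repmat('the chairman is a lucky fellow ', 1, 65);  % 2015 characters
random_txt = char('a' + floor(26 * rand(1, numel(patterned))));

texts   = {patterned, random_txt};
phrases = zeros(1, 2);
for k = 1:2
  s    = texts{k};
  dict = {};            % phrases found so far in the LZ78 parse
  w    = '';
  for ch = s
    w = [w ch];         % extend the current phrase by one character
    if ~any(strcmp(w, dict))
      dict{end+1} = w;  % new phrase: a known phrase plus one character
      w = '';
    end
  end
  phrases(k) = numel(dict) + ~isempty(w);  % count a trailing partial phrase
end
printf('patterned: %d phrases, random: %d phrases\n', phrases);
```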

We, members of the human variety of primates, have a hunger for rules because we need to reduce the dimension of matters so they can get into our heads. Or, rather, sadly, so we can squeeze them into our heads. The more random information is, the greater the dimensionality, and thus the more difficult to summarize. The more you summarize, the more order you put in, the less randomness. Hence the same condition that makes us simplify pushes us to think that the world is less random than it actually is.

And the Black Swan is what we leave out of simplification.

(The Black Swan, Nassim Nicholas Taleb)

Perception is biologically bounded

Actually, as I am writing this, there is news of a pending lawsuit by a patient going after his doctor for more than $200,000 — an amount he allegedly lost while gambling. The patient claims that the treatment of his Parkinson's disease caused him to go on wild betting sprees in casinos. It turns out that one of the side effects of L-dopa is that a small but significant minority of patients become compulsive gamblers. Since such gambling is associated with their seeing what they believe to be clear patterns in random numbers, this illustrates the relation between knowledge and randomness. It also shows that some aspects of what we call "knowledge" (and what I call narrative) are an ailment.

Once again, I warn the reader that I am not focusing on dopamine as the reason for our overinterpreting; rather, my point is that there is a physical and neural correlate to such operation and that our minds are largely victims of our physical embodiment. Our minds are like inmates, captive to our biology, unless we manage a cunning escape. It is the lack of our control of such inferences that I am stressing. Tomorrow, someone may discover another chemical or organic basis for our perception of patterns, or counter what I said about the left-brain interpreter by showing the role of a more complex structure; but it would not negate the idea that perception of causation has a biological foundation.

(The Black Swan, Nassim Nicholas Taleb)