
Consequences for Phonology: What is Lost?

I am suggesting a rather sceptical view of the process of data collection that is most often employed in phonology. As I understand it, the process is roughly this: linguists (and sometimes phonetic specialists) produce phonetic transcriptions that supposedly represent in symbolic form all the `linguistically relevant aspects' of speech production and perception. Phonologists then employ these transcriptions as their primary data for building models of the data structures specific to that language. Although these days there are many `laboratory phonologists' who combine phonological analysis with careful data collection on speech sound and on motor control (for example, Kingston and Beckman, 1990), there remain many phonologists who feel that their work addresses phonological questions for which impressionistic phonetic transcriptions will serve as completely appropriate data. Perhaps. But in my view all phonologists should be at least a little concerned about this assumption.

After all, incomplete neutralization is just one of a vast variety of phenomena in human speech that have been observed experimentally since the Second World War. In broad outline, these results support four conclusions:

It is these subtleties that have made automatic speech recognition so challenging. Furthermore,

Thus it is very likely that there are a great many aspects of languages that are phonologically important (since they differ from language to language) yet are completely missed due to reliance on the traditional symbolic phonetic transcription of speech.

Let me briefly mention two specific examples of phonological phenomena, in areas familiar to me, that slip through `the segmental grid': problems that seem to be ignored or dealt with awkwardly in phonology primarily because of reliance on discrete phonetic transcriptions.

The Germanic Postvocalic [Voice] Contrast.

In English, when the voicing contrast occurs at the end of a syllable, as in the pair fuzz and fuss, the difference in voicing is manifested as a change in the ratio of the duration of the vocalic part of the syllable to the duration of the final consonantal portion of the syllable [Port, 1981, Port and Dalby, 1982]: before the voiced consonant the vowel is longer and the consonant constriction shorter than before its voiceless counterpart. This durational ratio also helps to characterize the contrast between, say, bids-bits, camber-camper, Libby-lippy, Bangor-banker, large-larch, ruby-rupee, etc. A similar contrast in the `vowel/consonant durational ratio' for distinguishing voicing pairs is also found in many other Germanic languages (at least Standard German, Bavarian, Swedish and Icelandic). But if you just do a segmental transcription to represent the data, then the durational difference between the stop closures and fricative constrictions (eg, between [t] and [d] or [s] and [z]) seems uninteresting (because it is said to affect only `phonetic implementation', not the phonology), and the effect on the preceding vowel and any voiced consonant (eg, the nasal in lunge-lunch) is just another instance of a context-dependent phonological rule, of which there are many well-known examples. The compensatory or inverse durational relationship is completely obscured, due entirely to the restriction to segmental transcription.

So here is an important language-specific phenomenon, with many variants across the Germanic family (including the very ``incomplete neutralization'' phenomenon that stimulated AMR's essay). This shortening followed by lengthening (taking [-voice] to be derived from [+voice]) could be viewed as a brief perturbation of speaking rate (see Port and Cummins, 1992, for such an interpretation). But however this ratio effect should be described, it is clearly phonology, since it is part of the grammar of English, German, Icelandic, etc. Most other languages show no evidence of manipulating these temporal ratios as a correlate of a voicing-like contrast. This seems, on the face of it, a fascinating phonological problem. But it lies in the time domain and is generally overlooked.
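Why a ratio, rather than an absolute duration, is a natural measure here can be shown with a few lines of arithmetic. The sketch below is a minimal illustration under a deliberately crude assumption: that a change of speaking rate scales every duration by the same factor. The function names `vc_ratio' and `at_tempo' are hypothetical, and the durations are invented round numbers, not measurements from the studies cited; they are chosen only so that the vowel is longer and the constriction shorter in the voiced member of the pair.

```python
def vc_ratio(vowel_ms: float, consonant_ms: float) -> float:
    """Vowel duration divided by the final consonant's constriction duration."""
    if vowel_ms <= 0 or consonant_ms <= 0:
        raise ValueError("durations must be positive")
    return vowel_ms / consonant_ms


def at_tempo(vowel_ms: float, consonant_ms: float, factor: float) -> tuple:
    """Scale both durations by one tempo factor (factor < 1 means faster speech).

    Assumption: tempo changes are uniform, which real speech only approximates.
    """
    return vowel_ms * factor, consonant_ms * factor


# HYPOTHETICAL durations in ms, invented for illustration; not measured data.
FUZZ = (250.0, 150.0)  # voiced final: longer vowel, shorter constriction
FUSS = (180.0, 220.0)  # voiceless final: shorter vowel, longer constriction

fast_fuzz = at_tempo(*FUZZ, factor=0.7)
# Absolute vowel duration alone misleads: fast 'fuzz' (about 175 ms) has a
# shorter vowel than normal-rate 'fuss' (180 ms)...
print("vowel ms, fast fuzz vs normal fuss:", fast_fuzz[0], FUSS[0])
# ...but the tempo factor cancels out of the ratio, which still separates them.
print("V/C fuzz normal:", round(vc_ratio(*FUZZ), 3))
print("V/C fuzz fast:  ", round(vc_ratio(*fast_fuzz), 3))
print("V/C fuss normal:", round(vc_ratio(*FUSS), 3))
```

Because real changes of rate are not uniform, this only shows why a relational measure is worth examining; it does not show that the ratio is perfectly invariant in actual speech.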

English `Meter'.

A second domain where the description of speech in symbolic terms may render important phonological structures invisible is in the problem of meter. English phrases often seem to have a global timing structure over a scale of a second or so. Phrases like Mississippi legislators seem to have an alternating pattern of stresses - whether 4 of them or only 2 [Hayes, 1995]. It has been proposed many times [Jones, 1932, Abercrombie, 1967, Martin, 1972] that music-like rhythm, definable in terms of relative duration, may underlie such pronunciations. Unfortunately, actual temporal studies typically find messy and unclear results [Lehiste, 1977, van Santen, 1996]. Consequently, phonologists will often address the problem of meter with a discrete time scale, using one time step for every syllable (eg, Halle and Vergnaud, 1980 and Hayes, 1995). But will discrete time prove sufficient for an understanding of English metrics? Certainly it cannot provide a complete understanding, since production and perception always take place in real time.

In recent experiments in my lab, we have been exploring the temporal aspects of English metrics with some new methods [Cummins and Port, 1996a, Robert Port and Gasser, 1996, Cummins and Port, 1996b]. We first made the hypothesis that some sort of real-time oscillatory system might underlie the metrical aspects of speech timing. If this is so, then we should be able to interfere with such an oscillatory system by encouraging `coupling' with another oscillatory pattern (inspired by the work of Kelso and by Treffner and Turvey, 1993). To illustrate what is meant by coupling, imagine a parent pushing their child on a swing. The parent will couple their body motions to the rate and amplitude of the oscillating child-swing system. If the swing length changed or if extra weights were put on the seat, then the parent would adapt to the change in frequency; that is, they would remain coupled to the child-swing oscillator. This state of coupling provides the most efficient way for them to use their body to keep the swing going. Coupling is found both within our bodies (eg, between your left and right leg when walking) and between our own body and that of others (eg, in communal singing or marching).

Oscillators that are coupled tend to impose very severe constraints on each other's frequency and phase. For example, imagine tapping your finger on the table in a comfortable position. If you are asked to tap the index finger of your left hand at some rate, you could do so at any rate over a broad range from fast to slow. But if you are also asked to oscillate your right index finger at some rate, then it turns out that you will be able to perform both tasks together only at a small set of rates. In fact, they will be rates such that there is a very simple ratio, such as 1:1, 1:2 or 1:3 (or, with some practice, 2:3, 3:4, etc.), between the two fingers. Apparently two fingers in the same body cannot avoid coupling with each other. They `want' to keep certain simple temporal and phase relationships.
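The locking described above can be illustrated with a textbook phase-oscillator model with sine coupling (in the style of Adler and Kuramoto). This is a generic sketch, not the model used in the experiments discussed here; the function name `observed_ratio', the r:1 generalized phase difference psi = theta_fast - r * theta_slow, and all parameter values are assumptions chosen for illustration. In this model the pair locks at exactly r:1 whenever the detuning |w_fast - r * w_slow| is no larger than (1 + r) * k, and otherwise slips steadily.

```python
import math


def observed_ratio(w_fast: float, w_slow: float, k: float, r: int,
                   dt: float = 0.001, t_end: float = 60.0) -> float:
    """Euler-integrate two phase oscillators with r:1 sine coupling.

    psi = theta_fast - r * theta_slow is the generalized phase difference.
    The coupling slows the fast oscillator and speeds up the slow one,
    pulling psi toward a fixed value when the coupling is strong enough.
    Returns the ratio of the two observed frequencies, measured over the
    second half of the run so that start-up transients are excluded.
    """
    theta_fast = theta_slow = 0.0
    steps = int(round(t_end / dt))
    half = steps // 2
    fast_mid = slow_mid = 0.0
    for step in range(steps):
        if step == half:
            fast_mid, slow_mid = theta_fast, theta_slow
        psi = theta_fast - r * theta_slow
        d_fast = w_fast - k * math.sin(psi)
        d_slow = w_slow + k * math.sin(psi)
        theta_fast += d_fast * dt
        theta_slow += d_slow * dt
    return (theta_fast - fast_mid) / (theta_slow - slow_mid)


TWO_PI = 2.0 * math.pi
# Natural frequencies 2.1 Hz and 1.0 Hz: near, but not exactly, 2:1.
# Detuning is 2*pi*0.1 (about 0.63 rad/s); locking needs 3*k to be at least that.
locked = observed_ratio(TWO_PI * 2.1, TWO_PI * 1.0, k=1.0, r=2)    # 3k = 3.0
drifting = observed_ratio(TWO_PI * 2.1, TWO_PI * 1.0, k=0.1, r=2)  # 3k = 0.3
print(f"strong coupling: {locked:.4f}")
print(f"weak coupling:   {drifting:.4f}")
```

With strong coupling the observed ratio sits at 2 even though the natural frequencies are in the ratio 2.1 : 1; with weak coupling the ratio drifts away from 2. Setting r=1 gives the 1:1 case of the two fingers.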

So we reasoned that evidence of coupling between a periodic stimulus and human behavior can be interpreted as evidence that the behavioral system incorporates an oscillator. Could we show that the relationship between a metrical foot and the phrase resembles coupled oscillators? In our experiments we asked speakers to listen to a metronome signal (at a comfortable level) and repeat a simple phrase. Thus they might say `Talk to the boy' once for each beep of a metronome (at periods from .3 sec up to 1 sec). This phrase has two metrical feet: `talk to the' and `boy'. Not only did we find that speakers align `talk' with the metronome pulse (just as we instructed them to do), but the onset of `boy' has a very strong tendency to fall at certain phase angles rather than others, especially at 1/2 (but also at 1/3 or 2/3) of the `talk'-to-`talk' cycle. (The reader is encouraged to try simply repeating this phrase. You will probably find that the perceptual beat of `boy' is located halfway between the phrase onsets.) So by means of this simple task, repeating a phrase to a metronome, we demonstrated a strong tendency for the `foot oscillator' to entrain itself with the `phrase oscillator' in an integer ratio like 2:1 or 3:1. We take the ease with which speakers couple their speech to a metronome, and the tendency for feet to couple with the longer phrase unit, to suggest that `hierarchically nested oscillators' running in continuous time underlie the metrical structure of, at least, English, whether or not there happens to be a metronome to couple with. Otherwise, coupling with the metronome would not be obtained so easily. If there is another periodic action (eg, if you are also tapping your finger, pounding your fist, marching, jogging, talking, chewing gum, or whatever), then your speech will tend to couple with it.
And there is no way for a metrical phonology based on symbol sequences to explain coupling with real-time periodic events, since symbol sequences involve no real time at all.
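The phase measure used in this task is simple arithmetic: the onset of the target syllable is located relative to the two phrase onsets that enclose it, giving a value between 0 and 1, which can then be compared with simple fractions such as 1/3, 1/2 and 2/3. The sketch below is a minimal illustration; the function names are hypothetical and the onset times are invented for the example, not experimental data.

```python
from bisect import bisect_right
from typing import List, Sequence


def relative_phases(cycle_onsets: Sequence[float],
                    event_onsets: Sequence[float]) -> List[float]:
    """Phase in [0, 1) of each event within the cycle that encloses it.

    cycle_onsets: sorted onset times (s) of the cycle-defining syllable ('talk').
    event_onsets: onset times (s) of the target syllable ('boy').
    Events before the first or after the last cycle onset are skipped.
    """
    phases = []
    for t in event_onsets:
        i = bisect_right(cycle_onsets, t) - 1  # index of last cycle onset <= t
        if i < 0 or i + 1 >= len(cycle_onsets):
            continue  # no complete cycle encloses this event
        start, end = cycle_onsets[i], cycle_onsets[i + 1]
        phases.append((t - start) / (end - start))
    return phases


def nearest_simple_phase(phase: float,
                         targets: Sequence[float] = (1 / 3, 1 / 2, 2 / 3)) -> float:
    """Return whichever simple-ratio phase in `targets` lies closest to `phase`."""
    return min(targets, key=lambda target: abs(target - phase))


# Invented onset times (s) for illustration only; not experimental data.
talk = [0.0, 0.8, 1.6, 2.4]
boy = [0.41, 1.19, 2.01]
phases = relative_phases(talk, boy)
print([round(p, 4) for p in phases])
print([nearest_simple_phase(p) for p in phases])
```

Normalizing by each cycle's own duration makes phases from different metronome periods directly comparable, which is what allows a preference for 1/2 (or 1/3 and 2/3) to show up across the whole range of rates.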

Of course, none of this implies that traditional metrical phonology, such as the work of Hayes (1995), using discrete time as the basis for meter, is not worthwhile. However, it seems that an empirically adequate understanding of meter will come only when the insights from discrete-time descriptions can be understood or reinterpreted in terms of a dynamical model of meter for English. When that is attempted, I suspect that some current issues will lose their interest (eg, `rhythm-rule' phenomena will be much more clearly understood) while other phenomena will find insightful new interpretations from the dynamical perspective.

I have argued that reliably identifiable and discrete phonetics is an unavoidable assumption for a formal or competence model of phonology and linguistics, but that no such reliable state-based speech perception is possible no matter how many years of phonetic training you have. But there is still one essential task left to do in this essay. This is to offer a specific account of how the phoneticians could miss something that the native speakers could hear fairly easily.



Robert Port
Mon Mar 3 21:05:28 EST 1997