
Empirical Arguments for Phonetic Discreteness

There are three main arguments most often offered for why we should believe that the phonetic discreteness assumption and the identifiability assumption are justified. They are (a) that the limited perceptual resolution of humans forces a limit on the number of distinctions possible along any dimension and thus supports discrete categories, (b) that the phenomenon of categorical perception is well attested and suggests that discrete categories with sharp perceptual boundaries are intrinsic to speech perception, and (c) that evidence for the quantal theory of speech shows that discreteness is natural for many phonetic categories. I think that none of these provides compelling evidence that the phonetic space is a universal and invariably discrete inventory.

Limited Perceptual Resolution Argument.

To many phonologists the discreteness of speech perception seems almost to require no defense beyond common sense. After all, so the reasoning goes, one can only distinguish sound differences with a certain level of detail. So it seems that only a certain number of, say, vowel distinctions should be possible, due to finite limits on the resolution of auditory sensation. But is this reasoning sound? Limits on resolution do not necessarily yield discrete levels. In the late 19th century, reasoning of just this kind led early psychologists like Titchener and Wundt to analogous conclusions about simple stimulus scales such as pitch, color, and so forth [Boring, 1942].

For example, Titchener and others viewed pitch perception as reflecting a sensory unit called the `just noticeable difference' (JND). The idea was that the frequency scale for pure tones, for example, must be divided into discrete steps (like pixels on a computer screen, but in one dimension). Thus, if two tones presented serially to a subject lie close enough along the sensory scale to fall within the same JND, they should be reported as the same; but if they lie in different JND regions, they should be reported as different. This is quite similar to the kind of reasoning that Chomsky and Halle employ in concluding that vowels must have a fixed (small) number of discrete height values, and similar to AMR when he supposes that either Bund and bunt are perceptually the same or they are different.

But psychology abandoned the JND view of pitch resolution long ago. The primary reason is the ubiquity of noise internal to the perceptual system. It is clear that listeners, even in a very simple discrimination task, do not always give the same response when the discrimination is difficult. So if you ask them 10 times about the same pair of stimuli, you will often get some `sames' and some `differents'. If one begins with an impossibly small difference, increases the stimulus difference toward easier discriminations, and plots the probability of saying `different' (from 0 to 1) against the stimulus continuum, the data will always sketch out an S-shaped curve, with higher probability of saying `different' corresponding, of course, to larger changes in the stimulus. In fact, if you don't get a smooth curve, then you have either not been sampling closely enough along the stimulus continuum or else have not looked at enough tokens (either within or between listeners).
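The smooth psychometric function is easy to reproduce from first principles. The following sketch (with illustrative parameter values chosen here, not experimental data) simulates a listener whose percepts are corrupted by Gaussian internal noise and who reports `different' whenever two percepts differ by more than a fixed criterion. The estimated probability of a `different' response rises smoothly with stimulus separation, with no flat spots or steps.

```python
import random

def p_different(delta, sigma=1.0, criterion=2.0, trials=20000):
    """Monte Carlo estimate of the probability that a noisy observer
    reports 'different' for a stimulus pair separated by `delta`.
    Each percept is the true stimulus value plus Gaussian internal
    noise; the observer says 'different' when the two percepts differ
    by more than a fixed criterion. All parameters are illustrative."""
    rng = random.Random(0)  # fixed seed so the estimate is repeatable
    count = 0
    for _ in range(trials):
        x1 = rng.gauss(0.0, sigma)      # percept of the first stimulus
        x2 = rng.gauss(delta, sigma)    # percept of the second stimulus
        if abs(x2 - x1) > criterion:
            count += 1
    return count / trials

# Probability of 'different' climbs smoothly with stimulus separation:
deltas = [0.0, 1.0, 2.0, 3.0, 4.0]
curve = [p_different(d) for d in deltas]
```

Note that even at zero separation the observer sometimes says `different' (false alarms), and even at large separations responses are probabilistic: exactly the S-shaped pattern described above.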

Can one locate the boundaries between the hypothesized discrete sensory categories? Taking a fixed small difference and moving it along the continuum, one should (on the discrete-category view) find discrimination flat spots (where both sounds fall within the same sensory class) alternating with discrimination bumps (where they straddle a boundary between sensory steps). But no such pattern is found. These days, when a psychophysicist speaks of a `just noticeable difference', it is interpreted to mean a difference large enough that subjects detect it, say, 75% of the time. The S-shaped psychometric function rules out any nonarbitrary steps along stimulus continua. Of course, this is just as true of speech stimuli as of anything else. So the fact that there are sensory limits relevant to distinguishing phonetic categories from each other in no way justifies a claim that there is a discrete set that can be reliably identified.
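Under the same sort of Gaussian-noise model (again an illustrative sketch, not a description of any particular experiment), the modern JND is simply a point read off a smooth curve, not a step edge:

```python
import math

def psychometric(delta, sigma=1.0):
    """Idealized psychometric function: probability of detecting a
    difference `delta` for an observer with Gaussian internal noise of
    standard deviation `sigma`. At delta = 0 performance is at chance
    (p = 0.5), and p rises smoothly toward 1 - a normal CDF."""
    return 0.5 * (1.0 + math.erf(delta / (sigma * math.sqrt(2.0))))

def jnd_75(sigma=1.0, lo=0.0, hi=10.0, tol=1e-6):
    """The modern 'JND': the difference detected 75% of the time,
    found by bisection on the smooth psychometric function. The 75%
    figure is a convention, not a sensory boundary."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psychometric(mid, sigma) < 0.75:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The 75% point scales directly with the noise level: doubling `sigma` doubles the JND. Nothing in the function singles out 75% over 70% or 80%, which is exactly why the steps are arbitrary.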

The notion of `reliable identification' runs into another difficulty as soon as probabilistic judgment appears. It turns out that one must differentiate the `sensory analysis' aspect of the discrimination (or identification) task from the `response decision' aspect of the problem. Subjects may have a bias toward one response over the other. If a discrimination is sufficiently difficult that subjects won't always give the same response, then other criteria will play a role in determining which response they choose. These are usually called ``response biases''. For example, if the payoffs and penalties of the situation are such that making a `False Alarm' (calling the stimuli `same' when they are not) costs more than the reward for a `Hit' (calling them `same' when they really are), then observers will tend to be conservative about responding `same'. Decisions will be affected by a number of factors, including the subject's estimate of the a priori probability of one state of affairs vs. the other (thus, for example, if subjects expect to hear more /t/s than /d/s, they will adjust their response criterion so as to respond /t/ more often). So the actual response of subjects (and therefore their actual percent correct in a discrimination or identification task) depends only in part on the results of their perceptual analysis of the physical stimulus.

This problem is actually fairly easily solved experimentally: the theory of signal detection (Swets, 1961; see the tutorial introduction in Kantowicz and Sorkin, 1983) provides statistical methods to correct for response bias (by taking into account the proportions of hits and false alarms and assuming Gaussian noise distributions) and suggests experimental procedures that permit bias-free measurement of the distinctiveness of two classes.
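A minimal sketch of the standard computation, assuming the textbook equal-variance Gaussian model (not code from the cited papers): sensitivity d' and criterion c are recovered from hit and false-alarm rates via the inverse normal CDF, which Python's standard library supplies.

```python
from statistics import NormalDist

def d_prime(hit_rate, fa_rate):
    """Bias-free sensitivity index d' from signal detection theory:
    the separation of the z-transformed hit and false-alarm rates,
    assuming equal-variance Gaussian noise distributions."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

def criterion(hit_rate, fa_rate):
    """Response bias c: how far the decision criterion sits from the
    neutral point midway between the two distributions. Negative c
    means a liberal ('yes'-prone) observer."""
    z = NormalDist().inv_cdf
    return -0.5 * (z(hit_rate) + z(fa_rate))

# Two observers with very different biases but the same sensitivity:
unbiased = d_prime(0.84, 0.16)    # c near 0
liberal = d_prime(0.933, 0.309)   # c well below 0, yet same d'
```

The point of the correction is visible in the example: the two observers' raw percent correct differs, but their d' values agree, because d' factors the placement of the criterion out of the measurement.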

All of these same considerations apply to speech. (See Port and O'Dell, 1986 or Port and Crawford, 1989 for simple applications of signal detection theory to German voicing contrast results.) The best general assumption about the perceptual mechanism is that it produces a probability judgment about a stimulus with respect to several possible categories. Which response a listener (including even a phonetician or linguist) actually gives may reflect a variety of factors that have nothing to do with the stimulus's perceptual similarity to each category.

The Categorical Perception Argument.

It has been known since the 1950s that if you vary speech stimuli along complex continua between phonetic classes, subjects' perception will jump rather discretely (for reviews see Liberman et al., 1967, Harnad, 1987, Repp, 1984). Is this good evidence of a discrete `phoneticizer' in speech perception? No, it is not, for one simple reason. The standard theory of linguistic phonetics requires that all phonetic contrasts be discrete, while the categorical perception effect has been known from the earliest days to occur more strongly for certain subclasses of sound contrasts (e.g., place of articulation and voicing) than for others, like vowels. In the case of vowels, categorical perception is only obtained under special conditions. Yet discrete categorization is every bit as critical for vowels as for consonantal features.

Of course, categorical perception is a much more complex problem than distinguishing pure tones differing in frequency. Speech stimuli have enormous complexity and richness, but on the other hand, listeners also get an enormous amount of practice with them. It is pretty clear that when we present listeners with very complex stimuli, only certain aspects of the stimulus tend to be heard accurately [Watson, 1987]. Most details cannot be noticed. On the other hand, it is known that if you give subjects a great deal of practice on just a single speech stimulus, listeners can respond in ways that reveal that their auditory resolution approaches the sensory limits observed for simple tones [Kewley-Port et al., 1988].

So categorical perception is still problematic and not understood, but it is clear that it does not provide much justification for assuming that all speech sounds are discretely and reliably perceivable.

The Quantal Theory of Speech Production Argument.

K. N. Stevens (1972, 1989) demonstrated that the acoustic response of the human vocal tract behaves highly nonlinearly for certain changes in articulation. The consequence of these nonlinearities is that for speech sounds at certain locations along articulatory continua, variation in articulatory accuracy has minimal acoustic consequences. These `quantal properties' of speech suggest that certain places of articulation, certain manners of articulation, and certain vowels are relatively insensitive acoustically to articulatory variability. Stevens argues that somehow this justifies or rationalizes the postulation of discrete phonetics. But this evidence really only supports the claim that, given some reasonable assumptions about articulatory and auditory preferences, certain speech sounds may be more `attractive' than others. That is, because of these nonlinearities certain sounds may be more efficient choices for languages to employ than others. This explains why certain particular sounds, like [s], [d], and [a], appear in language after language: these sounds may be both articulatorily and auditorily advantageous. However, it doesn't even begin to provide empirical support for the claim of the standard theory that there is a discrete, reliably identifiable sound inventory innately embedded in human cognition.

In short, none of the empirical, performance-based arguments are directly relevant to the claim that there is a universal, reliably identifiable phonetic alphabet. The fundamental rationale for such an alphabet is really only the original theoretical motivation - that the study of competence phonology cannot even begin without such an alphabet.

Many phonologists would like to believe that experimental research justifies this assumption, but it does not. In fact, data from a century of phonetics research shows that human speech perception is unreliable and nondiscrete. And speech production is the same. It contains enough noise that speakers' productions of the same linguistic units always span some range if careful measurements are made, and speakers clearly have control over continuous variables that permit sounds to exhibit distributions that overlap to any degree - from statistical identity to being obviously very different. There is no reason to believe that all these difficulties are swept away by some `phoneticizer'.

One might suppose that even if all this is true, there is still no reason to suppose that anything important has been lost by the assumption of segmental transcription as the basis for phonology. Why can't phonology proceed just fine without any assumptions about discrete phonetics? Whatever the theoretical niceties, one might hope that perhaps there are only rare practical consequences.



Robert Port
Mon Mar 3 21:05:28 EST 1997