Section 1. Formality of Cognition and Phonetics

Oct 1, 1999

Cognition and Computation: Formal Systems for Cognition.


A widespread assumption about general human cognition is that it operates like a formal mathematical or logical system, that is, that it is based on the manipulation of discrete symbolic structures by formal operations on those structures. This is to say that human cognition is one instance of a computational system. The claim is that there is an inventory of symbolic atoms (or primes), available a priori, from which more complex data structures are constructed and which are manipulated according to rules during cognitive processing of various kinds. Thus, `thinking' closely parallels theorem proving in formal logic. Chomsky, and later Fodor and Newell and Simon, championed such an approach to understanding the human cognitive skills involved in either language or general problem solving. Our first order of business here is to clarify just what is implied by this kind of approach.

Chomsky.    Chomsky (1957, 1965) demonstrated that sentences in human language exhibit structure that could not, in principle, be accounted for by a finite-state machine (FSM). From this he concluded that a more powerful formal mechanism would be required to account for the structure of sentences and phonological systems in human languages. The empirical task of linguistics, according to Chomsky, should then be to discover just what the formal properties of that system would have to be.

A finite-state machine is a mathematical system F = {A, P, S}, where A = {a, b, c, ...} is an alphabet, P is a set of production rules such as ab → abb or b → ab, and S is the initial symbol. Chomsky showed that many familiar models of syntax were equivalent to finite-state machines and were thus intrinsically inadequate as formal models of language structure. In computer science, Chomsky is best known for his hierarchy of grammar types, ranging from regular grammars up through context-free and context-sensitive grammars to unrestricted ones (Chomsky, 19nn). This framework helped him define the task of linguistic theory as searching for the empirically correct level of complexity as well as the empirically correct a priori symbol set and formal operation set.
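
To make the finite-state idea concrete, here is a minimal sketch in Python; the states, alphabet and transition table are invented for illustration, not drawn from Chomsky's examples. A finite-state recognizer can only accept or reject a string by stepping through a fixed set of states one symbol at a time, which is exactly the limitation Chomsky exploited.

    # A toy finite-state recognizer for the language a b+ (an 'a' followed
    # by one or more 'b's). States, alphabet and transitions are hypothetical.
    TRANSITIONS = {
        ("S0", "a"): "S1",
        ("S1", "b"): "S2",
        ("S2", "b"): "S2",
    }
    START, ACCEPTING = "S0", {"S2"}

    def accepts(string):
        state = START
        for symbol in string:
            state = TRANSITIONS.get((state, symbol))
            if state is None:        # no legal transition: reject
                return False
        return state in ACCEPTING

    print(accepts("abb"))   # True
    print(accepts("ba"))    # False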

Formal Systems.   The kind of symbol system that is most relevant here, both for language and for thinking, is a formal language in which there is a vocabulary of both terminal and nonterminal symbolic elements that can be strung together into nested hierarchical structures. Let a simple grammar G be the quintuple {A, N, C, P, S}, where A = {set of terminal symbol types a, b, c, ...}, N = {set of nonterminal symbols k, l, m, ...}, C = {concatenation relation}, P = {set of production rules of the type k → aCb, l → bCd, ...}, and S is the initial symbol (e.g., in S → kCl). A well-formed grammatical construction must contain only terminal symbols. Its construction begins with the initial symbol S. Rules of the system control the derivation from S to one of the legal strings of terminal symbols.
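
As a concrete illustration of such a derivation, here is a small Python sketch. The particular terminals, nonterminals and productions are invented for this example rather than taken from the text; the point is only that a derivation is a sequence of rewrites that starts from S and halts when only terminal symbols remain.

    import random

    # A toy rewrite grammar: terminals a, b; nonterminals S, k, l.
    TERMINALS = {"a", "b"}
    PRODUCTIONS = {
        "S": ["kl"],            # S -> k l
        "k": ["ab", "akb"],     # k -> a b | a k b
        "l": ["b", "bl"],       # l -> b   | b l
    }

    def derive(symbols="S"):
        """Rewrite the leftmost nonterminal until only terminals remain."""
        while any(s not in TERMINALS for s in symbols):
            for i, s in enumerate(symbols):
                if s not in TERMINALS:
                    symbols = symbols[:i] + random.choice(PRODUCTIONS[s]) + symbols[i + 1:]
                    break
        return symbols

    print(derive())   # e.g. 'abb' or 'aabbb'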

This form of statement makes clear that, in a mathematical system, the set of terminal and nonterminal symbols is provided in advance. There is no way to generate the required symbols from within the system; they must be supplied in advance or externally. Obviously, there is a question about how to interpret `in advance' in the case of a biological system (see Bates and Elman, 199n; Pisoni and Aslin, 198n). Chomsky and most others (Pinker, 199n) have interpreted it to mean `innate', that is, supplied at birth rather like other instinctual behavioral traits.

Then there is a set of rules that makes explicit how combinations may be achieved. All other units of the formal system (that is, composite expressions, e.g., data structures, trees, words, etc.) are constructed from the set of vocabulary items provided in advance. Even for much more complex systems this constraint still applies: new symbols may not be constructed from whole cloth, but only from the inventory that is provided a priori. It is assumed that the system holds together in just this way. This amounts to an empirical claim that must be substantiated. This assumption is the rationale for saying, as most linguists currently do, that the goal of linguistic theory is to discover the innate symbolic atoms and the formal constraints on grammars. Furthermore, it is the innate ability to identify each member of the a priori symbol set that guarantees that speaker/hearers can determine which rules apply at any particular point during language use.

Explanatory Aims: Realism and Phenomenalism. There is some question as to the nature of the proposed relationship between formal systems and actual human language and cognition. There are, it seems, two possibilities:

  1. Formal linguistics is supposed to describe abstract entities, formal grammars. These formal grammars provide a purely descriptive account of our speaking abilities, idealized from any actual speaking. Theories such as this are often called phenomenological because they aim only to describe phenomena without providing any mechanism to explain how they occur.
  2. Formal linguistics, and formal grammars, provide the basis for cognitive psychology. Formal grammars are taken to be implemented in human brains and account for our actual abilities to think and learn language. Approaches such as this are often called realist because they aim to provide real mechanisms for the observed phenomena.

The question is in which of these two ways formal linguistics is intended. Is Chomsky's linguistic theory a theory of actual cognitive behavior? No; Chomsky in many writings (e.g., 1965, pp. ...) claimed that his generative grammars capture only `what speakers in some sense know' about their language. A generative grammar was not to be seen as a mechanism for either sentence production or perception, but rather as an explicit (and finite) statement of (that is, a specification of the form of) the full (and infinite) set of acceptable sentences in a language. Taken only this way, there is a real question about whether such a specification would be of any scientific interest. It would be interesting only if it happened that the method of specification revealed something substantive about real cognitive systems for producing and perceiving utterances. Of course, Chomsky, in other places, has insisted that his grammars do have some kind of psychological reality - that is, that a generative grammar does exist in the brain of human speakers (see his Appendix to Lenneberg, 197n). Apparently it was to be something that a speaker makes reference to while constructing sentences (using some real-time mechanisms). Certainly many other linguists, psychologists and philosophers interpret his theory this way.

Of course, there are some scientists who would prefer to back away from such a realist interpretation of the hypothesis, and say that the symbolic cognition hypothesis describes only what the nervous system `in some sense does' (Chomsky, Massaro), that it only offers a convenient description of phenomenal properties of overall behavior rather than explicit explanatory claims about physical implementation in time (cf. Fodor-recmd-Tony). Since such an approach has very weak ambitions, we will have nothing to say to such phenomenal theorists. By making only very weak and vague claims about implementational consequences (about the spatio-temporal aspects of performance), these theories may attempt to protect themselves from the phenomena. But such a theory has little to recommend it. We prefer to explore realist theories that do not shirk from making implementational claims. Only these realist theories make explicit predictions that can be tested empirically. From now on, then, we will concern ourselves with the Physical Symbol System Hypothesis proposed by Newell and Simon and the Language of Thought Hypothesis proposed by Fodor. Both these theories are explicitly realist.

Newell and Simon.  Newell and Simon (1975) also proposed formal symbolic systems as models of aspects of human cognition, but they were much more explicit that the formal models should be seen as simulations of cognitive function in real time. In fact, they went so far as to propose that physically implemented formal systems are the only known basis for intelligent behavior. They characterize a physical symbol system as:

``a set of entities called symbols which are physical patterns that can occur as components of another type of entity called an expression... Thus a symbol structure is composed of a number of instances (or tokens) of symbols related in some physical way (such as one token being next to another)....Besides these structures, the system also contains a collection of processes that operate on expressions to produce other expressions: [by means of ] processes of creation, modification, reproduction, and destruction. A physical symbol system is a machine that produces through time an evolving collection of symbol structures.''

The physical symbol system hypothesis says that physical symbol systems have ``the necessary and sufficient means for general intelligent action.'' In particular, Newell and Simon claim that human general cognition is an instance of such a system.

What is a Physical Symbol System?  Of course, the formal system archetypes discussed above are mathematical constructions and `live', or exist, in what philosophers sometimes call with a smile `Plato's Heaven'. It is the abstract space of mathematical and logical symbol systems - a world where logic is the basic guide and the only time is serial order. But Newell and Simon (as well as Fodor and Pylyshyn) were quite specific in speaking of a physical symbol system: a formal system that happens to be physically implemented as a mechanical or electrical construction, rather than only conceptually implemented in the logical imagination of a mathematician (perhaps using pencil and paper as a supplementary symbolic aid).

The physicality of physical symbol systems is very important since it implies that, while such systems may obey the constraints expressible in their formal description, they also obey the laws of physics in real time. This is due either to careful engineering (in the case of computers and vending machines) or, presumably, to natural selection in the case of human psychological systems (although an alternative account invoking divine engineering has widespread support). In the case of a modern digital computer, the pattern of 0s and 1s at time T=n has a causal effect on the pattern of 0s and 1s at time T=n+1. The set of possible states of the system is digital and, of course, time is digital as well, since a discrete-time clock controls the state changes in an orderly manner relative to the continuous time of the physical world (lately, at several hundred million clock ticks per second on my desktop).

During functioning by a physical symbol system, then, the symbol structures have distinct physical forms (such as p and q or 0 and 1 ) at time T. These different forms have distinct causal effects on the control system according to digitally specified rules such that  coherent and correct symbol strings are (deterministically) generated. Since the psychological or cognitive claim concerns performance of the human body, we should expect that specific claims about the formal aspects of human cognition will have some kind of physical effects that could, in principle, be directly observed and tested.

Physical Implementation and Digitality. The question we are now ready to approach is: what is the physical implementation of actual symbolic units in a human cognitive system? Few have even speculated on the issue with respect to human cognition (cf. Scheutz, 1999; Dietrich, 1990). Presumably, physical instantiations of the symbols should exist wherever symbolic theories claim that symbols account for human performance. It is quite clear that any symbolic units must resemble, in certain critical respects, the bit strings employed in digital computers. Specifically, they would have to involve an explicit set of digital types as defined by Haugeland (1985, 1997) and should be producible and identifiable essentially perfectly. All formal systems depend on the existence of levels at which discrete identification of types is trivially easy and unproblematic. Such models assume digitality at all levels from the bottom to the top.

The reason for this very strong constraint is clear: Symbolic or computational systems depend on a control structure that is always able to identify which symbol it is ``looking at'' at each discrete time step (since only that identification makes it possible to determine which rule to apply). For example, in statements of first-order logic, one must be able to differentiate p from q without fail. A computer must always be able to discriminate 010 from 011. Whatever alphabet is employed, the control system (or implementational system) must `write' (or make a mark or physically move before the next time step) any token and also `read' (or perceive or detect the current position of) any token without possibility of error, that is, `positively' (Haugeland, 1985, pp. 53-55). Only with this property is it feasible to build symbolic systems of great complexity and have confidence that the system is doing  what it is supposed to do. Even very tiny errors or uncertainties in, say, a numerical value or symbol identity would lead to nonsense behavior in any system of the complexity required to model even the very simplest aspects of human cognition. All symbolic systems must be digital in this sense or they will fail to work at all. Theoretically, they will fail to implement the notion of formal symbol system (and thus cease to be physical symbol systems).  Practically speaking, any missed bit will result in system failure within just a few steps of operation.
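
The point can be illustrated with a trivial sketch (the rule table below is hypothetical). Rule lookup is an exact match on the token's form; a token that is only `almost' the right shape matches no rule, and the computation simply derails.

    # Rule selection keyed on exact token identity. Any misread token --
    # here the letter 'O' where a '0' was intended -- matches nothing.
    RULES = {
        "010": "apply rule 7",
        "011": "apply rule 8",
    }

    def next_action(token):
        return RULES.get(token, "HALT: unrecognized token")

    print(next_action("010"))   # apply rule 7
    print(next_action("01O"))   # HALT: unrecognized token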

Does this mean that the physical substrate of any physical symbol system must itself be discretely structured? It does not. As Haugeland puts it (p. 55),

in the real world there is always a little variation and error. A ball never goes through the basket the same way twice; the angle of a switch is microscopically different each time it's set; no two inscriptions of the letter A are quite alike.... But digital systems (sometimes) achieve perfection, despite the world... Essentially, they allow a certain ``margin of error'' within which all performances are equivalent and success is total. Thus the exact token doesn't matter, as long as it stays ``within tolerances.''

Digital computers provide a good illustration. The physics of computer chips is actually continuous, of course; the motion of electrons through a chip is described by differential equations. However, chips are designed to exhibit a system of discrete attractors at values we call 0 and 1, such that all states of the system that are not near an attractor are highly unstable. The system falls instantaneously -- that is to say, ``very quickly relative to the time scale of the system clock'' -- into one or the other value, 0 or 1. In this way, computer symbols (from {0,1} to ``scanf'' to ``Edit:Paste'') can be treated by computer programmers and users as if they are digital in Haugeland's sense. Of course, they are digital -- but only when looked at on the appropriate discrete-time scale as governed by the clock on the physical chip. The question is whether there is any a priori reason to either accept or reject the possibility of some similar behavior during human cognition at some appropriate level. The computational theory of mind has bet that the answer is Yes. And the bet is not just that one such level can be found, but that all levels of cognition will be found to exhibit just this kind of behavior. That is just what it means to be a multilevel formal system.
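
A small sketch may help to fix the idea, with invented nominal voltages and tolerances standing in for the real electrical parameters of a chip: any reading within the margin of error counts as exactly the same bit, and anything in between is treated as an illegal, unstable state.

    # Haugeland-style digitality: readings within tolerance of a nominal
    # level are all equivalent; in-between values are not legal states.
    NOMINAL = {0: 0.0, 1: 5.0}   # hypothetical nominal voltages
    TOLERANCE = 1.0              # hypothetical margin of error

    def read_bit(voltage):
        for bit, level in NOMINAL.items():
            if abs(voltage - level) <= TOLERANCE:
                return bit       # 'success is total' within tolerances
        raise ValueError("not a legal digital state: %.2f" % voltage)

    print(read_bit(0.3))   # 0
    print(read_bit(4.8))   # 1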

Computer bits happen to be binary, though general cognitive and linguistic units, of course, need not be. For human cognition conceptualized as a case of symbolic behavior, we have a responsibility to ask what the actual symbols -- both minimal atomic ones and more complex molecular ones -- are like. If humans do implement a physical symbol system, as proposed by the standard theory, then there must be actual physical symbols - ones that are digital ``within tolerances'' - somewhere in the brain. Newell and Simon (1975) and Fodor and Pylyshyn (1988) (also Pylyshyn, 1979; Fodor, 1976) do not shrink from making such an assumption explicit. They say, for example:

Because classical mental representations have a combinatorial structure, it is possible for classical mental operations to apply to them by reference to their form. We take these claims quite literally. They constrain the physical realizations of symbol structures. In particular, the symbol structures in a classical model are assumed to correspond to real physical structures in the brain, and the combinatorial structure of a representation is supposed to have a counterpart in structural relations among the physical properties of the brain (Fodor and Pylyshyn, 1988, p. xx).

Constituent Structure. Another important property of formal symbolic or computational systems is constituent structure. Individual tokens must be combinable with complete generality. So let us elaborate our symbolic system to include both a set of names a, b, c, ... and a set of single-argument predicates F, G, H, ... . This system should have the property that it can state F(a), F(b), F(c), ... G(a), G(b), G(c), ... H(a), H(b), ... . And it must be that F and G, etc., represent the same predication in all cases, and the names a, b, c, ... must refer to the same individuals in each case (Evans, 1982, pp. 100-105). To use a linguistic example, a language containing run, walk, sleep as well as children, women, boys should have children run, children sleep, boys run, boys sleep, and all the rest of the combinations (unless there are explicit rules preventing certain combinations). Or, taking a linguistic example from another domain, the [+voice] in the segment called [b] must be the same [+voice] unit as in the segment called [d] or [a] or [r]. This generality or systematicity gives formal systems much of their representational power (Fodor and Pylyshyn, 1988; Smith, 1996).
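
A short sketch makes the systematicity point concrete (the inventories of names and predicates are invented): given the atoms, every predicate-name combination is available, and the same atomic token recurs with the same identity in each expression.

    # Full combinatorial generality over a toy inventory of names and
    # one-place predicates.
    names = ["a", "b", "c"]
    predicates = ["F", "G", "H"]

    expressions = ["%s(%s)" % (p, n) for p in predicates for n in names]
    print(expressions)   # ['F(a)', 'F(b)', 'F(c)', 'G(a)', ..., 'H(c)']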

Note that this property exploits the combinatory power of an alphabet of distinctions, but also that it depends on the digitality of symbol tokens. It depends on the uniform possibility of recognizing and generating symbol structures infallibly. Atomic types must recur; the a in F(a) must be the same a as the one in G(a). The control system must always be able to tell when the type a or b or F or G has occurred. There can be no fuzziness or ``almost-the-same kind'' regarding atoms if the system is to have constituent structure.

Digitality All the Way Down. There is one final property to mention that is implicit in what has been said so far but still merits comment: Complex formal systems frequently have a number of distinct levels, with each level defined by its own distinct vocabulary of symbol types. These levels are familiar from linguistics, where there are at least a phonetic level, a phonological level, a lexical level, phrase level and so on. Typical symbolic atoms on each level include respectively: [+/- aspirated], [+/- Tense], `table' and `Noun Phrase'. Programming languages typically have distinct levels employing digital tokens that range from raw bits, to machine-language symbols to, say, a `hotlinked' word in a web document. It is important that every one of these levels must meet the criteria for digitality and constituent structure. Unless all are digital, the system will break and formality will fail to be true of the system.

Of course, whenever we talk about a physical symbol system, the symbol tokens must have a unique physical form -- some spatio-temporal pattern of physical material -- for each physical prime. For computers this means that 0s and 1s at the set of integer time points must be digital and easily differentiable. (It does not matter what state they are in at other time points between clock ticks.) And any other kinds of cognitive units will need to be fully as digital at their appropriate times as the 0-1 patterns are at the clock ticks of a computer.

 

What Human Cognitive Symbols are Like.

The next question to ask is where we should look for direct empirical evidence of the digitality of any of these symbols. Where should one look to find further physical evidence regarding the appropriateness of the Physical Symbol System Hypothesis for human cognition?

Since we, like Chomsky (1976), Fodor (1979) and others, are adopting a realist stance, we will inquire into further details of these symbols as implemented. Can we expect to find physically instantiated human cognitive symbols (cf. Chomsky, 1976, pp. xx)? Of course, there are many practical difficulties standing in the way of actually finding symbols in a neurophysiological system. The most important problems seem to be these two:

Problem 1. The problem of knowing what symbols and operations are actually used when people think some thought or engage in a particular linguistic operation (Chomsky's basic question), and
Problem 2. The problem of figuring out just where and when to look for specific symbols in a functioning nervous system (a cognitive neuroscience question).

 

Problem 1: Which symbols are actually used? If people think symbolically, what symbols do they employ? The most direct attempt to answer this question was provided by Fodor's `Language of Thought' (1976), in which a very specific version of the Physical Symbol System Hypothesis is outlined. Fodor pointed out that some kind of rich representational scheme is clearly required for commonplace cognitive behaviors -- for example, for the kinds of thoughts people entertain in the course of ordinary everyday activities.

Such thoughts, which humans think all the time, require a representational scheme of great richness. This scheme may be distinct from language itself, but it must be very similar to linguistic constructs. Most importantly, though, a rich representational scheme is required to explain interpersonal communication. When two people communicate in language, one of them thinks a thought, then translates it into a natural language sentence, then into a sequence of motor commands, and speaks it. To understand that sentence, the hearer translates what she has heard into her own language of thought. For such translation to work, the language of thought, sometimes called mentalese, must have lexical entries that correspond to at least most of the lexical entries of one's language (like table, door, Dave) and much of the logical apparatus (like if-then, X or Y, before-during-after, Attribute-Noun, etc.) that is provided by any human's native language. It is also reasonable to assume, as Fodor does, that such a representational scheme must be symbolic and digital just like linguistic units (that is, just like words).

Fodor's suggestion for Problem 1, then, would be that the symbols of human reason are very similar to the words and other linguistic units people employ when they talk. There may be other kinds of symbols used in thinking but not language, or some used in language but not thinking.   This seems very plausible because all humans normally speak at least one language, and because languages themselves seem, prima facie, to exhibit at least symbol-like word units. The hypothesis is surely reasonable, but it only addresses the first problem by proposing that cognitive symbols are a lot like words in natural language.

Before discussing Problem 2, the physical implementation of symbols, we must first discuss language in general, and especially the way language is conceived by symbolically oriented cognitive scientists. As we will see, the discussion so far implies some very strong constraints on the nature of the cognitive architecture and, especially, the phonetic space. In fact, these implications have been quite clear to generative phonologists and are taken for granted. It is only phoneticians (who are generally reluctant to speculate about the formality of language) who have not accepted these implications (Ladefoged, Fry, Heffner, etc.).

 

Language and Symbolic Structure  

At first glance, language presents itself as the archetypal example of a discrete symbolic system. ``A word is the quintessential symbol,'' according to Pinker (p. 151). Indeed, logic was invented originally as a method for assuring soundly constructed patterns of sentences in an argument. Modern programming languages resemble natural languages in many respects aside from the name. Philosophers of language down through the ages have always seen language as essentially constructed from discrete signs and symbols. And professional linguists at the close of the century remain almost universally committed to the explicit assumption that:

Language simply is a formal system.

But viewing language as a formal system has many consequences. In this section, we will outline some of them, focussing particularly on consequences for phonetics.

Interpreted Formal Systems.  In more complex symbol systems such as language, particular symbols may have semantic content. For example, the symbol dog in English `refers to' (or is about) the class of real-world dogs. And more abstract symbolic types, e.g., Noun, may refer to such abstract classes as `the name of a person, place or thing'. This semantic property goes somewhat beyond the notion of a formal system; each symbol (or at least most symbols) in a real-world thinking system must be `about' the world. They are not contentless ciphers but meaningful symbols. Haugeland calls such formal systems `Interpreted Formal Systems.' It is this property that makes it possible, through appropriate symbol manipulation by the control system, to represent the world in a way that is sufficiently veridical to be useful. Clearly, Chomsky's formal grammars must be interpreted formal systems, since the sentences people construct tend to provide plausible and useful descriptions of the world.

However, a critical aspect of interpreted formal systems is that whether a particular rule can apply is constrained only by whether the symbol matches a physical template for the symbol type, and never by what the semantic content of the symbol is. That is the whole point. Rational linguistic (or other) constructs are produced, but it is the rules that assure that coherent semantics will result. And the application of the rules depends on the form of symbols, not on what they mean. This is why such systems are so powerful. ``Take care of the syntax,'' Haugeland says, ``and the semantics will take care of itself'' (AIVI, p. nn). One never needs to know anything about the semantic associations of a symbol to determine whether a rule applies. Only the identity of the token as being of some specific type is relevant to determining what will happen next (Fodor and Pylyshyn, 1988). But what allows these tokens to take on specific identities to support rule application? Very simply, it is their distinctness -- a distinctness that is simply assumed to be easy -- so easy as to be either `already solved' or else perhaps `not in need of a solution since it is so obvious'. Thus specific words are distinct from each other, NP is distinct from VP, [+voice] is distinct from [-voice], batter is distinct from better, and so on.
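
The form-not-meaning point can be sketched as follows (the lexicon, categories and the single rule are invented for illustration). The `meaning' field is carried along with each symbol but is never consulted; rule application looks only at the formal category of each token.

    # Rule application by form alone: the rule S -> N V fires on the
    # category labels; the attached meanings are never inspected.
    LEXICON = {
        "dog":   {"cat": "N", "meaning": "the class of real-world dogs"},
        "barks": {"cat": "V", "meaning": "events of barking"},
    }
    RULES = {("N", "V"): "S"}

    def combine(word1, word2):
        key = (LEXICON[word1]["cat"], LEXICON[word2]["cat"])
        return RULES.get(key)    # form only; the semantics 'takes care of itself'

    print(combine("dog", "barks"))   # S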

Multiple Levels, Multiple Alphabets. All natural languages are generally believed to consist of a set of hierarchic levels (including, e.g., phonetics, phonology, lexicon and syntax), each of which has its own discrete alphabet of symbols (e.g., [aspirated], [+/- Tense], the word ``Table,'' or Noun and Verb) (Chomsky, 1956/1975, 1968). And indeed, if, as discussed above, natural language is to be used for communication of thoughts from one person to another, the cognitive system must come in multiple layers corresponding to those found in language. Fodor (1976) argues that public language will constrain our theory of the mind via the constraints it puts on a theory of messages, the things that are communicated from one person to another.

"The general idea is that facts about natural languages will constrain our theories of communication, and theories of communication will in turn constrain our theories about internal representations. [...] In particular, I want to show that there are a variety of different kinds of conditions that an adequate theory of messages would have to satisfy, and that this is to the point because messages are most plausibly construed as formulae in the language of thought." (1976, p.109)

Fodor makes two general claims along these lines. First, he says that there is an indirect mapping from messages to acoustic wave forms and vice versa; this mapping is indirect in that wave forms and messages are paired via the computation of a number of intervening representations. He also claims that among the intervening representations, several correspond to structural descriptions provided by Chomsky's generative grammars (pp. 109-110).

As we noted above, Fodor is a realist, not a phenomenalist: these two general claims imply that the structural descriptions of generative grammars are psychologically real. So when we understand a spoken sentence, we compute the corresponding message from the wave form by computing transformations of it from the phonetic level to the morphological level, then to the surface syntactic level, then to the deep syntactic level, and then to the language of thought. (Fodor suggests that these last two levels may be the same.) Each of these levels, Fodor claims, "can be identified with a certain (typically infinite) set of formulae whose elements are drawn from the vocabulary of the level and whose syntax is determined by the well-formedness rules of the level." (p. 110) In other words, these other levels are also formal. And, as we saw above, if those levels are formal, they are also digital -- the symbols at every level must be positively readable and writable.
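
Schematically, the indirect mapping Fodor describes looks like a chain of level-by-level transformations. The sketch below only mirrors that structure; the level names follow the text, while the transformation itself is a placeholder standing in for whatever computations the theory actually posits.

    # A placeholder pipeline from wave form to the language of thought.
    LEVELS = ["phonetic", "morphological", "surface syntactic",
              "deep syntactic", "language of thought"]

    def understand(wave_form):
        representation = wave_form
        for level in LEVELS:
            # each step computes a new formula over the previous level's output
            representation = "%s-representation(%s)" % (level, representation)
        return representation

    print(understand("wave form for 'the dog barks'"))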

Note that the evidence that thought occurs in multiple layers just like language comes from the theory of messages, which are things we communicate. Messages are taken to be formal because we communicate them with natural languages, which Fodor simply assumes to be formal (this is, of course, a common assumption). For natural languages to be formal, each of (at least) the deep syntactic, surface syntactic, morphological and phonetic levels must be formal. And the evidence that cognition consists of manipulations of formulae in mentalese is that natural language consists of manipulations of formulae through several levels of processing. Thus evidence that natural language is not formal in this way undermines the Language of Thought hypothesis.

Unfortunately for a maximally simple version of a computational theory, the units on the various levels of linguistic structure are quite different from language to language.
Clearly there is no fixed inventory of words that might simply be assigned different meanings in different languages. And when people try to pronounce words in a language they do not speak, they tend to mispronounce them severely. Now, if one picks abstract units like Sentence, Noun Phrase, Predicate or Prepositional Phrase, one might argue for sufficient similarity to claim they are the same across languages. But the fact that the phonological structures for words and phrases are wildly different across languages is a major difficulty. These differences can be very subtle and can reveal what specific dialect region we come from, and so forth. Most of us can speak only one or two languages with skills that are like those of a ``native speaker''.

If one believes the classical theory of linguistic phonetics, then all of these differences are a problem: they must be capturable with the universal phonetic alphabet, since all control of speech by a language is to be restricted to this a priori alphabet. There must be a commensurate base set of phonetic symbols, the set of phonetic primes. If one wants to study the abstract symbol structure of the phonology of specific languages, one must assume that this a priori base set exists (it may not seem so important to know exactly what is in this set, only that such a set exists).

Symbolic Phonetics. Like the other levels of lexicon, syntax, etc., the phonetic level has its own universal alphabet, serving as the terminal alphabet for all human languages. It is the final discrete structure that leaves the domain of abstract formal systems and enters the physical world for transmission between individuals. According to the standard generative view, the speaker codes the semantic message into a string of phonetic symbols (feature vectors) for transmission to the listener. The listener recovers the ordered, discrete vectors from the sound and then reconstructs the higher-level symbolic structures of the utterance just from the phonetic symbol string.

Chomsky and Halle are quite clear that the phonetic alphabet is a priori and innate. They propose linguistic universals that must be assumed to be ``available to the child learning a language as an a priori, innate endowment.'' These include ``a theory of universal phonetics that specifies the class of possible phonetic representations by determining the universal set of phonetic features and conditions on their possible combinations'' (p. 4). ``The total set of features is identical with the set of phonetic properties that can, in principle, be controlled in speech. They represent the phonetic capabilities of man and, we would assume, are therefore the same for all languages'' (p. 295; cf. Chomsky, 1972, pp. 121-123).

In a well-respected graduate-level textbook of phonology, Kenstowicz and Kisseberth (1979, pp. 7-9) state an essentially identical point of view, emphasizing both the discreteness imposed on phonetics by the  perceptual-motor system and the assumption that all languages can be fully transcribed with a single inventory of phonetic symbols.

``From a purely physical point of view, any utterance is a continuum, acoustically and articulatorily.... Nevertheless, speech is perceived and functions linguistically as a series of discrete units called sounds. The general goal of linguistic phonetics is to describe accurately (both acoustically and articulatorily) all the various kinds of speech sounds that function in the languages of the world. When this goal has been attained, it will be possible to develop a universal system of notation so that any utterance in any language can be transcribed. And then, on the basis of the transcription, the utterance can be spoken to give a faithful rendering of all its `linguistically significant aspects'.''

This is a bold empirical claim. But it is much more than just another empirical claim that one might be right or wrong about: if one is to take seriously the `formal character' of language (as linguists generally do), then this simply must be true. The digitality of phonemes is a necessary condition for the formality of thought, one which follows directly from the idea that language and the language of thought are formal systems. Suppose one were to argue otherwise (this will be a tempting strategy for some, given the evidence we cite in the next section). One might, for example, argue that digitality stops with words, or with morphemes; the morphemes and/or phonemes might not be digital. But for this to be true, the digital internal language must be completely cut off from public, natural language. That is, assuming that words or morphemes are the lowest digital level makes it impossible for phonemes to be connected to mentalese. But this is unacceptable, because then there would be no evidence for the formal theories of thought. If phonemes, that is, spoken sounds, are not connected to the language of thought, why would we think that language constrains thought at all? And if thought and spoken language are disconnected in this way, the idea that thought is formal is not connected to observable phenomena at all. So it is unsurprising that proponents of formalism in linguistics and cognitive science agree that phonemes are digital. As Pinker puts it: "But the phonological module of the language instinct has to do more than spell out the morphemes. The rules of language are discrete combinatorial systems: phonemes snap cleanly into morphemes, morphemes into words, words into phrases." (1994, p. 163)

Not only must the phonetic level be digital, the individual phonetic elements must also be static, in that they must be specified at particular temporal points. This must be the case because, in order to be computed over -- in translating up to the morphological level or down into speech sounds -- the phonetic elements must have rules applied to them. Such a rule going from phonetic elements to speech sounds might be:

If +sonorant and -consonantal and -vocalic, then voiced.

To apply a rule like this one to this set of phonetic elements, the phonetic elements must be read at some particular moment of time, and at that moment of time their values must be perfectly definite. That is, they must be static.
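
For concreteness, here is a sketch of applying the rule quoted above to a static bundle of feature values read at a single time point. The segment and its feature values are invented for illustration; the point is that the rule can fire only if every value it mentions is perfectly definite at the moment it is read.

    def apply_voicing_rule(segment):
        """If +sonorant and -consonantal and -vocalic, then voiced."""
        if (segment.get("sonorant") == "+" and
                segment.get("consonantal") == "-" and
                segment.get("vocalic") == "-"):
            segment["voiced"] = "+"
        return segment

    # A hypothetical glide-like feature bundle, definite at read time.
    glide = {"sonorant": "+", "consonantal": "-", "vocalic": "-"}
    print(apply_voicing_rule(glide))
    # {'sonorant': '+', 'consonantal': '-', 'vocalic': '-', 'voiced': '+'}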

The similarity of this view with reading and writing of bits in a computer is quite clear: just as a computer interprets a continuously variable voltage signal (on a serial input line) as a discrete sequence of 1s and 0s (the only possible transmittable units), human listeners must be assumed to interpret continuous speech signals as a discrete series of segments drawn from a fixed alphabet, and the alphabet is assumed to be supplied at birth to all members of our species.

Observing the Physical Implementation of Symbols.  Returning to the question of finding the physical implementation of the symbolic units in language and thought, where should we begin? Could we look for the physical instantiation of abstract syntactic units like Subject, Noun, Past, and so on? This  would appear to be quite hopeless given present technology. Do individual words have a discrete physical implementation -- something resembling a unique physical bitstring form? It is difficult to tell at this point but probably not (refs). As for syntactic units, it is much less likely that some physically definable events standing in a one-one relationship with the symbols of our technical linguistic descriptions of sentences could be found in the brain. Of course, the theory of physical symbol systems must insist that such physical correspondents of the symbols in our scientific descriptions do exist.  Even Chomsky agrees that in principle, physical correlates should be observable (1972).   The theory claims they exist as stable, static discrete structures but, of course, it does not claim that there must be any straightforward way to locate them either in space or time in the brain. This would appear to make physiological verification of the symbolic/computational theory of cognition and language all but impossible.   If this sad state of affairs really existed, it would  be embarrassing because it would appear that the theory could not be tested with direct physical methods.

But we need not give up yet; we should look a little further. One level of the hypothesized multilevel structure of linguistic utterances, with its own alphabet, is the phonetic level. Like other symbols, phonetic units are a priori (relative to the specifics of the theory itself), static (since units must be definable or specifiable at some discrete time t=n) and digital (that is, consisting of discrete tokens that are infallibly writable and readable). But, unlike the other cognitive symbols, phonetic symbols have additional properties relevant to their role as communicative signals. They must be physically instantiated in the medium lying between individuals in order to support communication with language.

Importance of Phonetics for Testing Symbols.   At the level of the phonetic symbol string, we find a unique opportunity to test the proposals above about the discrete, digital nature of cognitive symbol types. The phonetic symbols, unlike the alphabets used at the other levels, are subject to additional severe constraints that must hold if interpersonal communication with language is to be possible. The symbols of this level, in addition to serving as cognition-internal (that is, competence-level) symbols in the data structures corresponding to any utterance, must also have distinct, communication-supporting mappings into sound. That is, all the properties that make symbols work so well for cognition-internal purposes must serve for interpersonal communication as well. Obviously these a priori universal units will be basic units for both language perception and production. Apparently, phonetic representations of speech (such as are presented over a telephone line or preserved on a tape recording) are sufficient to specify spoken utterances in any language (except for sign languages like ASL, of course). Phonetic symbols are producible reliably and perceivable reliably.

So, of all the symbolic levels involved in general thinking and in linguistic communication, the phonetic level is the place where we would seem most likely to succeed at finding the reliable, discrete, static physical correlates - units as distinct as different bitstrings in your PC - which the theory insists must exist for all symbolic levels of both cognition and language. If we are to find physically discrete units for any symbols, we should find them in the universal phonetic alphabet.

If they exist here as digital objects at an appropriate set of time points, then this would offer support of immeasurable value for the entire symbolic-computational theory. If speech sounds can be shown to be based on some reasonably small alphabet employed by speakers of all languages and available at (or very close to) birth, then the whole Chomskyan proposal would have critical experimental support of a very concrete sort.  If this level is discrete and digital, then we could rest easier in assuming that other levels, like the phonology and the morpheme lexicon, are digital as well, since lexical and phonological units will inherit their discreteness from phonetics (as long as phonological and morphological units have specifications within phonetics).  Then we could begin to rest easy about strong claims of the Physical Symbol System Hypothesis and even about the Language of Thought.

On the other hand, if little or no evidence can be found for such universal, discrete and static units in the only place in the entire cognitive system where, for the time being, we have a good chance of directly observing them, then, at the very least, the whole hypothesis should be suspect. But in fact the consequences are far worse than this, as we shall show. First, however, let us review the argument made thus far.
 

Conclusions.

Here are the conclusions so far in schematic form:

  1. The computational theory of mind claims that human cognition is a physical symbol system (Newell and Simon; Fodor).
  2. A physical symbol system requires digital symbol tokens that can be written and read positively, without error, at every level.
  3. On the standard view, language is a multilevel formal system whose terminal level is a universal, a priori phonetic alphabet.
  4. The phonetic level must therefore consist of discrete, static, universally available symbols.
  5. Because phonetic symbols must also be physically transmitted between speakers, phonetics is the one level where the digitality of cognitive symbols can be observed and tested directly.

 

Implications for Phonetic Units

If this theory is to turn out to be true, then we must expect to find a theory of phonetics that will provide  certain things.  Only in this way would the symbolic hypothesis about cognition and language be supported. It seems that all of the following properties must be true:

  1. Phonetic atoms are digital. That is, the atoms are so easily differentiable from each other that they will be infallibly identified. It follows that any speech signal should always have a unique transcription in this alphabet.
  2. Phonetic atoms are static. The atoms must be definable or specifiable (so as to distinguish them from each other) at some specific point in discrete time. Since symbolic time is only discrete, intermediate temporal locations (in real, continuous time) are ``invisible'', so all business must be conducted (as it were) at discrete times. It is not clear how those times might be chosen in a nervous system, but it seems they could be either at regularly spaced intervals or at irregular ones (as long as the times are chosen a priori somehow). Since symbolic models have nothing to say about continuous time, they can only make claims about discrete times chosen in some other way.
  3. The set of phonetic atoms is invariant across all human speakers. The set of all elements in the space is invariant in size and content over the generations and over the species. No new sound components can be invented (since they must always have been there). New sounds could appear only as novel combinations of universal features.
  4. The alphabet size is small. The space should have few enough distinctions so as to make transcriptions in the alphabet useful for a first-language learner to encode ambient speech. (So, it won't do to have an alphabet with millions of symbols, or simply sum up all the sounds in all languages into a maxi list.)
  5. The alphabet is available at birth (or conceivably spontaneously generated shortly afterwards) and is relatively unchangeable over time. Speech perception skills - the basic ones at least - should not require any tutorial period whatever; they should be innate and uniform across the species. Deterioration of unused distinctions during maturation (e.g., Werker and Tees, 198n; Strange, 1997?) would not seem to undermine the theory particularly, however.

Very simply, the traditional theory requires that phonetics actually be symbolic. It is not enough that the (more abstract) phonological level be discrete and symbolic. According to the classical theory of formal language, phonology must inherit its discreteness from a discrete, universal, a priori phonetics.
 

Implications for the Theory of Cognition.

We have discussed what one should find in phonetics if the computational theory of cognition is correct. But what if these claims fall through? What if no universal inventory of digital phonetic types can be found? What if the phonetic space appears to be infinite in size, with no limits on possible new entities, and contains elements not definable at any discrete time point? And what if it has no a priori, discrete units at all? Would this be a disaster for the traditional view?

At first glance, perhaps it would require some serious revision but would not be a disaster. Certainly, one implication would be that the one place where concrete empirical support might have been expected -- where one would have hoped to find a simple universal inventory -- will have failed to provide it. At the least, one would be forced to back away from the claim of Chomsky (and Chomsky and Halle) that phonetics provides the bedrock terminal symbol inventory for language. That would be too bad, but one might instead propose that the (hypothesized) nondiscreteness of phonetics shows only that the basic symbolic atoms of language and cognition are provided at Some Other Level than phonetics. Perhaps there is Some Other Level that will provide the digital property needed to make the theory defensible; perhaps we just haven't looked deeply enough yet. One might hope that there really is a small inventory of discrete, digitally specifiable units, and that we would just have to look a little further.

There are several places one might look. One obvious option is that morphemes or words provide such discrete units. After all, words, as noted above, do appear to be discretely different (as in beat, bit, bet, bat, mead, mid, med, mad, etc.) as they are used by skilled, adult speakers. But obviously words and morphemes are not universal, so they cannot provide the a priori inventory that the theory insists accounts for language acquisition and reliable formal functioning. Furthermore, as we noted above, finding the discrete units only at the level of morphemes or words cuts linguistic and cognitive theory off from spoken language. Another option would be phonemes. These also appear to be discrete -- at least if you look at experienced speakers of any given language. However, they too are obviously not universal, or even commensurable across languages, so they cannot play the critical role that is required.

Are there any other options? It is conceivable that there is some level of phonetic description, some way of chopping up speech sounds, that will prove to be universal as well as digital. Perhaps we should not give up too soon; if we just keep looking, a set of features or segments that will make the symbolic theory of cognition work may yet be discovered. K. N. Stevens, for example, appears to be committed to such a view (e.g., Stevens, 19nn). This is an appealing inference to draw. But there are major problems if one takes this route.

Problems with the Some-Other-Level Gambit.  First, since sound structures are the basic units of the linguistic (and cognitive) hierarchy, if phonetic units are not universal, digital and symbolic, how can morphemes still be symbolic? How could morphemes serve as digitally reliable names for concepts, or be digitally distinct, if phonetics and phonology are not physically digital? According to the symbolic cognition framework, morphemes and words, the basic computational data structures of language and cognition, depend critically on a discrete phonology (morphemes are discrete symbolic identities because they are spelled from some specific sequence of phonological units), and phonology depends on a discrete phonetics (since each phonological unit is specified by some characteristic vector of discrete phonetic features or segments). If phonetic distinctions are not discrete, then how will the formal computational mechanism of cognition and language tell two repetitions of a single utterance from two different utterances? To imagine such a situation is to claim that a computer would still work if it treated almost-a-1 as different from a 1, or that a logic system would work if it distinguished a p written with my left hand from a p written with my right hand. But as shown above, this is not a possible state of affairs for any genuine formal or computational system. A 1 is a 1 and a p is a p. Fuzziness is ruled out by the very definition of a formal, symbolic or computational system. The utter absence of fuzziness is what allows all the assumptions about formal models to be imported into theories about language structure and about human cognition.

If phonetics should fail to live up to its symbolic constraints, then the mathematical properties of a formal system could no longer technically apply to language anywhere! The morpheme table looks discrete on this page, but that is because it was written using a discrete (formal) alphabet on a digital computer. But if the phonetics used to actually pronounce words is not discrete, then pronunciations of the word table (and thoughts about a table) simply could not be discrete -- unless one simply denies that the word table (or a thought about a table) is something that has spatio-temporal existence. One could say that the word table really exists as a formal object, but only in Plato's Heaven (where it is specified using a platonically digital alphabet). Fine. But then one has abandoned all claim to a theory about the real world; we have retreated to phenomenalism. ``The formality of language and thought,'' one might say, ``is only to be thought of as providing a rough characterization of the phenomena we observe, and not as an empirical claim about a real-world system.'' This would be defensible, of course. Indeed, it would be unassailable. But it would have very limited interest as a basis for theories about real language and speech or about real human cognitive processes.

So if phonetics fails to exhibit the appropriate properties (or if syntax or phonology or lexicon could be shown to fail this way), then the entire system is thereby shown not to be formal.  Phonetics only happens to be the most useful domain for an investigation of this  issue since it is both symbolic (according to the traditional theory) and intrinsically public and thus more readily treatable with objective measurements. 

There is also a second problem with taking the Some-Other-Level escape - for those who suggest that we just need to look further for the correct phonetic level. One who is unsympathetic to the symbolic framework would be entitled to ask, ``Well, just which other level close to phonetics do you think is going to exhibit the kind of discreteness and ease of identification demanded by the theory?'' How could the level that in fact exhibits such clear symbolic structure have remained hidden and inaccessible to our experimental research techniques despite a half-century of technically sophisticated research on speech synthesis, speech recognition and human speech perception? For the Some-Other-Level claim to be true, there would have to be some discrete code serving as the foundation of every human language, no part of whose basic universal vocabulary has yet been discovered! This strains credulity. To insist that there exists such a universal, discrete and discrete-time level of analysis, but that we have no idea what its specific units actually are, is to grasp at straws. Such an attitude amounts to insisting that one's theory is unfalsifiable: ``I just know my assumptions are correct, so the evidence does not matter.''

Final Comment.  Clearly, a great deal is at stake in the empirical evidence about phonetics. Very simply, the formal theory of cognition and of language depends on there being a digital, discrete-time phonetics. If phonetics does not provide a discrete and universal alphabet, then the assumption that language is a formal system from top to bottom would have to be rejected as false. Instead, we would have to look for a way to account for the apparent discreteness and digitality of specific phonologies without depending on a universal digital phonetics to provide the discreteness; the theory of phonology will have to account for phonological discreteness in other ways. (Some specific ideas about how to do this will be provided later in this paper.) And the theory of language and cognition in general would have to deal with all the messiness and indeterminacy that we see in human speech. We could no longer be comforted by the easy bromide of so many linguists and cognitive scientists: ``Well, we don't know how it works, but at least we know that it has a formal description.''

In the next section, the relevant empirical properties of the phonetic space will be surveyed and evaluated for their compatibility with these empirical requirements. We will find little evidence to support these claims.