- To whom correspondence should be addressed.
- We will use uppercase for concepts, italics for linguistic forms,
  and double quotes for utterances.
- We have no reason to believe, however, that the conclusions we reach
  will not generalize to other representational schemes. An alternative,
  for example, is a variant of localized encoding in which the units on
  either side of the most highly activated unit are also activated, in
  inverse proportion to their distance from the activated unit. A version
  of the present network using such a scheme, trained on the data
  generated for Experiment 3 below, exhibited the same advantage for
  compact over elongated categories as was found with thermometer
  encoding (both encodings are sketched below).
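A minimal sketch of the two encodings, assuming NumPy, input values
normalized to [0, 1], and an illustrative unit count; the exact
inverse-distance falloff used below, 1 / (1 + distance), is our
assumption, since the footnote does not specify one:

    import numpy as np

    def thermometer(value, n_units):
        # Thermometer encoding: every unit up to the value's position is on.
        pos = int(round(value * (n_units - 1)))
        pattern = np.zeros(n_units)
        pattern[: pos + 1] = 1.0
        return pattern

    def graded_localized(value, n_units):
        # Localized-encoding variant: the unit nearest the value is fully
        # active, and its neighbors are activated in inverse proportion to
        # their distance from it (the falloff function is an assumption).
        pos = value * (n_units - 1)
        distance = np.abs(np.arange(n_units) - pos)
        return 1.0 / (1.0 + distance)

    print(thermometer(0.5, 7))       # [1. 1. 1. 1. 0. 0. 0.]
    print(graded_localized(0.5, 7))  # peaks at the middle unit, falls off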
- Increasing the number of units in the hidden layer of the network both
  speeds up learning and improves the asymptotic level of performance
  (see the sketch below).
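A sketch of what varying the hidden layer amounts to, assuming a generic
two-layer feedforward network in NumPy; the layer sizes and tanh
activations are placeholders, not the paper's actual architecture:

    import numpy as np

    rng = np.random.default_rng(0)

    def make_network(n_input, n_hidden, n_output):
        # n_hidden is the quantity varied in the footnote above.
        return {
            "W_hidden": rng.normal(scale=0.1, size=(n_input, n_hidden)),
            "W_output": rng.normal(scale=0.1, size=(n_hidden, n_output)),
        }

    def forward(net, x):
        hidden = np.tanh(x @ net["W_hidden"])    # hidden-layer activations
        return np.tanh(hidden @ net["W_output"])

    # A wider hidden layer (e.g. 20 units vs. 5) is the change reported
    # to speed learning and raise the asymptote.
    small = make_network(n_input=10, n_hidden=5, n_output=8)
    large = make_network(n_input=10, n_hidden=20, n_output=8)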
- As we will see in subsequent experiments, the noun advantage in the
  network does not depend on there being only two terms for each
  adjective dimension.
- For statistical tests here and in Experiments 2-5, we treated each run
  of the network as a separate subject; a sketch of this treatment
  follows.
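A sketch of this treatment, assuming SciPy and made-up per-run scores;
with only two conditions, a paired test is the simplest analogue of
treating runs as subjects (the experiments themselves used analysis of
variance):

    import numpy as np
    from scipy import stats

    # One score per network run ("subject") in each condition;
    # the numbers are hypothetical, for illustration only.
    noun_scores = np.array([0.91, 0.88, 0.93, 0.90, 0.89])
    adjective_scores = np.array([0.74, 0.70, 0.78, 0.73, 0.71])

    # Each run contributes one observation per condition, so the
    # comparison is paired across runs.
    t, p = stats.ttest_rel(noun_scores, adjective_scores)
    print(f"t = {t:.2f}, p = {p:.4f}")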
- An initial difference in learning, but ultimately equal and
  near-perfect learning of both nouns and adjectives, is achieved with
  larger hidden layers.
- All of the mean activations are negative because, for this experiment,
  the network learns to strongly inhibit all but the right response for
  each training instance, and for the test patterns there is no "right"
  response among the trained categories.
- For the analysis of variance, there were two factors: input linguistic
  context (noun or adjective) and average activation over output units
  by meta-category (noun or adjective). There was only one "subject"
  (network run) in this experiment, but there were 18 instances of each
  of the four combinations of the factors; the sketch below shows the
  design.
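A sketch of the design, assuming pandas and statsmodels, with random
placeholder activations standing in for the network's outputs; only the
factor structure (2 x 2, 18 instances per cell) follows the footnote:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    rng = np.random.default_rng(1)

    # 18 instances of each of the four factor combinations.
    rows = [
        {"context": c, "meta_category": m,
         "activation": rng.normal(-0.5, 0.1)}   # placeholder values
        for c in ("noun", "adjective")
        for m in ("noun", "adjective")
        for _ in range(18)
    ]
    data = pd.DataFrame(rows)

    # Two-way ANOVA: context x meta-category, including the interaction.
    model = ols("activation ~ C(context) * C(meta_category)",
                data=data).fit()
    print(sm.stats.anova_lm(model, typ=2))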