Meaning and form
Language is about the association between two very different domains, that of
meaning (including everything anybody might ever want to talk about) and
that of linguistic form (phonetic/orthographic units, words, sentences, discourses).
Going from one to the other is what language users must somehow accomplish.
Because the mapping is very complex, this is hard.
To the extent that we must deal with meaning, the problem is inherently hard, because
it seems to require (1) a theory of things and events in the world (including hypothetical ones)
and/or (2) a theory of how we perceive those things and events.
- Units of language are often ambiguous;
they have multiple interpretations.
- Words and structures often fail to correspond across languages, and one way to think about the differences
is in terms of differences in meaning.
- In general, linguistic output underspecifies the intended meaning.
Analyzers have to work to fill in the gaps.
Ambiguity: by aspect of language
- Word sense
She left her money on the bank.
- Syntactic category
The old man the boats.
- Structure
The boy saw the girl on the hill with the telescope. (A parse sketch follows this list.)
- Reference
The waiter finally brought the customer his eggs, but he wasn't hungry anymore.
- Translation
alilala → he slept? he lay down?
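The structural case is easy to make concrete. The sketch below, a toy illustration assuming NLTK is available (the grammar itself is invented for this example), enumerates every parse the grammar licenses for the telescope sentence; each tree corresponds to a different attachment of the prepositional phrases.

    import nltk

    # Toy grammar, invented for illustration; the recursive NP and VP rules
    # let each PP attach either to a noun phrase or to the verb phrase.
    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N | NP PP
    VP -> V NP | VP PP
    PP -> P NP
    Det -> 'the'
    N -> 'boy' | 'girl' | 'hill' | 'telescope'
    V -> 'saw'
    P -> 'on' | 'with'
    """)

    parser = nltk.ChartParser(grammar)
    tokens = "the boy saw the girl on the hill with the telescope".split()
    for tree in parser.parse(tokens):
        print(tree)   # one tree per attachment pattern (five in all)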
Ambiguity: by what's needed in order to disambiguate
- Global ambiguity: whole sentence is ambiguous
- Well, did you do it?
- Why are you wearing THAT?
- Mary knocked her flat.
- Alice left her husband for the garbageman.
- It's hard to eat pizza with chopsticks.
- Swahili relative clauses
a-na-ye-m-penda
  a-      he/she
  -na-    PRES
  -ye-    REL (3 pers. sing.)
  -m-     him/her
  -penda  like
'who likes him/her' or 'whom he/she likes'
- it is broken → kimevunjika ('it has broken', stative) or kinavunjwa ('it is being broken', passive)
- Local ambiguity: portions of a sentence are ambiguous
- The soup pot covers are missing.
- The inexperienced band together.
- Have the students who missed the exam take it today.
- The astronomer married the star.
- Hakuchoka kwa sababu alilala kwa muda mrefu. → He isn't tired because he slept a long time.
Structure
Units (other than the smallest units) consist of constituents.
Constituency matters because it is meaningful;
meanings are apparently created and extracted on the basis of something like compositionality:
the meaning of a whole is a function of the meanings of its parts and the
structural relationships among the parts.
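A minimal sketch of that idea, with a lexicon invented purely for illustration: word meanings are sets or functions, the meaning of each node is computed from its parts by function application, and the bracketing of the tree determines the result.

    # Toy compositional semantics: the meaning of a whole is a function of
    # the meanings of its parts and their structural arrangement.
    lexicon = {
        "dog":   {"rex"},                                    # nouns denote sets
        "bird":  {"tweety"},
        "small": lambda s: s & {"felix", "tweety"},          # adjectives: set -> set
        "every": lambda restr: lambda scope: restr <= scope, # determiners
    }

    def meaning(tree):
        """Leaves look up the lexicon; branches compose by function application."""
        if isinstance(tree, str):
            return lexicon[tree]
        left, right = meaning(tree[0]), meaning(tree[1])
        return left(right) if callable(left) else right(left)

    print(meaning(("small", "bird")))                       # {'tweety'}
    print(meaning((("every", ("small", "bird")), "bird")))  # True: every small bird is a bird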
- The boundaries between constituents are usually not obvious.
Language analyzers need to segment the input.
- Language generation works by converting concepts into linguistic
units.
This involves a sort of semantic "segmentation" and a mapping of
these segments onto linguistic units.
- Identifying the structure of a chunk of language involves both segmentation
(finding boundaries between constituents) and aggregation
(combining elements into larger units).
The clues to how this is done may not be obvious; a toy segmenter is sketched after this list.
- Generating language presumably involves aggregation of smaller semantic units
into larger ones.
- Recognizing and generating discourse involves analogical mappings between structures.
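As a toy illustration of the segmentation problem (the dictionary and input are invented), the sketch below recovers every way to carve an unsegmented string into known words; even with a tiny lexicon, more than one segmentation survives.

    # Boundary finding over an unsegmented input: recursively try every
    # dictionary word that matches a prefix of the remaining string.
    TOY_DICT = {"the", "soup", "so", "up", "pot", "covers"}

    def segmentations(text, start=0):
        """Yield every way to split text[start:] into dictionary words."""
        if start == len(text):
            yield []
            return
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in TOY_DICT:
                for rest in segmentations(text, end):
                    yield [word] + rest

    for seg in segmentations("thesouppotcovers"):
        print(" ".join(seg))   # "the soup pot covers" and "the so up pot covers"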
Categories and invariance
Linguistic units belong to categories.
Some of these, such as words, are directly involved in language analysis and generation.
Others, such as syllables, are in the service of analysis or generation.
Learning and recognizing categories means solving the invariance problem,
discovering what matters and what does not for each category.
- Phonological categories, especially phonemes, are notoriously variable, depending
on the phonetic context, the speaker's age and gender, and global properties
of the utterance.
The invariance problem is the problem of establishing what it is that makes, say,
a /p/ a /p/ and what is irrelevant and must be factored out in
identifying consonants (a toy sketch follows below).
- The form of a morpheme may vary considerably, depending on the morphemes and phonemes around it.
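A toy illustration of the invariance problem (all numbers synthetic and invented): the /p/ and /b/ tokens below differ reliably in voice onset time but vary freely in pitch with the speaker, so a classifier succeeds only if it attends to the relevant dimension and factors the other out.

    import random
    random.seed(0)

    # Synthetic tokens: voice onset time (ms) tracks the category;
    # pitch (Hz) tracks the speaker and is irrelevant to the category.
    def token(category):
        vot = random.gauss(60, 10) if category == "p" else random.gauss(10, 10)
        return {"vot": vot, "pitch": random.gauss(150, 50), "category": category}

    data = [token(c) for c in "pb" * 100]

    def classify(t, cue):
        if cue == "pitch":
            return "p" if t["pitch"] > 150 else "b"   # irrelevant cue: ~chance
        return "p" if t["vot"] > 35 else "b"          # relevant cue: near-perfect

    for cue in ("pitch", "vot"):
        acc = sum(classify(t, cue) == t["category"] for t in data) / len(data)
        print(f"cue={cue}: accuracy {acc:.2f}")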
Redundancy and grammaticality
Languages are also often redundant in very specific and constrained ways;
English subject-verb agreement, for example, restates on the verb number
information the subject already carries.
This redundancy may make language analysis easier, but
language generators must adhere to these constraints in order to produce
grammatical utterances.
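A minimal sketch of that constraint in generation, with an invented toy lexicon: dropping the agreement check produces ungrammatical strings.

    # Toy generator: agreement is redundant information, but a generator
    # must still enforce it to stay grammatical.
    SUBJECTS = {"the dog": "sg", "the dogs": "pl"}
    VERBS = {"barks": "sg", "bark": "pl"}

    def generate(respect_agreement=True):
        for subj, subj_num in SUBJECTS.items():
            for verb, verb_num in VERBS.items():
                if respect_agreement and subj_num != verb_num:
                    continue                    # filter out number mismatches
                print(subj, verb)

    generate()                          # the dog barks / the dogs bark
    generate(respect_agreement=False)   # adds *the dog bark / *the dogs barks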
Productivity and systematicity
Language generators can generate and language analyzers can analyze
sentences and discourses that they've never heard before by recombining units in
novel ways, using familiar patterns of combination.
That is, language is productive.
Whatever form it takes, knowledge of language has to permit generalization.
Whether this involves only interpolation between known examples or extrapolation
beyond them (more challenging) is not so clear.
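A small sketch of the recombination point, assuming NLTK is available (the grammar is invented): seven words and three phrase rules license 72 distinct sentences, most of which no one has ever produced before.

    import nltk
    from nltk.parse.generate import generate

    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N
    VP -> V NP
    Det -> 'a' | 'the'
    N -> 'linguist' | 'parser' | 'sentence'
    V -> 'analyzes' | 'generates'
    """)

    # 2 x 3 x 2 x 2 x 3 = 72 sentences from seven words and three phrase rules.
    for words in generate(grammar, n=5):
        print(" ".join(words))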
Learnability
Obviously languages are learned, though how much is learned is a matter of
much disagreement.
The problem is that the input to the learner seems to underspecify what
needs to get learned; that is, the range of possible "hypotheses" that are
compatible with the input is too large.
If the input to machines learning natural language is "natural", they will face similar problems.
Apparently some sort of constraints are needed on what can be learned.
But what sort?
- Children often learn the meanings of words from very few presentations.
Without constraints on what counts as a possible meaning, this seems impossible
(see the sketch after this list).
- Multiple grammars seem to be compatible with the input children receive.
Without constraints on what counts as a possible grammar, learning the grammar
seems impossible.
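A toy simulation of the word-learning case (the scenes and the word are invented, after Quine's "gavagai" example): intersecting candidate meanings across situations, itself a constraint on possible meanings, collapses the hypothesis space after just a few exposures.

    # Cross-situational learning: assume the word's meaning is some object
    # present in every scene where the word is used, and intersect.
    situations = [
        ("gavagai", {"rabbit", "tree", "sky"}),
        ("gavagai", {"rabbit", "river", "grass"}),
        ("gavagai", {"rabbit", "tree", "cloud"}),
    ]

    hypotheses = None
    for word, scene in situations:
        hypotheses = scene if hypotheses is None else hypotheses & scene
        print(f"after hearing '{word}': candidates = {sorted(hypotheses)}")
    # Three presentations suffice: the candidates shrink to ['rabbit'].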
Multiple sources of knowledge
Understanding and producing language require the simultaneous use of multiple
levels of linguistic and non-linguistic knowledge and reasoning, including
reasoning about the beliefs of the speaker/hearer (theory of mind).