Meaning and form
Language is about the association between two very different domains, that of
meaning (including everything anybody might ever want to talk about) and
that of linguistic form (phonetic/orthographic units, words, sentences, discourses).
Going from one to the other is what language users must somehow accomplish.
Because the mapping is very complex, this is hard.
To the extent that we must deal with meaning, the problem is inherently hard, because
it seems to require (1) a theory of things and events in the world (including hypothetical ones)
and/or (2) a theory of how we perceive those things and events.
- Units of language are often ambiguous;
they have multiple interpretations.
- Words and structures often fail to correspond across languages, and one way to think about the differences
is in terms of differences in meaning.
- In general, linguistic output underspecifies the intended meaning.
Analyzers have to work to fill in the gaps.
Ambiguity: by aspect of language
- Word sense
She left her money on the bank.
- Syntactic category
The old man the boats.
- Structure
The boy saw the girl on the hill with the telescope. (A parse sketch follows this list.)
- Reference
The waiter finally brought the customer his eggs, but he wasn't hungry anymore.
- Translation
alilala → he slept? he lay down?
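The structural case is easy to make concrete. The sketch below, a toy illustration assuming NLTK is available (the grammar itself is invented for this example), enumerates every parse the grammar licenses for the telescope sentence; each tree corresponds to a different attachment of the prepositional phrases.

    import nltk

    # Toy grammar, invented for illustration; the recursive NP and VP rules
    # let each PP attach either to a noun phrase or to the verb phrase.
    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N | NP PP
    VP -> V NP | VP PP
    PP -> P NP
    Det -> 'the'
    N -> 'boy' | 'girl' | 'hill' | 'telescope'
    V -> 'saw'
    P -> 'on' | 'with'
    """)

    parser = nltk.ChartParser(grammar)
    tokens = "the boy saw the girl on the hill with the telescope".split()
    for tree in parser.parse(tokens):
        print(tree)   # one tree per attachment pattern (five in all)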
Ambiguity: by what's needed in order to disambiguate
- Global ambiguity: whole sentence is ambiguous
- Well, did you do it?
- Why are you wearing THAT?
- Mary knocked her flat.
- Alice left her husband for the garbageman.
- It's hard to eat pizza with chopsticks.
- Swahili relative clauses
a-na-ye-m-penda
  a-      he/she
  -na-    PRES
  -ye-    REL (3 pers. sing.)
  -m-     him/her
  -penda  like
'who likes him/her' or 'whom he/she likes'
- it is broken → kimevunjika ('it has broken', stative) or kinavunjwa ('it is being broken', passive)
- Local ambiguity: portions of a sentence are ambiguous
- The soup pot covers are missing.
- The inexperienced band together.
- Have the students who missed the exam take it today.
- The astronomer married the star.
- Hakuchoka kwa sababu alilala kwa muda mrefu. → He isn't tired because he slept a long time.
Structure
Units (other than the smallest units) consist of constituents.
Constituency matters because it is meaningful;
meanings are apparently created and extracted on the basis of something like compositionality:
the meaning of a whole is a function of the meanings of its parts and the
structural relationships among the parts.
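A minimal sketch of that idea, with a lexicon invented purely for illustration: word meanings are sets or functions, the meaning of each node is computed from its parts by function application, and the bracketing of the tree determines the result.

    # Toy compositional semantics: the meaning of a whole is a function of
    # the meanings of its parts and their structural arrangement.
    lexicon = {
        "dog":   {"rex"},                                    # nouns denote sets
        "bird":  {"tweety"},
        "small": lambda s: s & {"felix", "tweety"},          # adjectives: set -> set
        "every": lambda restr: lambda scope: restr <= scope, # determiners
    }

    def meaning(tree):
        """Leaves look up the lexicon; branches compose by function application."""
        if isinstance(tree, str):
            return lexicon[tree]
        left, right = meaning(tree[0]), meaning(tree[1])
        return left(right) if callable(left) else right(left)

    print(meaning(("small", "bird")))                       # {'tweety'}
    print(meaning((("every", ("small", "bird")), "bird")))  # True: every small bird is a bird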
- The boundaries between constituents are usually not obvious.
Language analyzers need to segment the input.
- Language generation works by converting concepts into linguistic
units.
This involves a sort of semantic "segmentation" and a mapping of
these segments onto linguistic units.
- Identifying the structure of a chunk of language involves both segmentation
(finding boundaries between constituents) and aggregation
(combining elements into larger units).
The clues to how this is done may not be obvious; a toy segmenter is sketched after this list.
- Generating language presumably involves aggregation of smaller semantic units
into larger ones.
- Recognizing and generating discourse involves analogical mappings between structures.
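As a toy illustration of the segmentation problem (the dictionary and input are invented), the sketch below recovers every way to carve an unsegmented string into known words; even with a tiny lexicon, more than one segmentation survives.

    # Boundary finding over an unsegmented input: recursively try every
    # dictionary word that matches a prefix of the remaining string.
    TOY_DICT = {"the", "soup", "so", "up", "pot", "covers"}

    def segmentations(text, start=0):
        """Yield every way to split text[start:] into dictionary words."""
        if start == len(text):
            yield []
            return
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in TOY_DICT:
                for rest in segmentations(text, end):
                    yield [word] + rest

    for seg in segmentations("thesouppotcovers"):
        print(" ".join(seg))   # "the soup pot covers" and "the so up pot covers"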
Categories and invariance
Linguistic units belong to categories.
Some of these, such as words, are directly involved in language analysis and generation.
Others, such as syllables, are in the service of analysis or generation.
Learning and recognizing categories means solving the invariance problem,
discovering what matters and what does not for each category.
- Phonological categories, especially phonemes, are notoriously variable, depending
on the phonetic context, the speaker's age and gender, and global properties
of the utterance.
The invariance problem is the problem of establishing what it is that makes, say,
a /p/ a /p/ and what is irrelevant and must be factored out in
identifying consonants (a toy sketch follows below).
- The form of a morpheme may vary considerably, depending on the morphemes and phonemes around it.
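A toy illustration of the invariance problem (all numbers synthetic and invented): the /p/ and /b/ tokens below differ reliably in voice onset time but vary freely in pitch with the speaker, so a classifier succeeds only if it attends to the relevant dimension and factors the other out.

    import random
    random.seed(0)

    # Synthetic tokens: voice onset time (ms) tracks the category;
    # pitch (Hz) tracks the speaker and is irrelevant to the category.
    def token(category):
        vot = random.gauss(60, 10) if category == "p" else random.gauss(10, 10)
        return {"vot": vot, "pitch": random.gauss(150, 50), "category": category}

    data = [token(c) for c in "pb" * 100]

    def classify(t, cue):
        if cue == "pitch":
            return "p" if t["pitch"] > 150 else "b"   # irrelevant cue: ~chance
        return "p" if t["vot"] > 35 else "b"          # relevant cue: near-perfect

    for cue in ("pitch", "vot"):
        acc = sum(classify(t, cue) == t["category"] for t in data) / len(data)
        print(f"cue={cue}: accuracy {acc:.2f}")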
Redundancy and grammaticality
Languages are also often redundant in very specific and constrained ways;
English subject-verb agreement, for example, restates on the verb number
information the subject already carries.
This redundancy may make language analysis easier, but
language generators must adhere to these constraints in order to produce
grammatical utterances.
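A minimal sketch of that constraint in generation, with an invented toy lexicon: dropping the agreement check produces ungrammatical strings.

    # Toy generator: agreement is redundant information, but a generator
    # must still enforce it to stay grammatical.
    SUBJECTS = {"the dog": "sg", "the dogs": "pl"}
    VERBS = {"barks": "sg", "bark": "pl"}

    def generate(respect_agreement=True):
        for subj, subj_num in SUBJECTS.items():
            for verb, verb_num in VERBS.items():
                if respect_agreement and subj_num != verb_num:
                    continue                    # filter out number mismatches
                print(subj, verb)

    generate()                          # the dog barks / the dogs bark
    generate(respect_agreement=False)   # adds *the dog bark / *the dogs barks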
Productivity and systematicity
Language generators can generate and language analyzers can analyze
sentences and discourses that they've never heard before by recombining units in
novel ways, using familiar patterns of combination.
That is, language is productive.
Whatever form it takes, knowledge of language has to permit generalization.
Whether this involves only interpolation between known examples or extrapolation
beyond them (more challenging) is not so clear.
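A small sketch of the recombination point, assuming NLTK is available (the grammar is invented): seven words and three phrase rules license 72 distinct sentences, most of which no one has ever produced before.

    import nltk
    from nltk.parse.generate import generate

    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N
    VP -> V NP
    Det -> 'a' | 'the'
    N -> 'linguist' | 'parser' | 'sentence'
    V -> 'analyzes' | 'generates'
    """)

    # 2 x 3 x 2 x 2 x 3 = 72 sentences from seven words and three phrase rules.
    for words in generate(grammar, n=5):
        print(" ".join(words))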
Learnability
Obviously languages are learned, though how much is learned is a matter of
much disagreement.
The problem is that the input to the learner seems to underspecify what
needs to get learned; that is, the range of possible "hypotheses" that are
compatible with the input is too large.
If the input to machines learning natural language is "natural", they will face similar problems.
Apparently some sort of constraints are needed on what can be learned.
But what sort?
- Children often learn the meanings of words from very few presentations.
Without constraints on what counts as a possible meaning, this seems impossible
(see the sketch after this list).
- Multiple grammars seem to be compatible with the input children receive.
Without constraints on what counts as a possible grammar, learning the grammar
seems impossible.
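A toy simulation of the word-learning case (the scenes and the word are invented, after Quine's "gavagai" example): intersecting candidate meanings across situations, itself a constraint on possible meanings, collapses the hypothesis space after just a few exposures.

    # Cross-situational learning: assume the word's meaning is some object
    # present in every scene where the word is used, and intersect.
    situations = [
        ("gavagai", {"rabbit", "tree", "sky"}),
        ("gavagai", {"rabbit", "river", "grass"}),
        ("gavagai", {"rabbit", "tree", "cloud"}),
    ]

    hypotheses = None
    for word, scene in situations:
        hypotheses = scene if hypotheses is None else hypotheses & scene
        print(f"after hearing '{word}': candidates = {sorted(hypotheses)}")
    # Three presentations suffice: the candidates shrink to ['rabbit'].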
Multiple sources of knowledge
Understanding and producing language require the simultaneous use of multiple
levels of linguistic and non-linguistic knowledge and reasoning, including
reasoning about the beliefs of the speaker/hearer (theory of mind).