Meaning and form

Language is about the association between two very different domains: that of meaning (including everything anybody might ever want to talk about) and that of linguistic form (phonetic/orthographic units, words, sentences, discourses). Going from one to the other is what language users must somehow accomplish. Because the mapping is many-to-many (one form can carry several meanings, and one meaning can be expressed by several forms) and context-dependent, this is hard.
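
A minimal sketch, with made-up lexical entries, of the two directions of the problem: analysis must choose among the meanings of a form, and generation must choose among the forms for a meaning.

```python
# Made-up entries illustrating the many-to-many form-meaning mapping.
AMBIGUITY = {    # one form, many meanings: the analyzer's problem
    "bank": ["financial institution", "side of a river"],
    "duck": ["waterfowl", "lower one's head"],
}
PARAPHRASE = {   # one meaning, many forms: the generator's problem
    "financial institution": ["bank", "lender"],
}

print(AMBIGUITY["bank"])                    # analysis must pick one
print(PARAPHRASE["financial institution"])  # generation must pick one
```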

To the extent that we must deal with meaning, the task is inherently hard, because it seems to require (1) a theory of things and events in the world (including hypothetical ones) and/or (2) a theory of how we perceive those things and events.

Ambiguity: by aspect of language
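
A single short sentence can be ambiguous along several aspects at once: lexical (which sense a word has), categorial (which part of speech it is), and syntactic (which structure the sentence has). A minimal sketch, assuming made-up analysis inventories for the classic "I saw her duck", that enumerates the readings produced by combining the per-word ambiguities:

```python
from itertools import product

# Made-up per-word analyses for "I saw her duck".
ANALYSES = {
    "saw":  ["see.PAST", "cut-with-a-saw.PRES"],   # lexical ambiguity
    "her":  ["possessive", "object-pronoun"],      # categorial ambiguity
    "duck": ["duck.NOUN", "lower-the-head.VERB"],  # lexical + categorial
}

def compatible(her, duck):
    # A possessive needs the following noun; an object pronoun here
    # needs the following verb (the "small clause" reading).
    if her == "possessive":
        return duck == "duck.NOUN"
    return duck == "lower-the-head.VERB"

readings = [
    (saw, her, duck)
    for saw, her, duck in product(*ANALYSES.values())
    if compatible(her, duck)
]
for r in readings:
    print(r)  # 4 readings survive out of the 8 combinations
```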

Ambiguity: by what's needed in order to disambiguate
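
Ambiguities also differ in what it takes to resolve them: some yield to other words in the same sentence, others require the surrounding discourse, and others require world knowledge or reasoning about the speaker. A minimal sketch of the easiest case, a Lesk-style overlap heuristic with made-up sense signatures, which explicitly gives up when sentence-internal context is not enough:

```python
# Made-up signature words for two senses of "bank".
SIGNATURES = {
    "bank/finance": {"money", "deposit", "loan", "account", "teller"},
    "bank/river":   {"river", "water", "shore", "fishing", "muddy"},
}

def disambiguate(word, context_words):
    """Pick the sense whose signature overlaps the context most."""
    context = {w.lower() for w in context_words}
    scores = {sense: len(sig & context) for sense, sig in SIGNATURES.items()}
    best = max(scores, key=scores.get)
    if list(scores.values()).count(scores[best]) > 1:
        return None  # local context alone does not decide
    return best

print(disambiguate("bank", "she deposited money at the bank".split()))
# -> 'bank/finance'
print(disambiguate("bank", "they sat on the bank".split()))
# -> None: this one needs discourse or world knowledge
```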

Structure

Units (other than the smallest) consist of constituents. Constituency matters because it is meaningful: meanings are apparently created and extracted on the basis of something like compositionality, whereby the meaning of a whole is a function of the meanings of its parts and the structural relationships among the parts.
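
A minimal sketch of compositional interpretation, assuming a made-up lexicon in which meanings are either entities (strings) or Python functions, and composition is function application:

```python
# Made-up lexicon: word meanings are entities or functions over them.
LEXICON = {
    "rex":   "rex",
    "fido":  "fido",
    "barks": lambda x: x in {"rex"},
    "likes": lambda y: (lambda x: (x, y) in {("rex", "fido")}),
}

def interpret(tree):
    """The meaning of a whole is a function of the meanings of its
    parts and their structural relationship (function application)."""
    if isinstance(tree, str):
        return LEXICON[tree]
    left, right = interpret(tree[0]), interpret(tree[1])
    return left(right) if callable(left) else right(left)

print(interpret(("rex", "barks")))            # True
print(interpret(("rex", ("likes", "fido"))))  # True: rex likes fido
# The same words under a different bracketing mean something else
# ("fido likes rex"), which is false in this toy model:
print(interpret((("rex", "likes"), "fido")))  # False
```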

Categories and invariance

Linguistic units belong to categories. Some of these, such as words, are directly involved in language analysis and generation; others, such as syllables, play a supporting role. Learning and recognizing categories means solving the invariance problem: discovering, for each category, which properties matter for membership and which do not.
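
A minimal sketch of the invariance problem for word categories, assuming (purely for illustration) that capitalization is irrelevant to category membership while word endings are relevant; deciding which properties belong in which bin is exactly what a learner has to discover.

```python
def features(token):
    """Keep what (we assume) matters for categorization; discard the rest."""
    w = token.lower()       # capitalization treated as irrelevant variation
    return {
        "suffix2": w[-2:],  # endings treated as relevant: '-ly', '-ed', ...
        "has_digit": any(c.isdigit() for c in w),
    }

# The same category must be recognized across irrelevant variation...
print(features("Quickly") == features("quickly"))  # True
# ...while relevant differences must be preserved.
print(features("quickly") == features("quicken"))  # False
```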

Redundancy and grammaticality

Languages are also redundant in specific, constrained ways; agreement, for example, marks the same feature in more than one place. This redundancy can make analysis easier, but generators must satisfy the same constraints in order to produce grammatical utterances.
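
Subject-verb agreement is one such redundancy: number is marked on both the subject and the verb. A minimal sketch, assuming a made-up two-word lexicon, of how the same constraint burdens the generator and helps the analyzer:

```python
# Made-up number features for a tiny lexicon.
NUMBER = {"dog": "sg", "dogs": "pl", "barks": "sg", "bark": "pl"}

def grammatical(subject, verb):
    # Generation: a feature marked twice must be marked consistently.
    return NUMBER[subject] == NUMBER[verb]

print(grammatical("dog", "barks"))   # True:  "the dog barks"
print(grammatical("dogs", "barks"))  # False: *"the dogs barks"

# Analysis: if one word is garbled, the redundancy narrows the choices.
def recover_subject(verb, candidates=("dog", "dogs")):
    return [s for s in candidates if grammatical(s, verb)]

print(recover_subject("bark"))       # ['dogs']
```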

Productivity and systematicity

Language generators can produce, and language analyzers can analyze, sentences and discourses they have never encountered before by recombining familiar units according to familiar patterns of combination. That is, language is productive. Whatever form it takes, knowledge of language has to permit generalization; whether that generalization involves only interpolation between known examples or extrapolation beyond them (which is more challenging) is not so clear.
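
A minimal sketch of productivity, assuming a made-up toy grammar: a handful of units and combination patterns license sentences that were never in the input.

```python
import random

# Made-up grammar: nonterminals map to lists of possible expansions.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "PP"]],
    "VP": [["V", "NP"], ["V"]],
    "PP": [["near", "NP"]],   # recursion: novel depth, familiar pattern
    "N":  [["dog"], ["cat"], ["telescope"]],
    "V":  [["saw"], ["slept"]],
}

def generate(symbol="S"):
    if symbol not in GRAMMAR:  # terminal word
        return [symbol]
    expansion = random.choice(GRAMMAR[symbol])
    return [word for part in expansion for word in generate(part)]

for _ in range(3):
    print(" ".join(generate()))
# e.g. "the dog near the telescope saw the cat" -- novel, yet patterned
```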

Learnability

Obviously languages are learned, though how much is learned is a subject of much disagreement. The problem is that the input to the learner seems to underspecify what needs to be learned; that is, the range of possible "hypotheses" compatible with the input is too large. If the input to machines learning natural language is "natural", they will face the same problem. Apparently constraints of some sort are needed on what can be learned. But what sort?
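
A minimal sketch of the underdetermination problem in a toy setting: hypotheses are integer intervals, and the evidence consists of positive examples only. Many hypotheses fit the same data; it is a constraint on the learner (here, a Subset-Principle-style bias toward the tightest hypothesis), not the data, that settles the choice.

```python
LOW, HIGH = 0, 9
examples = [3, 5]  # positive evidence only

# Every interval containing all the examples is a consistent hypothesis.
consistent = [
    (lo, hi)
    for lo in range(LOW, HIGH + 1)
    for hi in range(lo, HIGH + 1)
    if all(lo <= x <= hi for x in examples)
]
print(len(consistent))  # 20 hypotheses fit exactly the same data

# A constraint on the learner collapses the space to a single choice.
tightest = min(consistent, key=lambda h: h[1] - h[0])
print(tightest)         # (3, 5)
```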

Multiple sources of knowledge

Understanding and producing language require the simultaneous use of multiple levels of linguistic and non-linguistic knowledge and reasoning, including reasoning about the beliefs of the speaker/hearer (theory of mind).
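
A minimal sketch of combining knowledge sources, with made-up candidates, levels, and scores: each level rates each interpretation of a classic attachment ambiguity ("I saw the man with the telescope"), and the interpreter uses all the levels at once.

```python
CANDIDATES = ["telescope modifies the seeing", "telescope modifies the man"]

def syntax_score(c):     # both attachments are parseable
    return 0.5

def semantics_score(c):  # seeing-with-an-instrument is semantically natural
    return 0.8 if "seeing" in c else 0.4

def pragmatics_score(c): # theory of mind: what would the speaker mean here?
    return 0.6 if "seeing" in c else 0.5

def interpret(candidates):
    # Simultaneous use of all levels: combine the scores multiplicatively.
    return max(candidates, key=lambda c:
               syntax_score(c) * semantics_score(c) * pragmatics_score(c))

print(interpret(CANDIDATES))  # 'telescope modifies the seeing'
```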