The NLP module: some challenges

Participants in New York and Shanghai say the process works like this: People wait in line at an Apple store to buy the newest iPhone for $600, paying a premium to skip the AT&T contract.
Sugimura reported that up to 3/4 of the participants aren't attending the workshops at the Neuchâtel conference.
The invention of techniques to separate lead from other components could lead to the development of a separate industry.
When she's not out fishing for bass, she plays the bass in a rock band.
It's not a rock band; it's a blues band.
Over and above the cost there are also issues of convenience.

Text normalization

Sentence tokenization
Handling non-standard words
- Tokenization
- Classification
- Expansion
Homograph disambiguation
- Using part-of-speech: separate, lives
- Other cases: bass, read
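
A minimal sketch of the three non-standard-word steps listed above (tokenization, classification, expansion), applied to tokens such as "$600" and "3/4" from the example sentences. The function names, the token classes, and the restriction to numbers below 1000 are assumptions made for illustration; a real normalizer covers far more classes (dates, abbreviations, ordinals) and handles attached punctuation.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
# Hypothetical table of spoken denominators: (singular, plural).
DENOMS = {2: ("half", "halves"), 3: ("third", "thirds"),
          4: ("quarter", "quarters")}


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def classify(token: str) -> str:
    """Classification: assign each token to a (toy) non-standard-word class."""
    if re.fullmatch(r"\$\d{1,3}", token):
        return "money"
    if re.fullmatch(r"\d{1,3}/\d{1,3}", token):
        return "fraction"
    if re.fullmatch(r"\d{1,3}", token):
        return "number"
    return "word"


def expand(token: str, kind: str) -> str:
    """Expansion: rewrite a classified token as ordinary words."""
    if kind == "money":
        amount = int(token[1:])
        return number_to_words(amount) + (" dollar" if amount == 1 else " dollars")
    if kind == "fraction":
        num, den = (int(part) for part in token.split("/"))
        if den not in DENOMS:
            return number_to_words(num) + " over " + number_to_words(den)
        singular, plural = DENOMS[den]
        return number_to_words(num) + " " + (singular if num == 1 else plural)
    if kind == "number":
        return number_to_words(int(token))
    return token


def normalize(text: str) -> str:
    """Tokenization on whitespace, then classify and expand each token."""
    return " ".join(expand(tok, classify(tok)) for tok in text.split())


print(normalize("up to 3/4 of the participants"))
```

Whitespace tokenization leaves punctuation attached, so a token like "$600," falls through to the "word" class here; separating punctuation first is part of what makes the tokenization step hard in practice.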

Dictionary lookup
Names
Grapheme-to-phoneme conversion
- Letter-to-phone alignment
- Choosing the best phone string
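
A rough sketch of how dictionary lookup and grapheme-to-phoneme conversion can fit together: look the word up first, and fall back to letter-to-phone rules for unknown words and names. The lexicon entries, the ARPAbet-like phone symbols, and the rule table are all toy assumptions. The greedy longest-match alignment stands in for the statistical choice of the best phone string that a trained system would make.

```python
# Toy pronunciation lexicon (assumed entries, ARPAbet-like symbols).
LEXICON = {
    "rock": ["r", "aa", "k"],
    "band": ["b", "ae", "n", "d"],
    "blues": ["b", "l", "uw", "z"],
}

# Toy letter-to-phone rules: each grapheme maps to a space-separated phone string.
RULES = {
    "sh": "sh", "ch": "ch", "th": "th", "ck": "k", "ph": "f",
    "a": "ae", "b": "b", "c": "k", "d": "d", "e": "eh", "f": "f",
    "g": "g", "h": "hh", "i": "ih", "j": "jh", "k": "k", "l": "l",
    "m": "m", "n": "n", "o": "aa", "p": "p", "q": "k", "r": "r",
    "s": "s", "t": "t", "u": "ah", "v": "v", "w": "w", "x": "k s",
    "y": "y", "z": "z",
}


def letter_to_sound(word: str) -> list[str]:
    """Greedy alignment: try a two-letter grapheme first, then a single letter."""
    phones = []
    i = 0
    while i < len(word):
        for size in (2, 1):
            chunk = word[i:i + size]
            if len(chunk) == size and chunk in RULES:
                phones.extend(RULES[chunk].split())
                i += size
                break
        else:
            i += 1  # no rule for this character (e.g. an apostrophe): skip it
    return phones


def pronounce(word: str) -> list[str]:
    """Dictionary lookup with a rule-based fallback for out-of-vocabulary words."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    return letter_to_sound(word)


print(pronounce("band"))  # found in the lexicon
print(pronounce("dock"))  # not in the lexicon, built from the rules
```

The fallback illustrates why names are listed separately: context-free rules of this kind give poor results on words like "Neuchâtel" or "Sugimura", whose spelling conventions differ from the rules' assumptions.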

Diphones: from middle of one segment to middle of next
Deals with the coarticulation problem: the middles of segments tend to vary the least
Preparing the database
- Recording and segmentation
- Parametrization of diphones (based on speech analysis)
Selection of diphones
Concatenation, smoothing discontinuities: adjusting the F0 and duration of segments
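
The selection and concatenation steps can be sketched as follows. The "a-b" unit naming, the "sil" padding symbol, and the function names are assumptions for illustration. The linear crossfade is a simple stand-in for boundary smoothing; it does not perform the F0 and duration adjustment mentioned above, which needs a signal-processing method on top of plain concatenation.

```python
def to_diphones(phones: list[str]) -> list[str]:
    """Map a phone sequence to the diphone units that span it.

    Each unit runs from the middle of one phone to the middle of the next;
    silence ("sil", an assumed symbol) pads both ends of the utterance.
    """
    padded = ["sil"] + list(phones) + ["sil"]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


def crossfade_concat(units: list[list[float]], overlap: int) -> list[float]:
    """Join sample lists, blending up to `overlap` samples at each boundary.

    Assumes `units` is non-empty. With overlap=0 this is plain concatenation.
    """
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for k in range(n):
            w = (k + 1) / (n + 1)  # weight of the incoming unit rises across the overlap
            out[start + k] = (1 - w) * out[start + k] + w * unit[k]
        out.extend(unit[n:])
    return out


print(to_diphones(["b", "ey", "s"]))
print(crossfade_concat([[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]], overlap=1))
```

In a full system each name returned by `to_diphones` would index the recorded and segmented database, and the retrieved waveforms (or their parameters) would be passed to the concatenation step.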

Rather than concatenating recorded units, synthesize the output signal directly, based on an average of a set of similar-sounding speech segments
HMM-based speech synthesis
- Training
  - Much as in HMM-based speech recognition systems: train context-dependent HMMs
  - Include F0 parameters as well as spectral parameters
- Synthesis
  - Inverse of speech recognition
  - Given a label sequence as the output of the NLP module, select and concatenate HMMs
  - Generate spectral and F0 values from the HMMs
  - Synthesize the waveform directly from these values
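
A heavily simplified sketch of the synthesis side, assuming training has already produced, for each label, a sequence of states with a mean F0 and a duration. The model table, the label names, and the frame settings are invented for illustration. Spectral parameters are left out, and a bare sine oscillator stands in for the vocoder that a real system would drive with both spectral and F0 values.

```python
import math
from dataclasses import dataclass


@dataclass
class State:
    mean_f0: float  # Hz; 0.0 marks an unvoiced state (assumed convention)
    duration: int   # number of frames spent in this state


# Hypothetical trained models: one state sequence per label.
TOY_MODELS = {
    "b": [State(0.0, 2), State(110.0, 2)],
    "ey": [State(120.0, 4), State(125.0, 4)],
    "s": [State(0.0, 5)],
}


def generate_f0(labels: list[str], models: dict[str, list[State]]) -> list[float]:
    """Concatenate the models for the label sequence and emit one F0 value per frame."""
    track = []
    for label in labels:
        for state in models[label]:
            track.extend([state.mean_f0] * state.duration)
    return track


def synthesize(f0_track: list[float], samples_per_frame: int = 80,
               sample_rate: int = 16000) -> list[float]:
    """Turn the frame-level F0 track into samples with a phase-continuous sine."""
    samples = []
    phase = 0.0
    for f0 in f0_track:
        for _ in range(samples_per_frame):
            if f0 > 0:
                phase += 2 * math.pi * f0 / sample_rate
                samples.append(math.sin(phase))
            else:
                samples.append(0.0)  # unvoiced frame: silence in this toy version
    return samples


track = generate_f0(["b", "ey", "s"], TOY_MODELS)
wave = synthesize(track)
print(len(track), "frames,", len(wave), "samples")
```

Repeating each state's mean gives a step-shaped trajectory. Practical systems generate smoother parameter tracks than this, so the sketch shows only the data flow from labels to models to parameters to waveform.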