The NLP module: some challenges
- Participants in New York and Shanghai say the process works like this:
People wait in line at an Apple store to buy the newest iPhone for \$600, paying a premium to skip the
AT&T contract.
- Sugimura reported that up to 3/4 of the participants aren't attending the workshops at the
Neuchâtel conference.
- The invention of techniques to separate lead from other
components could lead to the development of a separate industry.
- When she's not out fishing for bass, she plays the bass in a
rock band.
- It's not a rock band; it's a blues band.
- Over and above the cost there are also issues of convenience.
Text normalization
- Sentence tokenization
- Handling non-standard words
  - Tokenization
  - Classification
  - Expansion
- Homograph disambiguation
  - Using part of speech: separate, lives
  - Other cases, where part of speech does not decide: bass, read
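The tokenization, classification and expansion steps for non-standard words can be sketched as a tiny rule-based pipeline. This is a minimal sketch, not a production normalizer. The token classes (`MONEY`, `FRACTION`, `NUMBER`, `LETTERS`, `WORD`), the regular expressions, the denominator names and the function names are all hypothetical, and the number speller only covers 0 to 999,999. Real systems use much richer classes and often trained classifiers.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
# Hypothetical denominator names; anything else is read as "X over Y".
DENOMINATORS = {2: "half", 3: "third", 4: "quarter", 5: "fifth",
                8: "eighth", 10: "tenth"}

# Tokenizer: money, fractions, words (allowing internal & or '), punctuation.
TOKEN_RE = re.compile(r"\$\d+(?:,\d{3})*|\d+/\d+|\w+(?:[&']\w+)*|[^\w\s]")


def spell_number(n: int) -> str:
    """Spell out 0 <= n < 1,000,000 as English words."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        head = ONES[hundreds] + " hundred"
        return head + (" " + spell_number(rest) if rest else "")
    if n < 1_000_000:
        thousands, rest = divmod(n, 1000)
        head = spell_number(thousands) + " thousand"
        return head + (" " + spell_number(rest) if rest else "")
    raise ValueError("number out of range for this sketch")


def classify(token: str) -> str:
    """Assign a (hypothetical) non-standard-word class to a token."""
    if re.fullmatch(r"\$\d+(?:,\d{3})*", token):
        return "MONEY"
    if re.fullmatch(r"\d+/\d+", token):
        return "FRACTION"  # ignores the competing date reading (3/4 = March 4)
    if token.isdigit():
        return "NUMBER"
    if re.fullmatch(r"[A-Z]+(?:&[A-Z]+)+", token):
        return "LETTERS"   # read letter by letter, e.g. AT&T
    return "WORD"


def expand(token: str, cls: str) -> str:
    """Rewrite a classified token as the words a speaker would say."""
    if cls == "MONEY":
        amount = int(token[1:].replace(",", ""))
        return spell_number(amount) + (" dollar" if amount == 1 else " dollars")
    if cls == "FRACTION":
        num, den = (int(x) for x in token.split("/"))
        if den not in DENOMINATORS:
            return spell_number(num) + " over " + spell_number(den)
        name = DENOMINATORS[den]
        if num == 1:
            return "one " + name
        return spell_number(num) + " " + ("halves" if name == "half" else name + "s")
    if cls == "NUMBER":
        return spell_number(int(token))
    if cls == "LETTERS":
        return " ".join("and" if ch == "&" else ch for ch in token)
    return token


def normalize(text: str) -> list[str]:
    """Tokenize, classify and expand; punctuation tokens are dropped."""
    return [expand(tok, classify(tok))
            for tok in TOKEN_RE.findall(text)
            if not re.fullmatch(r"[^\w\s]", tok)]
```

On the Apple-store sentence this turns `$600` into "six hundred dollars" and `AT&T` into "A T and T". A string like `3/4` is always read as a fraction here, even though the same string could be a date. Resolving that ambiguity is exactly the job of a better classifier.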
Phonetic analysis
- Dictionary lookup
- Names
- Grapheme-to-phoneme conversion
  - Letter-to-phone alignment
  - Choosing the best phone string
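Dictionary lookup can be combined with the part-of-speech strategy for homographs by keying the lexicon on word and tag. This is a minimal sketch under several illustrative assumptions: the handful of ARPAbet entries, the coarse tags (`VERB`, `ADJ`, `NOUN`) and the function name `lookup` are all made up for the example. A real system would use a full pronunciation dictionary and a trained POS tagger. Out-of-vocabulary words such as many names would go to a trained letter-to-phone model.

```python
from typing import Optional

# Toy pronunciation lexicon in ARPAbet with stress digits (illustrative entries).
# Keys: word -> {POS tag, or None if POS-independent -> candidate pronunciations}.
LEXICON: dict[str, dict[Optional[str], list[str]]] = {
    "separate": {"VERB": ["S EH1 P ER0 EY2 T"], "ADJ": ["S EH1 P ER0 IH0 T"]},
    "lives": {"VERB": ["L IH1 V Z"], "NOUN": ["L AY1 V Z"]},
    # Same POS, different pronunciations: POS alone cannot decide.
    "bass": {"NOUN": ["B AE1 S", "B EY1 S"]},   # fish vs. instrument
    "read": {"VERB": ["R IY1 D", "R EH1 D"]},   # present vs. past
    "band": {None: ["B AE1 N D"]},              # not a homograph
}


def lookup(word: str, pos: Optional[str]) -> list[str]:
    """Return candidate pronunciations of `word` given its POS tag.

    [] means out of vocabulary: hand the word to grapheme-to-phoneme conversion.
    More than one candidate means the homograph needs other cues (e.g. word sense).
    """
    entry = LEXICON.get(word.lower())
    if entry is None:
        return []
    if None in entry:
        return entry[None]
    # Use the POS-specific entry; if the tag is missing, keep every candidate.
    return entry.get(pos) or [p for prons in entry.values() for p in prons]
```

Returning a list rather than a single string keeps the two failure modes apart. An empty result means "not in the dictionary". Several results mean "in the dictionary but still ambiguous".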
Prosodic analysis
- Prosodic structure
- Prosodic prominence
- Tune
- Computing duration
- Computing F0
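Computing duration can be sketched with a rule-based model in the style of Klatt's. Each phone has an inherent duration and a minimum duration. A product of context factors then stretches or compresses only the part above the minimum: `d = d_min + (prod_i f_i)(bar d - d_min)`. The `Segment` fields, the two rules and their factor values below are assumptions chosen for illustration, not a published rule set.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    phone: str
    inherent_ms: float       # d_bar: duration in a neutral context (assumed value)
    minimum_ms: float        # d_min: shortest duration the segment can take (assumed)
    stressed: bool = False
    phrase_final: bool = False


# Hypothetical context factors f_i; the values are placeholders, not published rules.
UNSTRESSED_FACTOR = 0.7
PHRASE_FINAL_FACTOR = 1.4


def duration_ms(seg: Segment) -> float:
    """Klatt-style rule: d = d_min + (prod_i f_i) * (d_bar - d_min)."""
    factor = 1.0
    if not seg.stressed:
        factor *= UNSTRESSED_FACTOR
    if seg.phrase_final:
        factor *= PHRASE_FINAL_FACTOR
    return seg.minimum_ms + factor * (seg.inherent_ms - seg.minimum_ms)
```

The factors multiply only the stretchable part `bar d - d_min`. As a result, no combination of (positive) shortening factors can push a segment below its minimum duration.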
Waveform synthesis: concatenative methods: diphones
- Diphones: from the middle of one segment to the middle of the next
  - Deals with the coarticulation problem: the middles of segments tend to vary
    the least, so the joins fall in the most stable parts of the signal
- Preparing the database
- Recording and segmentation
- Parametrization of diphones (based on speech analysis)
- Selection of diphones
- Concatenation: smoothing the discontinuities at the joins, and adjusting the F0 and duration of segments to the target prosody
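Each diphone runs from the middle of one phone to the middle of the next. A string of n phones, padded with silence at both ends, therefore needs n + 1 diphones. The sketch below assumes a hypothetical `pau` silence symbol, `a-b` diphone names, and waveforms stored as plain lists of samples. Its `concatenate` only smooths each join with a short linear crossfade. That is a simple stand-in: it does not perform the F0 and duration modification described above, which in practice is done with pitch-synchronous methods such as PSOLA.

```python
def diphone_sequence(phones: list[str], silence: str = "pau") -> list[str]:
    """Name the diphones needed for a phone string, padded with silence."""
    padded = [silence] + phones + [silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def concatenate(units: list[list[float]], overlap: int) -> list[float]:
    """Join waveform snippets (at least one) with a linear crossfade of `overlap` samples."""
    out: list[float] = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        for i in range(n):
            w = (i + 1) / (n + 1)      # fade-in weight for the incoming unit
            j = len(out) - n + i       # matching sample at the tail of `out`
            out[j] = (1 - w) * out[j] + w * unit[i]
        out.extend(unit[n:])           # rest of the incoming unit, unchanged
    return out
```

For "hello", transcribed here as `h ax l ow`, `diphone_sequence` yields `pau-h`, `h-ax`, `ax-l`, `l-ow` and `ow-pau`.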
Waveform synthesis: concatenative methods: unit selection
- The database: many recordings of units of different sizes
- Unit selection
  - Find the sequence of units that minimizes the total target cost plus join cost
    `hat U = "argmin"_U (sum_(t=1)^T T(s_t,u_t) + sum_(t=1)^(T-1) J(u_t,u_(t+1)))`
  - Target cost: how well a candidate unit `u_j` matches the target specification `s_t`
    `T(s_t,u_j) = sum_(p=1)^P w_p T_p(s_t[p],u_j[p])`
  - Join cost: how smoothly two adjacent units join
    `J(u_t,u_(t+1)) = sum_(p=1)^P w_p J_p(u_t[p],u_(t+1)[p])`
  - The units play the role of the hidden states in a hidden Markov model, so the
    standard Viterbi algorithm finds the lowest-cost sequence
  - Setting or learning the cost weights
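The argmin over unit sequences can be computed with the Viterbi algorithm. The candidate units for each target play the role of HMM states, target costs act like emission costs, and join costs act like transition costs. This is a minimal sketch under these assumptions:

- each target and unit is a dictionary of numeric features;
- each subcost `T_p` or `J_p` is a plain absolute difference on one feature;
- the feature names, weights and candidate lists are hypothetical.

```python
from typing import Sequence

Features = dict[str, float]


def weighted_cost(a: Features, b: Features, weights: dict[str, float]) -> float:
    """sum_p w_p * C_p(a[p], b[p]), assuming each subcost C_p = |a[p] - b[p]|."""
    return sum(w * abs(a[p] - b[p]) for p, w in weights.items())


def select_units(targets: Sequence[Features],
                 candidates: Sequence[Sequence[Features]],
                 target_w: dict[str, float],
                 join_w: dict[str, float]) -> tuple[list[int], float]:
    """Viterbi search for the unit sequence with minimum total target + join cost.

    candidates[t] lists the database units that could realize targets[t].
    Returns the chosen candidate index for each t and the total cost.
    """
    # cost[t][j]: cheapest total cost of any path ending in candidate j at step t.
    cost = [[weighted_cost(targets[0], u, target_w) for u in candidates[0]]]
    back: list[list[int]] = [[]]  # back[t][j]: best predecessor index at step t-1
    for t in range(1, len(targets)):
        row, ptr = [], []
        for u in candidates[t]:
            # Join cost compares the same feature of both units, as in the formula;
            # real systems usually compare values at the unit boundary.
            arrive = [cost[t - 1][i] + weighted_cost(prev, u, join_w)
                      for i, prev in enumerate(candidates[t - 1])]
            best = min(range(len(arrive)), key=arrive.__getitem__)
            row.append(arrive[best] + weighted_cost(targets[t], u, target_w))
            ptr.append(best)
        cost.append(row)
        back.append(ptr)
    # Trace back from the cheapest unit at the final step.
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    total, path = cost[-1][j], [j]
    for t in range(len(targets) - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    return path[::-1], total
```

Take one feature `f0`, with these settings:

- targets 100 and 120;
- candidates {100, 110} for the first target and {120, 112} for the second;
- target weight 1 and join weight 2.

The search picks 110 then 112, for a total cost of 22. It passes over the two perfect target matches, whose join cost alone would be 40. This is the trade-off the weights control.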
Statistical parametric speech synthesis
- Rather than concatenating recorded units,
synthesize the output signal directly, based on the
average of a set of similar-sounding speech segments
- HMM-based speech synthesis
  - Training
    - Much as in HMM-based speech recognition systems, train context-dependent HMMs
    - Include F0 parameters as well as spectral parameters
  - Synthesis
    - The inverse of speech recognition
    - Given the label sequence output by the NLP module,
      select and concatenate the corresponding HMMs
    - Generate spectral and F0 values from the HMMs
    - Synthesize the waveform directly from these values
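The concatenation and generation steps can be sketched in their simplest form. Suppose the HMMs for the label sequence have been chained, and each state has a duration in frames (for example from a state-duration model). Each frame then takes the mean spectral and F0 vector of its state. This naive sketch assumes a hypothetical `HmmState` holding a mean vector and a frame count. It yields step-like trajectories. That is why practical HMM-based synthesizers also model dynamic (delta) features and generate smooth parameter trajectories by maximum likelihood before a vocoder produces the waveform.

```python
from dataclasses import dataclass


@dataclass
class HmmState:
    mean: list[float]      # e.g. [log F0, c1, c2, ...]: assumed feature layout
    duration_frames: int   # e.g. rounded mean of a state-duration model (assumed)


def concatenate_models(models: list[list[HmmState]]) -> list[HmmState]:
    """Chain the per-label HMMs into one left-to-right state sequence."""
    return [state for model in models for state in model]


def generate(states: list[HmmState]) -> list[list[float]]:
    """Piecewise-constant trajectory: repeat each state's mean for its duration."""
    frames: list[list[float]] = []
    for s in states:
        frames.extend(list(s.mean) for _ in range(s.duration_frames))
    return frames
```

Each frame gets its own copy of the mean vector, so later smoothing or vocoder preprocessing can modify frames in place without aliasing the model parameters.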