- Record hours of human speech
- Cut into small units
- Stitch together to form new sentences
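
A minimal sketch of the stitching step, assuming each unit is a plain list of audio samples and that seams are smoothed with a short linear crossfade. The crossfade and the name `crossfade_concat` are illustrative assumptions, not something these notes specify.

```python
def crossfade_concat(units, overlap):
    """Join sample lists, linearly crossfading `overlap` samples at each seam."""
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        # Never overlap more samples than either side actually has.
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # fade-in weight for the incoming unit
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[n:])
    return out


# Two 4-sample units joined with a 2-sample overlap give 6 samples.
joined = crossfade_concat([[1.0] * 4, [0.0] * 4], overlap=2)
print(len(joined))
```

With `overlap=0` the units are simply appended end to end, which is where audible glitches at the seams tend to come from.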
Types
- Diphone concatenation
  - Record the transitions between adjacent phonemes (diphones)
- Unit selection
  - Record full sentences
  - Segment the sentences
  - Index the phonemes, diphones, syllables, and words
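
To make the diphone idea concrete, here is a small sketch that turns a phoneme sequence into the diphones (phoneme-to-phoneme transitions) a synthesizer would need. The function name `to_diphones`, the `sil` boundary label, and the phoneme spellings are assumptions for illustration only.

```python
def to_diphones(phonemes, boundary="sil"):
    """Return the adjacent-phoneme pairs, padded with silence at both ends."""
    padded = [boundary, *phonemes, boundary]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


print(to_diphones(["h", "e", "l", "o"]))
# → ['sil-h', 'h-e', 'e-l', 'l-o', 'o-sil']
```

A sequence of N phonemes needs N + 1 diphones once the silence boundaries are included.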
Strengths
- Natural-sounding, since the output is built from real recorded speech
- Dominated commercial TTS for years
Weaknesses
- Requires many recordings
- Glitchy transitions at unit boundaries
- Limited expressivity
  - Can’t modify emotion or style
- Coverage gaps when a needed unit is missing from the recordings
Methods
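- Can use a heuristic search algorithm to choose units
  - Target cost: how well a candidate unit matches the desired unit
  - Join cost: how smoothly two adjacent units concatenate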
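
The two costs can be combined in a search over candidate units. The sketch below uses a Viterbi-style dynamic program, which is one common choice and is swapped in here for illustration; the notes do not say which search is used. The unit representation (name, pitch) and both cost functions are hypothetical.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Pick one candidate per target, minimizing total target + join cost.

    Assumes every target has at least one candidate (no coverage gap).
    """
    if not targets:
        return []
    # best[i][j] = (cheapest total cost ending at candidates[i][j], backpointer)
    best = [[(target_cost(targets[0], c), None) for c in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for c in candidates[i]:
            prev_cost, prev_j = min(
                (best[i - 1][k][0] + join_cost(p, c), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((prev_cost + target_cost(targets[i], c), prev_j))
        best.append(row)
    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]


# Hypothetical units: (name, pitch in Hz). Targets are desired pitches.
targets = [100, 120]
candidates = [[("a1", 100), ("a2", 118)], [("b1", 120)]]
pitch_target = lambda t, u: abs(t - u[1])        # mismatch with the target
pitch_join = lambda p, u: 2 * abs(p[1] - u[1])   # pitch jump at the seam

print([name for name, _ in select_units(targets, candidates, pitch_target, pitch_join)])
# → ['a2', 'b1']
```

Here `a1` matches its target perfectly, but the search still prefers `a2` because it joins to `b1` with a much smaller pitch jump. That is the trade-off the two costs are meant to capture.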
Example
- Early Siri