Concatenative Synthesis

Graph

Record hours of human speech (usually one speaker)
Cut the recordings into small units
Stitch units together to form new sentences
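
Sketch of the stitching step. Assumptions (not from the notes): each unit is a plain list of float samples, and seams are smoothed with a linear crossfade. The name `crossfade_concat` and the `overlap` parameter are hypothetical.

```python
def crossfade_concat(units, overlap):
    """Join waveform snippets, blending up to `overlap` samples at each seam."""
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        # Never blend more samples than either side actually has.
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit, rising across the seam
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        # Append whatever part of the incoming unit was not blended.
        out.extend(unit[n:])
    return out


# Two toy "recordings": a constant 1.0 signal followed by a constant 0.0 signal.
stitched = crossfade_concat([[1.0] * 4, [0.0] * 4], overlap=2)
```

With `overlap=0` the units are simply butted together, which is where audible clicks tend to come from.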

Types

Diphone concatenation
- record one example of each transition between phonemes (a diphone)
Unit selection
- record a large set of full sentences
- segment the sentences into units
- index the units at several levels: phonemes, diphones, syllables, words
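
Sketch of diphone splitting and unit indexing. Assumptions (not from the notes): phonemes arrive as already-segmented label lists, utterances are padded with a silence label `sil`, and the index stores `(recording_id, position)` pairs. `to_diphones` and `build_inventory` are hypothetical names.

```python
from collections import defaultdict


def to_diphones(phonemes):
    """Turn a phoneme sequence into diphone labels, padding with silence at both ends."""
    padded = ["sil"] + list(phonemes) + ["sil"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def build_inventory(recordings):
    """Map each diphone label to every (recording_id, position) where it occurs."""
    inventory = defaultdict(list)
    for rec_id, phonemes in recordings.items():
        for pos, diphone in enumerate(to_diphones(phonemes)):
            inventory[diphone].append((rec_id, pos))
    return inventory


# Toy corpus: two "recorded" words given as phoneme labels.
recordings = {"utt1": ["k", "ae", "t"], "utt2": ["b", "ae", "t"]}
inventory = build_inventory(recordings)

# Diphones needed for a new word that the corpus does not contain.
missing = [d for d in to_diphones(["k", "ae", "b"]) if d not in inventory]
```

A diphone system keeps one entry per label; a unit-selection system keeps all of them and chooses among the candidates at synthesis time.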

Strengths

Natural sounding (output is real recorded speech)
Dominated commercial TTS for years

Weaknesses

Needs many recordings
Glitchy transitions at unit joins
Limited expressivity
Can’t modify emotion/style beyond what was recorded
Coverage gaps when a needed unit was never recorded

Methods

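Unit selection can use a heuristic search algorithm that minimizes two costs:
- target cost: how well a candidate unit matches the unit we want
- join cost: how smoothly two adjacent units fit together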
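
Sketch of a target-cost / join-cost search. The notes do not name a specific algorithm; this assumes a Viterbi-style dynamic program, which is a common choice. Further assumptions: every position has at least one candidate, and units are reduced to a single pitch value for illustration. `select_units` and the toy cost functions are hypothetical.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Pick one candidate per position, minimizing summed target and join costs."""
    # best[i][j] = (cheapest cost of any path ending at candidates[i][j], backpointer)
    best = [[(target_cost(targets[0], c), None) for c in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for c in candidates[i]:
            # Cheapest way to arrive at c from any candidate at the previous position.
            prev_cost, prev_j = min(
                (best[i - 1][j][0] + join_cost(p, c), j)
                for j, p in enumerate(candidates[i - 1])
            )
            row.append((prev_cost + target_cost(targets[i], c), prev_j))
        best.append(row)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return list(reversed(path)), total


# Toy example: targets and candidates are pitch values in Hz.
targets = [100, 110, 120]
candidates = [[100, 140], [105, 150], [118, 90]]
path, total = select_units(
    targets,
    candidates,
    target_cost=lambda t, c: abs(t - c),  # distance from the desired pitch
    join_cost=lambda p, c: abs(p - c),    # pitch jump between neighbours
)
```

Here the search returns the path `[100, 105, 118]` with total cost 25. Real systems use weighted sums of several features (spectral distance, duration, phonetic context) in both costs.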

Example

Early Siri