Concatenative Synthesis


Graph
  • Record hours of human speech
  • Cut the recordings into small units
  • Stitch units together to form new sentences
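
The three steps above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the recording is a plain list of float samples, the unit boundaries are assumed to be known already, and the function names (cut_units, crossfade_join, stitch) and the linear crossfade are assumptions chosen for the example.

```python
def cut_units(samples, boundaries):
    """Slice a recording into units at the given sample offsets."""
    return [samples[start:end] for start, end in zip(boundaries, boundaries[1:])]


def crossfade_join(left, right, overlap):
    """Join two units, linearly crossfading `overlap` samples at the seam."""
    overlap = min(overlap, len(left), len(right))
    if overlap == 0:
        return left + right  # nothing to blend, plain concatenation
    head, tail = left[:-overlap], left[-overlap:]
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight ramps from the left unit to the right
        mixed.append((1 - w) * tail[i] + w * right[i])
    return head + mixed + right[overlap:]


def stitch(units, overlap=2):
    """Concatenate units in order, smoothing each join with a crossfade."""
    out = list(units[0])
    for unit in units[1:]:
        out = crossfade_join(out, list(unit), overlap)
    return out


# Toy "recording": three flat segments standing in for three speech units.
recording = [0.0] * 4 + [1.0] * 4 + [0.5] * 4
units = cut_units(recording, [0, 4, 8, 12])
new_utterance = stitch([units[2], units[0]], overlap=2)
```

Each join shortens the output by the overlap length, so two 4-sample units joined with a 2-sample overlap give 6 samples. Real systems usually join at pitch-synchronous points rather than fixed offsets.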

Types

  • Diphone concatenation
    • record every phoneme-to-phoneme transition (one unit per diphone)
  • Unit Selection
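    • record full sentences
    • segment the sentences into units
    • index the units at several levels: phonemes, diphones, syllables, words
    • at synthesis time, select the best-matching units from the index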
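
A rough sketch of both types follows. The diphone helper turns a phoneme sequence into transition labels, and the selection routine runs a Viterbi-style search that picks one recorded candidate per label so that the total join cost is smallest. The unit fields (label, start_pitch, end_pitch) and the pitch-only join cost are hypothetical simplifications; unit-selection systems typically combine a target cost with a join cost over richer features.

```python
from collections import defaultdict


def to_diphones(phonemes):
    """Turn a phoneme sequence into diphone labels, padded with silence '_'."""
    padded = ["_"] + list(phonemes) + ["_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def build_index(units):
    """Group recorded units by label so candidates can be looked up quickly."""
    index = defaultdict(list)
    for unit in units:
        index[unit["label"]].append(unit)
    return index


def pitch_join_cost(prev, nxt):
    """Hypothetical join cost: pitch mismatch across the seam, in Hz."""
    return abs(prev["end_pitch"] - nxt["start_pitch"])


def select_units(targets, index, join_cost):
    """Viterbi search: one candidate per target label, minimal total join cost."""
    for label in targets:
        if not index.get(label):
            raise KeyError(f"no recorded unit for {label!r}")  # coverage gap
    # Each entry is (cost so far, path of chosen units ending in one candidate).
    best = [(0.0, [u]) for u in index[targets[0]]]
    for label in targets[1:]:
        step = []
        for cand in index[label]:
            cost, path = min(
                ((c + join_cost(p[-1], cand), p) for c, p in best),
                key=lambda t: t[0],
            )
            step.append((cost, path + [cand]))
        best = step
    return min(best, key=lambda t: t[0])[1]


# Toy inventory with two competing candidates for the "h-i" diphone.
inventory = [
    {"id": 1, "label": "_-h", "start_pitch": 100, "end_pitch": 110},
    {"id": 2, "label": "h-i", "start_pitch": 150, "end_pitch": 150},
    {"id": 3, "label": "h-i", "start_pitch": 112, "end_pitch": 115},
    {"id": 4, "label": "i-_", "start_pitch": 116, "end_pitch": 100},
]
targets = to_diphones(["h", "i"])
chosen = select_units(targets, build_index(inventory), pitch_join_cost)
chosen_ids = [u["id"] for u in chosen]
```

With a single recorded unit per label this reduces to plain diphone concatenation; the search only matters once the index holds several candidates per unit.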

Strengths

  • Natural sounding, since the output is built from real recorded speech
  • Dominated commercial TTS for years

Weaknesses

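  • Requires many hours of recordings
  • Glitchy transitions at the joins between units
  • Limited expressivity
  • Can’t modify emotion/style beyond what was recorded
  • Coverage gaps: units missing from the recordings cannot be synthesized cleanly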
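
The recording burden and the coverage problem can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are made up for this example, and the 40-phoneme figure is an assumed round number, not a measured one.

```python
def max_diphones(num_phonemes):
    """Upper bound on distinct diphones: every ordered phoneme pair."""
    return num_phonemes ** 2


def coverage_report(required, inventory):
    """Compare the units a sentence needs against the units that were recorded."""
    required = set(required)
    missing = sorted(required - set(inventory))
    covered = len(required) - len(missing)
    coverage = covered / len(required) if required else 1.0
    return {"coverage": coverage, "missing": missing}


# Assumed 40-phoneme language: up to 1600 transitions to record.
upper_bound = max_diphones(40)

# A sentence needing three diphones, one of which was never recorded.
report = coverage_report(["h-e", "e-l", "l-o"], {"h-e", "l-o"})
```

Not every phoneme pair occurs in a real language, so the square is an upper bound, but it shows why the inventory grows quickly and why any missing unit forces a fallback or an audible glitch.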

Methods

Example

  • Early Siri