ToucanTTS - A Toolkit for State-of-the-Art Speech Synthesis
a massively multilingual model covering over 7,000 languages
What is ToucanTTS?
ToucanTTS is a toolkit developed by the Institute for Natural Language Processing (IMS) at the University of Stuttgart, Germany, for teaching, training, and using state-of-the-art speech synthesis models. It is built entirely in Python and PyTorch, aiming to be simple, beginner-friendly, yet powerful.
ToucanTTS Features
Multilingual and Multi-Speaker Support
Supports multilingual speech synthesis covering over 7,000 languages through a massively multilingual pretrained model, and enables multi-speaker speech synthesis with cloning of prosody (rhythm, stress, intonation) across speakers
Human-in-the-Loop Editing
Allows human-in-the-loop editing of synthesized speech, e.g., for poetry reading and literary studies
Interactive Demos
Provides interactive demos for massively multilingual speech synthesis, style cloning across speakers, voice design, and human-edited poetry reading
Architecture and Components
Primarily based on the FastSpeech 2 architecture with modifications such as a normalizing-flow-based PostNet inspired by PortaSpeech. Includes a self-contained aligner trained with Connectionist Temporal Classification (CTC) and spectrogram reconstruction for various applications. Offers pretrained models for the multilingual model, aligner, embedding function, vocoder, and embedding GAN
Ease of Use
Built entirely in Python and PyTorch, aiming to be simple and beginner-friendly while still powerful
Articulatory Representations
The IMS Toucan system incorporates articulatory representations of phonemes as input, allowing multilingual data to benefit low-resource languages
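To make the idea concrete, here is a minimal sketch of what articulatory phoneme representations look like: each phoneme is encoded as a binary vector of articulatory features, so phonemes that are similar across languages share most of their input representation. The feature inventory and phoneme set below are illustrative only, not the actual feature set used by ToucanTTS.

```python
# Illustrative articulatory features; ToucanTTS's real inventory is larger.
FEATURES = ["voiced", "bilabial", "alveolar", "nasal", "plosive", "fricative"]

# A tiny, hypothetical phoneme-to-feature table for demonstration.
ARTICULATORY = {
    "p": {"bilabial", "plosive"},
    "b": {"voiced", "bilabial", "plosive"},
    "m": {"voiced", "bilabial", "nasal"},
    "s": {"alveolar", "fricative"},
    "z": {"voiced", "alveolar", "fricative"},
}

def phoneme_vector(phoneme: str) -> list[int]:
    """Turn a phoneme into a binary articulatory feature vector."""
    active = ARTICULATORY[phoneme]
    return [1 if f in active else 0 for f in FEATURES]

# /b/ and /p/ differ only in voicing, so their vectors differ in one position.
diff = sum(a != b for a, b in zip(phoneme_vector("b"), phoneme_vector("p")))
```

Because low-resource languages share such feature vectors with well-resourced ones, a model trained on many languages can transfer knowledge to phonemes it has rarely seen.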
How to use ToucanTTS?
Let's get started with ToucanTTS in just a few simple steps.
Installation
Clone the IMS-Toucan repository from GitHub:
git clone https://github.com/DigitalPhonetics/IMS-Toucan
Preparing Data
Write a function that maps audio file paths to their transcripts, create a custom training pipeline script, and make sure the text frontend supports your language.
Download Pretrained Models
Download pretrained models like the multilingual model, aligner, and vocoder using the provided script.
Apply Patches/Fixes
Apply any necessary patches or fixes to the codebase.
Model Training
Run your custom training pipeline script to fine-tune the models on your data
Inference
Use provided interactive demos or scripts to generate speech from text using the trained models.
Frequently Asked Questions
Have a question? Check out some of the common queries below.
What is the primary architecture used in ToucanTTS?
ToucanTTS is primarily based on the FastSpeech 2 architecture with modifications like a normalizing flow-based PostNet inspired by PortaSpeech.
How does ToucanTTS support low-resource languages?
ToucanTTS incorporates articulatory representations of phonemes as input, allowing multilingual data to benefit low-resource languages.
Can ToucanTTS be used for multi-speaker speech synthesis?
Yes, ToucanTTS enables multi-speaker speech synthesis and cloning of prosody (rhythm, stress, intonation) across speakers.
What kind of demos are available in ToucanTTS?
ToucanTTS provides interactive demos for massively multilingual speech synthesis, style cloning across speakers, voice design, and human-edited poetry reading.
How many languages are covered by the massively multilingual pretrained model in ToucanTTS?
The massively multilingual pretrained model in ToucanTTS covers over 7,000 languages.
Is ToucanTTS easy to use?
Yes, ToucanTTS is built entirely in Python and PyTorch, aiming to be simple and beginner-friendly while still powerful.