#45 — Orpheus
March 19, 2025 • 2 min read

Why it matters: Canopy Labs has released Orpheus, a groundbreaking family of speech-LLMs that finally brings human-level speech generation to the open-source community, challenging the dominance of closed-source models.
The big picture: Until now, open-source TTS solutions have lagged behind proprietary offerings in quality and emotional intelligence. Orpheus changes that paradigm with state-of-the-art performance even in its smallest configurations.
What's new
Canopy Labs is releasing four model sizes based on the Llama architecture:
- Medium (3B parameters)
- Small (1B parameters)
- Tiny (400M parameters)
- Nano (150M parameters)
Between the lines: Even the smallest models deliver "extremely high quality, aesthetically pleasing speech generation," making this technology accessible across various computing environments.
Technical innovation
Orpheus leverages Llama-3b as its backbone, trained on:
- 100,000+ hours of English speech data
- Billions of text tokens
The edge: This dual training approach enhances TTS performance while maintaining sophisticated language understanding.
Standout capabilities
Zero-shot voice cloning: Without specific training for this task, Orpheus demonstrates emergent voice cloning abilities that match or exceed industry leaders like ElevenLabs and PlayHT.
Emotion control: The model can be taught specific emotional expressions with minimal fine-tuning examples, responding to inline tags such as `<laugh>` and `<sigh>`, and even handling disfluencies naturally.
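As a rough illustration, an emotion-tagged prompt for a speech-LLM can be composed as plain text with inline tags. This is a sketch only: the voice name and the exact tag vocabulary here are assumptions for illustration, not confirmed Orpheus API details.

```python
# Sketch of composing an emotion-tagged TTS prompt.
# Tag names and the "voice: text" convention are illustrative assumptions;
# check the Orpheus documentation for the supported set.
def tagged_prompt(voice: str, text: str) -> str:
    """Prefix the utterance with a voice name, a common speech-LLM convention."""
    return f"{voice}: {text}"

prompt = tagged_prompt("tara", "You won't believe it <laugh> it actually worked!")
print(prompt)
```

The model sees the tag as ordinary tokens in its input, which is why a handful of fine-tuning examples per emotion can suffice.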
Production-ready features
Real-time performance: Orpheus supports output streaming with approximately 200ms latency, which can be further reduced to 25-50ms using input streaming into the KV cache.
By the numbers: Streaming inference runs faster than real-time playback on a single A100 40GB GPU, even with the 3B parameter model.
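"Faster than real-time" can be made precise with the real-time factor (RTF): generation time divided by audio duration, where an RTF below 1.0 means the output stream never stalls during playback. A minimal sketch, with hypothetical timings:

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """RTF < 1.0 means audio is produced faster than it plays back."""
    return generation_seconds / audio_seconds

# Hypothetical example: 6 s of compute to generate 10 s of audio.
rtf = real_time_factor(6.0, 10.0)
print(f"RTF = {rtf:.2f} -> {'real-time capable' if rtf < 1.0 else 'too slow'}")
```

Combined with input streaming into the KV cache, a sub-1.0 RTF is what lets the reported latency drop toward the 25-50ms range.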
Technical differentiators
Canopy Labs made two unconventional design choices:
- Using a flattened sequence decoding approach (7 tokens per frame)
- Implementing a non-streaming CNN-based tokenizer with a sliding window modification
The bottom line: These choices enable real-time generation without the "popping" artifacts common in other SNAC-based speech LLMs.
What's next
Canopy Labs hints at releasing an open-source end-to-end speech model "in the coming weeks," using the same architecture and training methodology.
How to try it: Demos and code are available on GitHub, Hugging Face, and through an interactive Google Colab notebook.