Hume AI is the empathic voice research lab turned product company. Founded 2021 by former Google AI emotion researcher Alan Cowen, the company ships three closed-source products plus one open-source TTS model:
- EVI (Empathic Voice Interface) is the speech-to-speech system. Handles interruptions, back-channeling, external LLM routing, and expressive instruction-following. Current production lineup: EVI 3 and EVI 4 mini.
- Octave is the text-to-speech model with voice design, modulation, cloning, and conversion. Emotional nuance is the differentiator, not raw quality.
- Expression Measurement API previously scored emotion in audio, video, text, and images, but Hume’s docs now put it in sunset: deprecated May 14, 2026 and no longer accessible June 14, 2026.
- TADA is the open-source LLM-TTS system that streams text and audio together to reduce hallucinations and latency.
Supporting services: Human Feedback API (survey templates + participant pools), Data Library (speech datasets covering 50+ languages and 48 emotions), Study Runner (programmatic human evaluations).
System Verdict
Pick Hume AI when emotion is the differentiator in the voice interaction. Voice agents for therapy, wellness, coaching, and customer support benefit from EVI’s prosody-aware responses and interruption handling in ways that generic TTS does not deliver. Starter at $3/mo is one of the cheapest paid voice-AI entry points in the category, and Creator at $14/mo (with a permanent 50% first-month discount to $7) remains competitive for indie developers.
Skip it if raw TTS quality, ultra-low latency, or new emotion-analytics API work is the goal. ElevenLabs leads on voice quality ceiling; Cartesia wins on sub-40ms latency. Hume’s Octave is good but not best-in-class for pure narration. Expression Measurement should be treated as a migration item, not a fresh-build reason, unless Hume confirms a replacement. Also skip if you need self-hosted weights for on-prem deployment; EVI and Octave are both cloud-only.
Who pays which tier: Free for evaluation (5 min EVI, 10K TTS chars). Starter $3/mo for hobbyist voice-agent builders. Creator $14/mo (first month $7) for indie developers (140K chars + 200 min EVI + 1,000 projects). Pro $70/mo when production usage crosses 1M chars/mo. Scale $200/mo for teams needing 3 seats. Business $500/mo for 5-seat orgs with higher concurrency. Enterprise custom for SOC 2 + GDPR + HIPAA, unlimited usage, and Slack support.
Key Facts
| Fact | Detail |
|---|---|
| Core products | EVI (speech-to-speech) · Octave (TTS) · TADA (open-source) · Expression Measurement (in sunset) |
| EVI versions | EVI 3 (full) · EVI 4 mini (smaller, faster) |
| Languages (datasets / TTS) | 50+ in the Data Library · multilingual TTS in Octave |
| Emotions measured | 48 distinct emotions in Expression Measurement |
| Voice descriptors | 600+ in the Data Library |
| Subscription pricing | Free · Starter $3 · Creator $14 (first month $7) · Pro $70 · Scale $200 · Business $500 · Enterprise custom |
| Octave TTS rate | $0.05 to $0.15 per 1,000 chars (plan-dependent) |
| EVI speech-to-speech | $0.04 to $0.07 per minute overage |
| Expression Measurement | Deprecated May 14, 2026 · endpoints no longer accessible June 14, 2026 |
| Concurrent connections | 1 Free · 5 Starter/Creator · 10 Pro · 20 Scale · 30 Business · unlimited Enterprise |
| Team seats | Solo through Pro · 3 Scale · 5 Business · custom Enterprise |
| Compliance | SOC 2 · GDPR · HIPAA (Enterprise) |
| Voice cloning | Included on all tiers (create + use) |
| Self-hosted | None on EVI / Octave · TADA is open-source |
Every data point above verified against Hume’s published sources on 2026-06-12.
What it actually is
A voice-AI platform with emotion science at the core. The company’s research heritage (Cowen’s earlier work at Google on facial and vocal emotion) shows up in Expression Measurement’s taxonomy of 48 distinct emotions and 600+ voice descriptors, trained on curated datasets covering 50+ languages across multiple domains.
EVI is the flagship. It’s a speech-to-speech system rather than a TTS pipeline: input audio in, response audio out, with the model handling prosody, interruptions, and back-channeling natively. Developers can route the LLM-level reasoning through external models (Claude, GPT, Gemini, open-source) while EVI owns the audio layer.
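The split described above, where an external model does the reasoning and the voice layer owns the audio, can be pictured with a small sketch. Everything here is hypothetical: `ReasoningBackend`, `EchoBackend`, `VoiceLayer`, and the `tone` field are illustrative names, not part of Hume's SDK or API, and no audio is actually processed.

```python
from typing import Protocol


class ReasoningBackend(Protocol):
    """Hypothetical interface for any external text LLM."""

    def reply(self, transcript: str, tone: str) -> str: ...


class EchoBackend:
    """Stand-in for Claude, GPT, Gemini, or an open-source model."""

    def reply(self, transcript: str, tone: str) -> str:
        return f"(responding to a {tone} user) You said: {transcript}"


class VoiceLayer:
    """Stand-in for the audio layer. This is NOT the Hume SDK."""

    def __init__(self, backend: ReasoningBackend) -> None:
        self.backend = backend

    def handle_turn(self, transcript: str, tone: str) -> dict:
        # The reasoning step is delegated to whichever backend was plugged in.
        text = self.backend.reply(transcript, tone)
        # A real speech-to-speech system would synthesize audio here;
        # this sketch only returns the text and the tone it would mirror.
        return {"text": text, "delivery_tone": tone}


layer = VoiceLayer(EchoBackend())
print(layer.handle_turn("I had a rough day", "tired"))
```

The point of the design is that swapping `EchoBackend` for another class with the same `reply` method changes the reasoning without touching the voice layer.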
Octave is where voice quality sits. Less polished than ElevenLabs on pure narration but with more emotional range per prompt. Voice design lets developers spec voices by description (“warm, gentle, 40s female”) rather than cloning; voice cloning is included on all tiers.
Expression Measurement is the academic-heritage product now in sunset. Existing users should treat it as a migration project; new buyers should not evaluate Hume primarily for emotion-analytics API access unless their Hume contact confirms a replacement path.
TADA is the open-source LLM-TTS architecture in which text and audio stream together to reduce hallucination. Useful for teams evaluating Hume’s approach before committing to EVI.
When to pick Hume AI
- Voice agents where emotion matters. Therapy, coaching, wellness, customer support, companion apps. EVI’s interruption handling and back-channeling feel conversationally different from ChatGPT Voice or ElevenLabs Conversational.
- Emotional nuance in narration. Octave with emotion tags produces delivery variations that straight TTS misses. Useful for character voices, audiobook dramatization, and expressive brand voices.
- Migration planning for emotion analytics. Existing Expression Measurement users need a plan before the June 14, 2026 cutoff. Net-new analytics projects should ask Hume for the replacement path before implementation.
- Budget-friendly voice-AI entry. Starter at $3/mo stays among the cheapest paid voice-AI entry points. Creator moved to $14/mo as of May 2026 with a permanent 50% first-month discount ($7), still competitive for indie developers wanting 140K TTS chars + 200 EVI minutes. Free tier includes 5 min EVI + 10K TTS chars for evaluation.
- External LLM flexibility. EVI’s architecture lets developers bring their own LLM for reasoning while Hume owns the voice layer. Useful for teams already committed to a specific model.
- Research-adjacent workflows. Data Library, Human Feedback API, and Study Runner serve academic and commercial research teams that other voice-AI vendors do not target.
When to pick something else
- Peak voice quality or multilingual breadth: ElevenLabs. Eleven v3 leads on narration quality and language coverage (70+ languages on v3); Octave’s positioning is emotion, not peak fidelity.
- Ultra-low latency for real-time agents: Cartesia delivers sub-40ms vs EVI’s higher round-trip latency.
- Open-source self-hosting of the full stack: Fish Audio. TADA is open-source but it’s the text-audio streaming architecture, not a drop-in EVI replacement.
- Single TTS replacement for an ElevenLabs subscription: Octave works but does not match ElevenLabs on quality ceiling. Consider whether the emotional-range differentiation justifies the switch.
Pricing
Subscription pricing via hume.ai/pricing:
| Plan | Monthly | TTS Characters | EVI Minutes | Concurrent | Projects | Seats |
|---|---|---|---|---|---|---|
| Free | $0 | 10K | 5 | 1 | 20 | 1 |
| Starter | $3 | 30K | 40 | 5 | 20 | 1 |
| Creator | $14 (first month $7, 50% off) | 140K | 200 | 5 | 1,000 | 1 |
| Pro | $70 | 1M | 1,200 | 10 | 3,000 | 1 |
| Scale | $200 | 3.3M | 5,000 | 20 | 10,000 | 3 |
| Business | $500 | 10M | 12,500 | 30 | 20,000 | 5 |
| Enterprise | Custom | Unlimited | Unlimited | Unlimited | Unlimited | Custom |
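To compare tiers on a like-for-like basis, it can help to turn the plan table into implied unit prices. The sketch below divides each monthly fee by the included quota. It rests on one simplifying assumption: the whole fee is attributed to a single quota at a time, so each figure is an upper bound rather than a rate Hume publishes. The function name `effective_rates` is illustrative.

```python
# Monthly price (USD) and included quotas, copied from the plan table above.
PLANS = {
    "Starter": {"price": 3, "tts_chars": 30_000, "evi_minutes": 40},
    "Creator": {"price": 14, "tts_chars": 140_000, "evi_minutes": 200},
    "Pro": {"price": 70, "tts_chars": 1_000_000, "evi_minutes": 1_200},
    "Scale": {"price": 200, "tts_chars": 3_300_000, "evi_minutes": 5_000},
    "Business": {"price": 500, "tts_chars": 10_000_000, "evi_minutes": 12_500},
}


def effective_rates(plan_name):
    """Return (USD per 1,000 TTS chars, USD per EVI minute).

    Assumption: the full monthly fee is attributed to one quota at a time,
    so each number is an upper bound on the implied unit price.
    """
    plan = PLANS[plan_name]
    per_1k_chars = plan["price"] / (plan["tts_chars"] / 1_000)
    per_evi_minute = plan["price"] / plan["evi_minutes"]
    return per_1k_chars, per_evi_minute


for name in PLANS:
    chars_rate, minute_rate = effective_rates(name)
    print(f"{name}: ${chars_rate:.3f} per 1K chars, ${minute_rate:.3f} per EVI minute")
```

On these numbers the implied character rate falls from about $0.10 per 1,000 on Starter and Creator to $0.05 on Business.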
Usage-based rates (overages or custom workflows):
| Service | Rate |
|---|---|
| Octave TTS | $0.05 to $0.15 per 1,000 characters (plan-dependent) |
| EVI speech-to-speech overage | $0.04 to $0.07 per minute |
| Expression Measurement | Deprecated May 14, 2026; endpoints no longer accessible June 14, 2026 |
Prices verified 2026-06-12 via Hume pricing, Hume products, and Hume Expression Measurement docs. Voice cloning (create and use) is included on all tiers. Enterprise adds API voice access, SOC 2 / GDPR / HIPAA compliance, Slack support, and custom rate limits.
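Whether to pay overage or move up a tier comes down to simple arithmetic. The sketch below compares Creator and Pro for a given number of EVI minutes. The plan prices and quotas come from the table above. The overage rate is an assumption: the source gives only a $0.04 to $0.07 range, not which rate applies to which plan, so `ASSUMED_OVERAGE` is a placeholder to replace with the rate on your own invoice.

```python
def monthly_evi_cost(price, included_minutes, used_minutes, overage_per_minute):
    """Monthly fee plus overage for EVI minutes beyond the included quota."""
    extra_minutes = max(0, used_minutes - included_minutes)
    return price + extra_minutes * overage_per_minute


# Plan figures from the subscription table.
CREATOR = {"price": 14, "included_minutes": 200}
PRO = {"price": 70, "included_minutes": 1_200}

# Assumption: top of the published range, applied to both plans.
ASSUMED_OVERAGE = 0.07  # USD per minute

for used in (600, 1_100):
    creator = monthly_evi_cost(
        used_minutes=used, overage_per_minute=ASSUMED_OVERAGE, **CREATOR
    )
    pro = monthly_evi_cost(
        used_minutes=used, overage_per_minute=ASSUMED_OVERAGE, **PRO
    )
    cheaper = "Creator" if creator < pro else "Pro"
    print(f"{used} min: Creator ${creator:.2f}, Pro ${pro:.2f} -> {cheaper}")
```

Under the assumed $0.07 rate, Creator stays cheaper at 600 minutes and Pro wins at 1,100. The crossover sits near 1,000 total minutes, which is 800 minutes past Creator's quota.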
Against the alternatives
| | Hume Octave | ElevenLabs v3 | Cartesia |
|---|---|---|---|
| Voice quality ceiling | Strong, emotion-focused | Highest on v3 | Strong, speed-optimized |
| Emotional nuance | Strongest (48 emotions, voice descriptors) | Audio tags on v3 | Limited prosody control |
| Real-time latency | Higher (EVI roundtrip) | ~75ms on Flash v2.5 | Sub-40ms (category leader) |
| Voice cloning | Included all tiers | IVC + PVC on Creator+ | Available |
| Speech-to-speech | EVI (native) | Requires Conversational AI setup | Available |
| Emotion analytics API | Sunset: deprecated May 14 / inaccessible June 14, 2026 | None | None |
| Open-source option | TADA (partial) | None | None |
| Entry price | $3/mo Starter ($14 Creator) | $6/mo Starter | Paid tier only |
| Best viewed as | Emotion-AI specialist | Quality + coverage leader | Latency specialist |
Failure modes
- Raw TTS quality is not the lead. Octave is good but not category-leading. Teams that prioritize peak narration fidelity over emotional range should pair Hume with ElevenLabs, or switch to it, for narration work.
- Latency on EVI is higher than Cartesia or Flash v2.5. The speech-to-speech round trip is the tradeoff for prosody-aware processing. Teams building real-time agents that need a sub-100ms feel should benchmark before committing.
- Free tier is tight. 5 minutes of EVI and 10K TTS chars per month is evaluation-only. Serious usage starts at Starter $3/mo.
- Multiple quota types can surprise. TTS characters, EVI minutes, concurrent connections, and projects all scale independently per tier. Heavy usage on one dimension can force a tier upgrade even if others have headroom.
- Expression Measurement is sunsetting. Hume docs say the endpoints were deprecated May 14, 2026 and become inaccessible June 14, 2026. Do not treat older modality pricing as a durable buying reason.
- Self-hosting is limited. EVI and Octave are cloud-only. TADA is open-source but it is an LLM-TTS architecture, not a drop-in replacement for EVI or Octave.
- Research-voice positioning cuts both ways. The academic heritage gives Hume credibility on emotion but slows mainstream adoption versus flashier category leaders.
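The quota point in the list above is easiest to see with a small tier picker. The sketch below uses the published plan limits and returns the cheapest plan that covers every dimension at once. The helper names `cheapest_plan` and `blocking_dimensions` are illustrative, and the logic ignores overage billing, so treat it as a rough planning aid rather than a billing calculator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    price: int          # USD per month
    tts_chars: int      # included TTS characters per month
    evi_minutes: int    # included EVI minutes per month
    concurrent: int     # concurrent connections
    projects: int
    seats: int


# Limits copied from the published plan table, ordered by price.
PLANS = [
    Plan("Free", 0, 10_000, 5, 1, 20, 1),
    Plan("Starter", 3, 30_000, 40, 5, 20, 1),
    Plan("Creator", 14, 140_000, 200, 5, 1_000, 1),
    Plan("Pro", 70, 1_000_000, 1_200, 10, 3_000, 1),
    Plan("Scale", 200, 3_300_000, 5_000, 20, 10_000, 3),
    Plan("Business", 500, 10_000_000, 12_500, 30, 20_000, 5),
]

DIMENSIONS = ("tts_chars", "evi_minutes", "concurrent", "projects", "seats")


def cheapest_plan(**needs):
    """Return the cheapest plan whose limits cover every stated need."""
    for plan in PLANS:
        if all(getattr(plan, dim) >= needs.get(dim, 0) for dim in DIMENSIONS):
            return plan.name
    return "Enterprise"  # nothing in the self-serve table is large enough


def blocking_dimensions(plan_name, **needs):
    """List the dimensions on which the named plan falls short."""
    plan = next(p for p in PLANS if p.name == plan_name)
    return [dim for dim in DIMENSIONS if getattr(plan, dim) < needs.get(dim, 0)]


needs = {"tts_chars": 50_000, "evi_minutes": 100, "concurrent": 8}
print(cheapest_plan(**needs))
print(blocking_dimensions("Creator", **needs))
```

In this example Creator covers the characters and minutes, but eight concurrent connections alone push the workload to Pro.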
Methodology
This page was produced by the aipedia.wiki editorial pipeline, an automated system that ingests vendor documentation, verifies pricing and model details against primary sources, and generates the editorial analysis you are reading. No individual human wrote this review. Scoring follows the four-dimension rubric at /about/scoring/ (Utility, Value, Moat, and Longevity, combined as an unweighted average). Last verified 2026-06-12 against Hume pricing, the Hume homepage, Hume products, and the Expression Measurement docs.
FAQ
What is Hume AI known for? Empathic voice AI. EVI handles prosody-aware speech-to-speech and Octave delivers emotional nuance in TTS. Historically Hume was also known for Expression Measurement, but that API is now sunsetting.
What is EVI and how does it differ from ChatGPT Voice or ElevenLabs Conversational? EVI (Empathic Voice Interface) is a speech-to-speech system with native support for interruptions, back-channeling, and prosody-aware responses. ChatGPT Voice and ElevenLabs Conversational focus on voice quality; EVI focuses on conversational feel. Developers route LLM-level reasoning through external models (Claude, GPT, Gemini) while Hume owns the audio layer.
Is Hume AI free? Yes, the Free tier includes 5 minutes of EVI and 10,000 TTS characters per month. Sufficient for evaluation. Starter at $3/mo is the lowest paid tier for actual deployments.
What is Octave? Octave is Hume’s text-to-speech model with voice design, modulation, cloning, and conversion. Emotional range is the differentiator; it does not compete with ElevenLabs v3 on peak narration quality.
What is Expression Measurement? An API for scoring emotion in audio, video, text, or images. It is now a migration risk: Hume docs say Expression Measurement endpoints were deprecated May 14, 2026 and become inaccessible June 14, 2026.
What is TADA? Hume’s open-source LLM-TTS system that streams text and audio together, reducing hallucinations and latency. Useful as a research-friendly alternative to proprietary TTS architectures but not a drop-in EVI or Octave replacement.
Can I clone my voice on Hume? Yes, voice cloning (create and use) is included on all tiers, including Free. This is unusual; most competitors gate cloning behind paid plans.
Sources
- Hume AI pricing: current plan prices, quotas, seats, usage rates
- Hume AI homepage: product descriptions for EVI, Octave, TADA, Expression Measurement
- Hume developer platform: API reference, SDKs, model versioning
- Hume Expression Measurement docs: deprecation and June 14, 2026 accessibility cutoff
Related
- Category: AI Voice
- Alternatives: ElevenLabs · Cartesia · Fish Audio · Lovo · Murf
- Use cases: Best AI for Voice Agents