Hume AI is an empathic-voice research lab turned product company. Founded in 2021 by former Google AI emotion researcher Alan Cowen, the company ships three closed-source products plus one open-source LLM-TTS system:
- EVI (Empathic Voice Interface) is the speech-to-speech system. Handles interruptions, back-channeling, external LLM routing, and expressive instruction-following. Current production lineup: EVI 3 and EVI 4 mini.
- Octave is the text-to-speech model with voice design, modulation, cloning, and conversion. Emotional nuance is the differentiator, not raw quality.
- Expression Measurement API scores emotion in audio, video, text, and images. The only production-grade emotion-measurement API with academic-research heritage.
- TADA is the open-source LLM-TTS system that streams text and audio together to reduce hallucinations and latency.
Supporting services: Human Feedback API (survey templates + participant pools), Data Library (speech datasets covering 50+ languages and 48 emotions), Study Runner (programmatic human evaluations).
System Verdict
Pick Hume AI when emotion is the differentiator. Voice agents for therapy, wellness, coaching, and customer support benefit from EVI’s prosody-aware responses and interruption handling in ways that generic TTS does not deliver. Expression Measurement is the only mature API for scoring emotion across audio, video, text, and images at production scale. Starter at $3/mo and Creator at $7/mo are among the cheapest paid voice-AI entry points in the category.
Skip it if raw TTS quality or ultra-low latency is the goal. ElevenLabs leads on voice quality ceiling; Cartesia wins on sub-40ms latency. Hume’s Octave is good but not best-in-class for pure narration. Also skip if you need self-hosted weights for on-prem deployment; EVI and Octave are both cloud-only.
Who pays which tier: Free for evaluation (5 min EVI, 10K TTS chars). Starter $3/mo for hobbyist voice-agent builders. Creator $7/mo for most indie developers (140K chars + 200 min EVI). Pro $70/mo when production usage crosses 1M chars/mo. Scale $200/mo for teams needing 3 seats. Business $500/mo for 5-seat orgs with higher concurrency. Enterprise custom for SOC 2 + GDPR + HIPAA, unlimited usage, and Slack support.
Key Facts
| Fact | Value |
|---|---|
| Core products | EVI (speech-to-speech) · Octave (TTS) · Expression Measurement · TADA (open-source) |
| EVI versions | EVI 3 (full) · EVI 4 mini (smaller, faster) |
| Languages (datasets / TTS) | 50+ in the Data Library · multilingual TTS in Octave |
| Emotions measured | 48 distinct emotions in Expression Measurement |
| Voice descriptors | 600+ in the Data Library |
| Subscription pricing | Free · Starter $3 · Creator $7 · Pro $70 · Scale $200 · Business $500 · Enterprise custom |
| Octave TTS rate | $0.05 to $0.15 per 1,000 chars (plan-dependent) |
| EVI speech-to-speech | $0.04 to $0.07 per minute overage |
| Expression Measurement | Video+audio $0.0828/min · audio $0.0639/min · video $0.045/min · images $0.00204 each · text $0.00024/word |
| Concurrent connections | 1 Free · 5 Starter/Creator · 10 Pro · 20 Scale · 30 Business · unlimited Enterprise |
| Team seats | Solo through Pro · 3 Scale · 5 Business · custom Enterprise |
| Compliance | SOC 2 · GDPR · HIPAA (Enterprise) |
| Voice cloning | Included on all tiers (create + use) |
| Self-hosted | None on EVI / Octave · TADA is open-source |
Every data point above verified against Hume’s published sources on 2026-05-02.
What it actually is
A voice-AI platform with emotion science at the core. The company’s research heritage (Cowen’s earlier work at Google on facial and vocal emotion) shows up in Expression Measurement’s taxonomy of 48 distinct emotions and 600+ voice descriptors, trained on curated datasets covering 50+ languages across multiple domains.
EVI is the flagship. It’s a speech-to-speech system rather than a TTS pipeline: input audio in, response audio out, with the model handling prosody, interruptions, and back-channeling natively. Developers can route the LLM-level reasoning through external models (Claude, GPT, Gemini, open-source) while EVI owns the audio layer.
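That bring-your-own-LLM routing can be sketched as a session-configuration message sent over EVI's connection. The message type and field names below are assumptions for illustration, not Hume's documented schema:

```python
import json

def build_evi_session_settings(external_llm: str, system_prompt: str) -> str:
    """Build a hypothetical EVI session-settings message that selects an
    external model for reasoning while EVI keeps the audio layer.
    Field names are assumed, not taken from Hume's API reference."""
    provider, model = external_llm.split("/", 1)
    settings = {
        "type": "session_settings",        # message discriminator (assumed)
        "language_model": {
            "model_provider": provider,    # e.g. "anthropic"
            "model_resource": model,       # e.g. a Claude model ID
        },
        "system_prompt": system_prompt,
    }
    return json.dumps(settings)

msg = build_evi_session_settings(
    "anthropic/claude-sonnet", "You are a calm wellness coach."
)
```

In practice a message like this would be sent once at the start of a WebSocket session, before any audio frames.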
Octave is where voice quality sits. Less polished than ElevenLabs on pure narration but with more emotional range per prompt. Voice design lets developers spec voices by description (“warm, gentle, 40s female”) rather than cloning; voice cloning is included on all tiers.
Expression Measurement is the academic-heritage product. Score emotion from audio, video, text, or images. Used in UX research, wellness apps, market research, and accessibility products.
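Expression Measurement returns per-emotion confidence scores across its 48-emotion taxonomy; downstream code typically reduces that vector to the few dominant emotions. A minimal sketch of that post-processing step, assuming a plain name-to-score map (the real response envelope will differ):

```python
def top_emotions(scores: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Return the k highest-scoring emotions from an Expression
    Measurement-style score map (emotion name -> confidence)."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Illustrative scores, not real API output.
sample = {"Joy": 0.81, "Calmness": 0.55, "Anxiety": 0.12, "Awe": 0.33}
dominant = top_emotions(sample, 2)
```

A wellness app might surface only `dominant` to the user while logging the full vector for research.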
TADA is the open-source architecture in which text and audio stream together to reduce hallucination. Useful for teams evaluating Hume's approach before committing to EVI.
When to pick Hume AI
- Voice agents where emotion matters. Therapy, coaching, wellness, customer support, companion apps. EVI’s interruption handling and back-channeling feel conversationally different from ChatGPT Voice or ElevenLabs Conversational.
- Emotional nuance in narration. Octave with emotion tags produces delivery variations that straight TTS misses. Useful for character voices, audiobook dramatization, and expressive brand voices.
- Emotion analytics. Expression Measurement API is the only production-grade option for scoring emotion at scale across audio, video, text, and images. UX researchers, wellness apps, and accessibility tools land here.
- Budget-friendly voice-AI entry. Starter at $3/mo and Creator at $7/mo are genuinely affordable for hobbyist and indie developers. Free tier includes 5 min EVI + 10K TTS chars for evaluation.
- External LLM flexibility. EVI’s architecture lets developers bring their own LLM for reasoning while Hume owns the voice layer. Useful for teams already committed to a specific model.
- Research-adjacent workflows. Data Library, Human Feedback API, and Study Runner serve academic and commercial research teams that other voice-AI vendors do not target.
When to pick something else
- Peak voice quality or multilingual breadth: ElevenLabs. Eleven v3 leads on narration quality and language coverage (70+ languages on v3); Octave’s positioning is emotion, not peak fidelity.
- Ultra-low latency for real-time agents: Cartesia delivers sub-40ms vs EVI’s higher round-trip latency.
- Open-source self-hosting of the full stack: Fish Audio for open-weights TTS. TADA is open-source but it’s the text-audio streaming architecture, not a drop-in EVI replacement.
- Single TTS replacement for an ElevenLabs subscription: Octave works but does not match ElevenLabs on quality ceiling. Consider whether the emotional-range differentiation justifies the switch.
Pricing
Subscription pricing via hume.ai/pricing:
| Plan | Monthly | TTS Characters | EVI Minutes | Concurrent | Projects | Seats |
|---|---|---|---|---|---|---|
| Free | $0 | 10K | 5 | 1 | 20 | 1 |
| Starter | $3 | 30K | 40 | 5 | 20 | 1 |
| Creator | $7 (first month $3.50) | 140K | 200 | 5 | 1,000 | 1 |
| Pro | $70 | 1M | 1,200 | 10 | 3,000 | 1 |
| Scale | $200 | 3.3M | 5,000 | 20 | 10,000 | 3 |
| Business | $500 | 10M | 12,500 | 30 | 20,000 | 5 |
| Enterprise | Custom | Unlimited | Unlimited | Unlimited | Unlimited | Custom |
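Because TTS characters and EVI minutes scale independently, the binding constraint is whichever dimension you exhaust first. A small helper, using the quotas transcribed from the table above, makes the tier choice mechanical:

```python
# (name, monthly $, included TTS chars, included EVI minutes),
# transcribed from the subscription table above.
TIERS = [
    ("Free", 0, 10_000, 5),
    ("Starter", 3, 30_000, 40),
    ("Creator", 7, 140_000, 200),
    ("Pro", 70, 1_000_000, 1_200),
    ("Scale", 200, 3_300_000, 5_000),
    ("Business", 500, 10_000_000, 12_500),
]

def cheapest_tier(tts_chars: int, evi_minutes: int) -> str:
    """Return the cheapest tier whose included quotas cover BOTH
    dimensions; falls back to Enterprise when nothing fits."""
    for name, _price, chars, minutes in TIERS:
        if tts_chars <= chars and evi_minutes <= minutes:
            return name
    return "Enterprise"
```

Note how 5,000 TTS chars with 300 EVI minutes lands on Pro: the character usage fits Free, but the minutes force the upgrade, illustrating the independent-quota point made under Failure modes.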
Usage-based rates (overages or custom workflows):
| Service | Rate |
|---|---|
| Octave TTS | $0.05 to $0.15 per 1,000 characters (plan-dependent) |
| EVI speech-to-speech overage | $0.04 to $0.07 per minute |
| Expression Measurement: video+audio | $0.0828 / minute |
| Expression Measurement: audio only | $0.0639 / minute |
| Expression Measurement: video only | $0.045 / minute |
| Expression Measurement: images | $0.00204 each |
| Expression Measurement: text | $0.00024 / word |
Prices verified 2026-05-02 via Hume pricing and the Hume homepage. Voice cloning (create and use) is included on all tiers. Enterprise adds API voice access, SOC 2 / GDPR / HIPAA compliance, Slack support, and custom rate limits.
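Since Expression Measurement bills per modality at very different rates, a quick estimator built from the published rates above shows which modality drives spend:

```python
# Published per-modality rates from the usage table above.
RATES = {
    "video_audio_min": 0.0828,
    "audio_min": 0.0639,
    "video_min": 0.045,
    "image": 0.00204,
    "text_word": 0.00024,
}

def expression_cost(minutes_va: float = 0.0, minutes_audio: float = 0.0,
                    minutes_video: float = 0.0, images: int = 0,
                    words: int = 0) -> float:
    """Estimate Expression Measurement spend in dollars (rounded to
    cents) from per-modality volumes."""
    total = (minutes_va * RATES["video_audio_min"]
             + minutes_audio * RATES["audio_min"]
             + minutes_video * RATES["video_min"]
             + images * RATES["image"]
             + words * RATES["text_word"])
    return round(total, 2)
```

For example, 1,000 minutes of audio-only analysis costs about $63.90, while 100,000 words of text costs about $24, which is why text-heavy pipelines are nearly free relative to video workloads.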
Against the alternatives
| | Hume Octave | ElevenLabs v3 | Cartesia |
|---|---|---|---|
| Voice quality ceiling | Strong, emotion-focused | Highest on v3 | Strong, speed-optimized |
| Emotional nuance | Strongest (48 emotions, voice descriptors) | Audio tags on v3 | Limited prosody control |
| Real-time latency | Higher (EVI roundtrip) | ~75ms on Flash v2.5 | Sub-40ms (category leader) |
| Voice cloning | Included all tiers | IVC + PVC on Creator+ | Available |
| Speech-to-speech | EVI (native) | Requires Conversational AI setup | Available |
| Emotion analytics API | Yes (Expression Measurement) | None | None |
| Open-source option | TADA (partial) | None | None |
| Entry price | $3/mo Starter | $6/mo Starter | Paid tier only |
| Best viewed as | Emotion-AI specialist | Quality + coverage leader | Latency specialist |
Failure modes
- Raw TTS quality is not the lead. Octave is good but not category-leading. Teams that prioritize peak narration fidelity over emotional range should pair Hume with or switch to ElevenLabs for narration work.
- Latency on EVI is higher than Cartesia or Flash v2.5. Speech-to-speech roundtrip is the tradeoff for prosody-aware processing. Real-time agents needing sub-100ms feel should benchmark before committing.
- Free tier is tight. 5 minutes of EVI and 10K TTS chars per month is evaluation-only. Serious usage starts at Starter $3/mo.
- Multiple quota types can surprise. TTS characters, EVI minutes, concurrent connections, and projects all scale independently per tier. Heavy usage on one dimension can force a tier upgrade even if others have headroom.
- Expression Measurement pricing varies by modality. Video+audio is ~3x audio-only cost; text is essentially free at $0.00024/word. Budget modelling requires thinking in terms of which modality drives volume.
- Self-hosting is limited. EVI and Octave are cloud-only. TADA is open-source but it is an LLM-TTS architecture, not a drop-in replacement for EVI or Octave.
- Research-voice positioning cuts both ways. The academic heritage gives Hume credibility on emotion but slows mainstream adoption versus flashier category leaders.
Methodology
This page was produced by the aipedia.wiki editorial pipeline, an automated system that ingests vendor documentation, verifies pricing and model details against primary sources, and generates the editorial analysis you are reading. No individual human wrote this review. Scoring follows the four-dimension rubric at /about/scoring/ (Utility × Value × Moat × Longevity, unweighted average). Last verified 2026-05-02 against Hume pricing, the Hume homepage, and the Hume platform docs.
FAQ
What is Hume AI known for? Empathic voice AI. The company is the only production-grade vendor focused on emotional intelligence in voice interactions. EVI handles prosody-aware speech-to-speech; Expression Measurement scores 48 distinct emotions across audio, video, text, and images; Octave delivers emotional nuance in TTS.
What is EVI and how does it differ from ChatGPT Voice or ElevenLabs Conversational? EVI (Empathic Voice Interface) is a speech-to-speech system with native support for interruptions, back-channeling, and prosody-aware responses. ChatGPT Voice and ElevenLabs Conversational focus on voice quality; EVI focuses on conversational feel. Developers route LLM-level reasoning through external models (Claude, GPT, Gemini) while Hume owns the audio layer.
Is Hume AI free? Yes, the Free tier includes 5 minutes of EVI and 10,000 TTS characters per month. Sufficient for evaluation. Starter at $3/mo is the lowest paid tier for actual deployments.
What is Octave? Octave is Hume’s text-to-speech model with voice design, modulation, cloning, and conversion. Emotional range is the differentiator; it does not compete with ElevenLabs v3 on peak narration quality.
What is Expression Measurement? An API for scoring emotion in audio, video, text, or images. Trained on curated datasets covering 48 distinct emotions. Used in UX research, wellness apps, accessibility tools, and market research. Pricing varies by modality: $0.0828/min for video+audio, down to $0.00024/word for text.
What is TADA? Hume’s open-source LLM-TTS system that streams text and audio together, reducing hallucinations and latency. Useful as a research-friendly alternative to proprietary TTS architectures but not a drop-in EVI or Octave replacement.
Can I clone my voice on Hume? Yes, voice cloning (create and use) is included on all tiers, including Free. This is unusual; most competitors gate cloning behind paid plans.
Sources
- Hume AI pricing: current plan prices, quotas, seats, usage rates
- Hume AI homepage: product descriptions for EVI, Octave, TADA, Expression Measurement
- Hume developer platform: API reference, SDKs, model versioning
Related
- Category: AI Voice
- Alternatives: ElevenLabs · Cartesia · Fish Audio · Lovo · Murf
- Use cases: Best AI for Voice Agents
Cite this page (for journalists, researchers, and bloggers)
- Inline: According to aipedia.wiki Editorial (https://aipedia.wiki/tools/hume-ai/)
- APA: aipedia.wiki Editorial. (2026). Hume AI — Editorial Review. aipedia.wiki. Retrieved May 8, 2026, from https://aipedia.wiki/tools/hume-ai/
- MLA: aipedia.wiki Editorial. "Hume AI — Editorial Review." aipedia.wiki, 2026, https://aipedia.wiki/tools/hume-ai/. Accessed May 8, 2026.
- Chicago: aipedia.wiki Editorial. 2026. "Hume AI — Editorial Review." aipedia.wiki. https://aipedia.wiki/tools/hume-ai/.
- BibTeX:
@misc{hume-ai-editorial-review-2026,
  author = {{aipedia.wiki Editorial}},
  title = {Hume AI — Editorial Review},
  year = {2026},
  publisher = {aipedia.wiki},
  url = {https://aipedia.wiki/tools/hume-ai/},
  note = {Accessed: 2026-05-08}
}
Spotted an error or want to share your experience with Hume AI?
Every tool page is re-verified on a recurring cycle, and corrections land faster when readers flag them directly. If you spot a stale fact, a missing capability, or have used Hume AI and want to share what worked or didn't, the editorial desk reviews every message sent through this form.
Email editorial@aipedia.wiki