MiniMax Speech

Multilingual TTS, long-form speech generation, and voice cloning API with Speech 2.8 HD/Turbo as the current model family and subscription or pay-as-you-go pricing.

6.8/10 Useful

Active

$5-$999/mo subscriptions / $60-$100 per 1M chars PAYG

Best plan

$5-$999/mo subscriptions / $60-$100 per 1M chars PAYG

Watch out: Model names and billing surfaces changed: Speech 2.8 is latest, Speech 2.6/Speech-02 remain supported, and subscription, token-plan, and pay-as-you-go routes can expose different limits

Editorial · no paid placements

The call

MiniMax Speech is the budget ElevenLabs alternative for multilingual TTS and voice cloning. Pick Speech 2.8 Turbo or HD when hosted API economics, voice slots, RPM, and multilingual output matter, especially for apps, IVR, dubbing, agents, or bulk narration. Choose ElevenLabs when the highest creator polish, marketplace breadth, and integrations matter more.

Buy if: you run cost-sensitive production TTS workloads
Pick: $5-$999/mo subscriptions / $60-$100 per 1M chars PAYG
Skip if: you need the highest quality ceiling for audiobooks or luxury production

Evidence rail

Why this recommendation is trusted

Evidence MiniMax T2A docs

Source: Registered source
Freshness: Current
Confidence: High confidence
Verified: Jun 8, 2026
Review: Jul 8, 2026
Volatility: Volatile

High-volatility evidence needs frequent review.

Editorial score

Unweighted average of 4 axes · confidence high

Utility 8/10

How much real work it can do for a competent operator, end to end.
Value 9/10

What you get for the dollar relative to the closest alternative.
Moat 4/10

How hard it would be for a competitor to replicate the underlying advantage.
Longevity 6/10

How likely the product is to still be best-in-class 24 months out.
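
The headline score can be reproduced from the four axis scores above. This is a minimal sketch; the half-up rounding to one decimal is an assumption, since the page states only that the score is an unweighted average.

```python
from decimal import Decimal, ROUND_HALF_UP

# Axis scores as listed on this page.
AXES = {"utility": 8, "value": 9, "moat": 4, "longevity": 6}


def editorial_score(axes: dict[str, int]) -> Decimal:
    """Unweighted mean of the axis scores, rounded half-up to one decimal.

    The rounding rule is an assumption; the page does not state it.
    """
    mean = Decimal(sum(axes.values())) / Decimal(len(axes))
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


print(editorial_score(AXES))  # 27 / 4 = 6.75, shown as 6.8
```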

Key facts

Best For: Multilingual TTS, long-form speech generation, streaming, and voice cloning API with Speech 2.8 HD/Turbo as the current model family, 300+ system voices, custom cloned voices, and multiple pricing modes.
Confidence: high · Volatility: drifts · Checked: 2026-06-12 · Source: MiniMax T2A docs
Pricing Anchor: Audio Subscription starts at $5/month for 100,000 credits; pay-as-you-go lists T2A Turbo at $60/M characters and T2A HD at $100/M characters.
Confidence: high · Volatility: volatile · Checked: 2026-06-12 · Source: MiniMax Speech pricing docs
Watch Out For: Model names and billing surfaces changed: Speech 2.8 is latest, Speech 2.6/Speech-02 remain supported, and subscription, token-plan, and pay-as-you-go routes can expose different limits.
Confidence: high · Volatility: volatile · Checked: 2026-06-12 · Source: MiniMax T2A docs

MiniMax Speech is the text-to-speech and voice-cloning product line from MiniMax, the Shanghai-based AI lab. The current API docs put speech-2.8-hd and speech-2.8-turbo at the top of the model list, while Speech 2.6 and Speech-02 remain supported as older, compatibility routes.

The current docs list 300+ system voices plus custom cloned voices, streaming output, MP3, WAV, FLAC, and PCM audio formats (availability varies by endpoint), synchronous requests up to 10,000 characters, and async long-form generation up to 1 million characters per task.

System Verdict

Pick MiniMax Speech if the brief is multilingual TTS at production volume where API economics drive the budget. As of June 12, 2026 the pay-as-you-go page lists T2A Turbo at $60 per million characters and T2A HD at $100 per million characters, while Audio Subscription plans start at $5/month for 100,000 credits and scale to $999/month for 20,000,000 credits.

Skip it for peak-quality audiobook and luxury production work. ElevenLabs still holds the quality ceiling, the larger curated voice marketplace, and the deeper third-party integration stack. Cartesia remains the stronger choice when lowest-latency streaming is the priority.

The naming drift matters. Current docs list Speech 2.8 as latest, while Speech 2.6 and Speech-02 remain visible in API references, pay-as-you-go pricing, token plans, and third-party mirrors. Integration requires checking the exact endpoint and plan, not just the model family name.
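
One way to guard against the naming drift is to validate the exact model ID before a request is built. The sketch below uses only the model IDs and pay-as-you-go rates listed on this page; the function name and return shape are hypothetical, and speech-01 models return no rate because the pricing table here does not list one.

```python
# Model IDs and pay-as-you-go rates as listed on this page (USD per 1M characters).
PAYG_USD_PER_MILLION = {"turbo": 60, "hd": 100}
FAMILIES = ("speech-2.8", "speech-2.6", "speech-02", "speech-01")
# The pricing table on this page names rates for these families only.
PRICED_FAMILIES = {"speech-2.8", "speech-2.6", "speech-02"}
KNOWN_MODELS = {f"{family}-{tier}" for family in FAMILIES for tier in ("hd", "turbo")}


def describe_model(model_id: str) -> dict:
    """Split an exact model ID into family and tier; reject anything unlisted."""
    if model_id not in KNOWN_MODELS:
        raise ValueError(f"unknown model ID: {model_id!r}")
    family, tier = model_id.rsplit("-", 1)
    rate = PAYG_USD_PER_MILLION[tier] if family in PRICED_FAMILIES else None
    return {"family": family, "tier": tier, "payg_usd_per_million": rate}


print(describe_model("speech-2.8-turbo"))
print(describe_model("speech-01-hd"))
```

Passing a family name such as "speech-2.8" without a tier raises a ValueError, which is the point: the family name alone does not identify a billable model.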

Key Facts


Vendor	MiniMax (Shanghai, HKEX-listed)
Current API models	speech-2.8-hd · speech-2.8-turbo
Supported older speech models	speech-2.6-hd · speech-2.6-turbo · speech-02-hd · speech-02-turbo · speech-01-hd · speech-01-turbo
Pay-as-you-go T2A price	Turbo $60/M characters · HD $100/M characters
Audio Subscription entry	Starter $5/mo · 100,000 credits/mo
System voices	300+ plus custom cloned voices
Voice cloning	Rapid cloning from uploaded mono/stereo reference audio; clone is temporary unless used in T2A within 168 hours
Long-form async	Up to 1 million characters per async task
Streaming	Supported through HTTP/WebSocket T2A endpoints
Output formats	MP3, WAV, FLAC, PCM depending on endpoint and streaming mode
Official MCP	Python and JavaScript MCP server implementations with voice cloning support

What it actually is

A hosted TTS API with synchronous T2A, WebSocket T2A, async long-form T2A, voice cloning, voice design, and voice management. Turbo is the cost/speed lane. HD is the fidelity lane. Speech 2.8 is the latest named model family in the current API docs.

Voice cloning now matters as a workflow and governance question, not just a feature bullet. The current API intro says rapid clones are temporary unless used in speech synthesis within 168 hours, and the fee is charged the first time the cloned voice is used in T2A synthesis.

Speed, pitch, volume, bitrate, sample rate, language boost, subtitle output, voice effects, and streaming settings are exposed through the API. Sync endpoints handle up to 10,000 characters per request; async long-form generation handles up to 1 million characters.
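
The two length limits suggest a simple routing step before any request is sent. The sketch below picks a route from the text length and, where needed, splits text into pieces that fit the sync limit. It makes no API calls; the names are hypothetical, and it assumes the limits are counted the way Python's len() counts characters, which may differ from how MiniMax counts them.

```python
SYNC_MAX_CHARS = 10_000
ASYNC_MAX_CHARS = 1_000_000


def pick_route(text: str) -> str:
    """Return 'sync', 'async', or 'split' based on the limits quoted above."""
    n = len(text)
    if n == 0:
        raise ValueError("text is empty")
    if n <= SYNC_MAX_CHARS:
        return "sync"
    if n <= ASYNC_MAX_CHARS:
        return "async"
    return "split"  # too long for one async task


def split_text(text: str, limit: int = SYNC_MAX_CHARS) -> list[str]:
    """Greedy split at the last space within the limit; hard cut if none exists."""
    chunks = []
    rest = text
    while len(rest) > limit:
        cut = rest.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable space: cut mid-word
        chunks.append(rest[:cut])
        rest = rest[cut:].lstrip(" ")
    if rest:
        chunks.append(rest)
    return chunks


print(pick_route("hello"))                       # sync
print(split_text("aaa bbb ccc", limit=7))        # ['aaa bbb', 'ccc']
```

Splitting on spaces is the crudest possible boundary; a production pipeline would more likely split on sentence or paragraph breaks so prosody is not cut mid-phrase.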

When to pick MiniMax Speech

Scaling multilingual IVR, chatbots, or conversational AI. Turbo at $60 per million characters supports high-volume voice agents economically when the team can integrate directly.
Multilingual content pipelines. One vendor for 40 languages avoids per-market vendor sprawl.
Voice cloning from reference clips. The current voice-cloning endpoint can rapidly reproduce a target timbre from uploaded mono or stereo audio.
Cost-sensitive prototyping. Subscription, token-plan, and pay-as-you-go routes let teams choose predictable monthly credits or usage billing.
Agent/MCP voice workflows. MiniMax provides official MCP server implementations for Python and JavaScript with speech/voice-cloning support.
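
For the volume-driven cases above, the pay-as-you-go arithmetic is straightforward. The sketch below uses the Turbo and HD rates and the rapid-cloning fee quoted on this page; the function name is hypothetical, and taxes, minimums, or volume discounts are not modeled.

```python
from decimal import Decimal

# Pay-as-you-go figures as quoted on this page.
RATE_USD_PER_MILLION = {"turbo": Decimal("60"), "hd": Decimal("100")}
CLONE_FEE_USD = Decimal("1.50")  # charged on first T2A use of each cloned voice


def payg_cost(characters: int, tier: str, new_cloned_voices: int = 0) -> Decimal:
    """Estimated pay-as-you-go cost in USD for one billing period."""
    if characters < 0 or new_cloned_voices < 0:
        raise ValueError("counts must be non-negative")
    synthesis = RATE_USD_PER_MILLION[tier] * characters / Decimal(1_000_000)
    total = synthesis + CLONE_FEE_USD * new_cloned_voices
    return total.quantize(Decimal("0.01"))


print(payg_cost(2_500_000, "turbo"))     # 150.00
print(payg_cost(2_500_000, "hd"))        # 250.00
print(payg_cost(2_500_000, "turbo", 2))  # 153.00
```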

When to pick something else

Peak-quality audiobook and luxury narration: ElevenLabs. MiniMax may be cheaper, but ElevenLabs still has the creator polish, marketplace, and workflow maturity advantage.
Curated community voice library: ElevenLabs and Cartesia have thousands of community-contributed voices. MiniMax’s 300+ is a narrower catalog.
Lowest-latency streaming for voice agents: Cartesia is tuned for this. MiniMax streams well, but Cartesia leads.
Offline or self-hosted requirement: Kokoro at Apache 2.0 runs locally. MiniMax Speech is hosted only.
Western vendor compliance posture: ElevenLabs, Cartesia, or Azure Speech. MiniMax is a China-based vendor.

Pricing

Model / Plan	Price	Notes
Pay-as-you-go T2A Turbo	$60/M characters	Applies to speech-2.8-turbo, speech-2.6-turbo, and speech-02-turbo
Pay-as-you-go T2A HD	$100/M characters	Applies to speech-2.8-hd, speech-2.6-hd, and speech-02-hd
Rapid voice cloning	$1.50 per voice	Fee is charged on first T2A use of the cloned voice, not on preview
Voice design	$3 per voice	Prompt-generated voice design
Starter sub	$5/mo	100,000 credits
Standard sub	$30/mo	300,000 credits
Pro sub	$99/mo	1,100,000 credits
Scale sub	$249/mo	3,300,000 credits
Business sub	$999/mo	20,000,000 credits

Prices verified 2026-06-12 via the MiniMax Audio Subscription docs and MiniMax pay-as-you-go pricing. Do not mix up Audio Subscription, Token Plan, and pay-as-you-go: they are different purchase routes with different limits.

Against the alternatives

	MiniMax Speech 2.8 HD	ElevenLabs v3	Cartesia Sonic	Kokoro
List usage price	$100/M chars for HD; $60/M chars for Turbo	Higher, plan/credit dependent	Usage-based	Free (self-host)
Languages	40	32+	15+	9
Voice cloning	3-10s zero-shot	Best-in-class	Yes	No
Cross-lingual cloning	Yes	Yes	Limited	N/A
Real-time streaming	Yes	Yes	Strongest	No
Quality ceiling	High	Highest	High	Mid (narration-grade)
Voice library breadth	300+	3,000+	Large	26 (v1.0)
Best viewed as	Cheapest hosted multilingual	Premium hosted	Streaming specialist	Offline-first

Failure modes

Quality ceiling and workflow maturity below ElevenLabs on critical creator work. MiniMax is strong on API economics, but ElevenLabs remains the safer default for polished creator narration, voice marketplace breadth, and non-developer production workflows.
Voice library is narrower. 300+ voices against ElevenLabs’ thousands. Specific demographic or style gaps can force workarounds.
Voice-clone lifecycle can surprise teams. Rapid clones are temporary unless used in T2A within 168 hours, and fees are charged when the clone is first synthesized through T2A.
Ecosystem is thinner. Fewer SDKs, integrations, and community tutorials compared to ElevenLabs or Cartesia as of June 12, 2026.
Peak-load latency spikes. Some reviews note occasional processing delays under heavy load. Base latency is competitive.
China-based vendor. Enterprise compliance teams with US or EU data-residency requirements should confirm with MiniMax whether a private deployment option covers Speech, or choose a Western vendor.
Model naming and plan surfaces are easy to confuse. Speech 2.8, Speech 2.6, and Speech-02 appear across different docs; Audio Subscription, Token Plan, and pay-as-you-go are separate purchase routes.
Accent drift on non-native cloned voices. Cloning an English speaker into Mandarin output preserves timbre but can drift on native accent nuances.

Methodology

This page was rechecked by the aipedia.wiki editorial workflow on June 12, 2026 against the MiniMax T2A API overview, MiniMax T2A HTTP docs, MiniMax Voice Cloning docs, MiniMax Audio Subscription pricing, MiniMax pay-as-you-go pricing, and MiniMax’s March 2026 financial-results release. Scoring follows the four-dimension rubric at /about/scoring/ (Utility, Value, Moat, and Longevity, combined as an unweighted average).

FAQ

How does MiniMax Speech pricing compare to ElevenLabs? MiniMax is positioned as the cheaper developer API lane. As of June 12, 2026, MiniMax pay-as-you-go lists T2A Turbo at $60 per million characters and HD at $100 per million characters, plus monthly Audio Subscription plans from $5 to $999. ElevenLabs retains a broader voice library, richer integrations, and a higher quality ceiling for premium production.

What is the difference between Speech 2.8 HD and Speech 2.8 Turbo? HD is the fidelity lane for voiceovers, audiobook-style narration, and polished output. Turbo is the speed/value lane for live apps, chatbots, gaming, IVR, and high-volume generation.

Does MiniMax Speech have a free tier? MiniMax has several purchase paths rather than one simple free tier. Token Plan pages include Speech 2.8 daily character allowances on Plus/Max plans, while Audio Subscription and pay-as-you-go are separate. Check the exact purchase route before assuming credits carry across products.

What languages does MiniMax Speech cover? The current API docs expose language boost options across Chinese, Cantonese, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans, and auto-detection.

Can I clone a voice across languages? Yes, but treat it as a consent-sensitive production feature. The current voice-cloning API can rapidly reproduce a target timbre from uploaded reference audio, but clones are temporary unless used in T2A within 168 hours and should only be used with proper rights and consent.

Sources

MiniMax T2A API overview: current models, T2A features, async generation, and voice cloning overview
MiniMax T2A HTTP docs: supported model IDs, formats, streaming, and language boost options
MiniMax Voice Cloning docs: rapid cloning lifecycle and supported models
MiniMax Audio Subscription pricing: subscription tier rates, credits, voice slots, and RPM
MiniMax pay-as-you-go pricing: per-character T2A pricing, voice cloning, and voice design fees
MiniMax FY2025 results: corporate/source context for Speech 2.6 and voice usage scale

Category: AI Voice
Parent company: MiniMax
Compare: ElevenLabs · Cartesia · Kokoro

Embed this score on your site Free. Links back.

HTML

<a href="https://aipedia.wiki/tools/minimax-speech/" target="_blank" rel="noopener"><img src="https://aipedia.wiki/badges/minimax-speech.svg" alt="MiniMax Speech on aipedia.wiki" width="260" height="72" /></a>

Markdown

[![MiniMax Speech on aipedia.wiki](https://aipedia.wiki/badges/minimax-speech.svg)](https://aipedia.wiki/tools/minimax-speech/)

Badge value auto-updates if the editorial score changes. Attribution via the link is required.

Cite this page For journalists, researchers, and bloggers

News writers

According to aipedia.wiki Editorial at aipedia.wiki (https://aipedia.wiki/tools/minimax-speech/)

APA

aipedia.wiki Editorial. (2026). MiniMax Speech: Editorial Review. aipedia.wiki. Retrieved June 22, 2026, from https://aipedia.wiki/tools/minimax-speech/

MLA 9

aipedia.wiki Editorial. "MiniMax Speech: Editorial Review." aipedia.wiki, 2026, https://aipedia.wiki/tools/minimax-speech/. Accessed June 22, 2026.

Chicago

aipedia.wiki Editorial. 2026. "MiniMax Speech: Editorial Review." aipedia.wiki. https://aipedia.wiki/tools/minimax-speech/.

BibTeX

@misc{minimax-speech-editorial-review-2026,
  author = {{aipedia.wiki Editorial}},
  title = {MiniMax Speech: Editorial Review},
  year = {2026},
  publisher = {aipedia.wiki},
  url = {https://aipedia.wiki/tools/minimax-speech/},
  note = {Accessed: 2026-06-22}
}

Spotted an error or want to share your experience with MiniMax Speech?

Every tool page is re-verified on a recurring cycle, and corrections land faster when readers flag them directly. If you spot a stale fact, a missing capability, or have used MiniMax Speech and want to share what worked or didn't, the editorial desk reviews every message sent through this form.

Email editorial@aipedia.wiki

Report outdated info Help us keep this page accurate
