MiniMax Speech is the text-to-speech and voice-cloning product line from MiniMax, the Shanghai AI lab. The current API docs put speech-2.8-hd and speech-2.8-turbo at the top of the model list. Speech 2.6 and Speech-02 remain supported as older compatibility routes.
The current docs list 300+ system voices plus custom cloned voices, streaming output, MP3, WAV, FLAC, and PCM audio (availability varies by endpoint), synchronous requests up to 10,000 characters, and async long-form generation up to 1 million characters per task.
System Verdict
Pick MiniMax Speech if the brief is multilingual TTS at production volume where API economics drive the budget. As of June 12, 2026 the pay-as-you-go page lists T2A Turbo at $60 per million characters and T2A HD at $100 per million characters, while Audio Subscription plans start at $5/month for 100,000 credits and scale to $999/month for 20,000,000 credits.
Skip it for peak-quality audiobook and luxury production work. ElevenLabs still holds the quality ceiling, the larger curated voice marketplace, and the deeper third-party integration stack, and Cartesia leads on low-latency streaming.
The naming drift matters. Current docs list Speech 2.8 as latest, while Speech 2.6 and Speech-02 remain visible in API references, pay-as-you-go pricing, token plans, and third-party mirrors. Integration requires checking the exact endpoint and plan, not just the model family name.
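Family names like "Speech 2.8" do not say whether the HD or Turbo lane (and price) is meant. One way to enforce exact IDs in integration code is a small guard that accepts only full model IDs. This is a minimal sketch: the ID set is copied from the Key Facts table on this page, and `require_exact_model_id` is a hypothetical helper, not part of any MiniMax SDK.

```python
# Model IDs as listed on this page; verify against the current
# MiniMax T2A HTTP docs before relying on them.
SUPPORTED_MODEL_IDS = frozenset({
    "speech-2.8-hd", "speech-2.8-turbo",
    "speech-2.6-hd", "speech-2.6-turbo",
    "speech-02-hd", "speech-02-turbo",
    "speech-01-hd", "speech-01-turbo",
})


def require_exact_model_id(name: str) -> str:
    """Return `name` unchanged if it is an exact model ID; raise otherwise.

    Family names such as "Speech 2.8" or "Speech-02" are rejected on purpose,
    because they do not pin down the HD or Turbo lane.
    """
    if name in SUPPORTED_MODEL_IDS:
        return name
    raise ValueError(
        f"{name!r} is not an exact model ID; choose one of "
        + ", ".join(sorted(SUPPORTED_MODEL_IDS))
    )


if __name__ == "__main__":
    print(require_exact_model_id("speech-2.8-turbo"))
    try:
        require_exact_model_id("Speech 2.8")
    except ValueError as err:
        print(err)
```

Failing loudly on a family name is deliberate: the choice of lane changes the per-character price, so it should be explicit in code and config.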
Key Facts
| Attribute | Detail |
|---|---|
| Vendor | MiniMax (Shanghai, HKEX-listed) |
| Current API models | speech-2.8-hd · speech-2.8-turbo |
| Supported older speech models | speech-2.6-hd · speech-2.6-turbo · speech-02-hd · speech-02-turbo · speech-01-hd · speech-01-turbo |
| Pay-as-you-go T2A price | Turbo $60/M characters · HD $100/M characters |
| Audio Subscription entry | Starter $5/mo · 100,000 credits/mo |
| System voices | 300+ plus custom cloned voices |
| Voice cloning | Rapid cloning from uploaded mono/stereo reference audio; clone is temporary unless used in T2A within 168 hours |
| Long-form async | Up to 1 million characters per async task |
| Streaming | Supported through HTTP/WebSocket T2A endpoints |
| Output formats | MP3, WAV, FLAC, PCM depending on endpoint and streaming mode |
| Official MCP | Python and JavaScript MCP server implementations with voice cloning support |
What it actually is
A hosted TTS API with synchronous T2A, WebSocket T2A, async long-form T2A, voice cloning, voice design, and voice management. Turbo is the cost/speed lane. HD is the fidelity lane. Speech 2.8 is the latest named model family in the current API docs.
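The HD/Turbo split can be expressed as a small routing rule. The sketch below is illustrative only. The use-case buckets and the `pick_lane` and `model_id` helper names are hypothetical. The ID pattern `speech-<generation>-<lane>` is inferred from the model IDs this page lists.

```python
from typing import Literal

Lane = Literal["hd", "turbo"]

# Hypothetical use-case buckets following this page's description:
# HD is the fidelity lane, Turbo is the cost/speed lane.
FIDELITY_USES = {"voiceover", "audiobook", "narration"}
REALTIME_USES = {"chatbot", "ivr", "gaming", "voice_agent", "bulk"}


def pick_lane(use_case: str) -> Lane:
    """Map a use case to a lane; unknown use cases raise instead of guessing."""
    use = use_case.lower()
    if use in FIDELITY_USES:
        return "hd"
    if use in REALTIME_USES:
        return "turbo"
    raise ValueError(f"unclassified use case: {use_case!r}")


def model_id(lane: Lane, generation: str = "2.8") -> str:
    """Build an ID such as "speech-2.8-hd" from a lane and a model generation."""
    return f"speech-{generation}-{lane}"
```

For example, `model_id(pick_lane("ivr"))` returns `"speech-2.8-turbo"`, and `model_id("hd", "02")` returns `"speech-02-hd"`. Raising on unclassified workloads forces a deliberate cost decision instead of a silent default.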
Voice cloning now matters as a workflow and governance question, not just a feature bullet. The current API intro says rapid clones are temporary unless used in speech synthesis within 168 hours, and the fee is charged the first time the cloned voice is used in T2A synthesis.
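Keeping a rapid clone means scheduling its first T2A call inside the 168-hour window, and that first call is also when the cloning fee is charged. Below is a minimal sketch for tracking the deadline. It assumes the window starts at clone creation, and the exact boundary semantics should be verified in the Voice Cloning docs. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Per the MiniMax API intro: a rapid clone stays temporary unless it is
# used in T2A synthesis within 168 hours.
CLONE_WINDOW = timedelta(hours=168)


def first_use_deadline(cloned_at: datetime) -> datetime:
    """Latest time for the first T2A call that keeps the clone.

    Assumes the window starts at clone creation (verify in the docs).
    """
    if cloned_at.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    return cloned_at + CLONE_WINDOW


def hours_remaining(cloned_at: datetime, now: datetime) -> float:
    """Hours left in the window; negative once the deadline has passed."""
    return (first_use_deadline(cloned_at) - now).total_seconds() / 3600


if __name__ == "__main__":
    created = datetime(2026, 6, 12, 9, 0, tzinfo=timezone.utc)
    print(first_use_deadline(created).isoformat())  # 2026-06-19T09:00:00+00:00
    print(hours_remaining(created, created + timedelta(hours=100)))  # 68.0
```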
Speed, pitch, volume, bitrate, sample rate, language boost, subtitle output, voice effects, and streaming settings are exposed through the API. Sync endpoints handle up to 10,000 characters per request; async long-form generation handles up to 1 million characters.
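The 10,000-character sync limit and the 1 million-character async limit suggest a routing step before each call. This sketch assumes that Python's `len()` (Unicode code points) matches how MiniMax counts characters, which should be verified. `choose_route` and `split_for_sync` are hypothetical helpers, and the sketch makes no API calls.

```python
SYNC_MAX_CHARS = 10_000       # per synchronous request (per this page)
ASYNC_MAX_CHARS = 1_000_000   # per async long-form task (per this page)


def choose_route(text: str) -> str:
    """Return "sync", "async", or "split" based on text length."""
    n = len(text)
    if n <= SYNC_MAX_CHARS:
        return "sync"
    if n <= ASYNC_MAX_CHARS:
        return "async"
    return "split"


def split_for_sync(text: str, limit: int = SYNC_MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Each chunk ends after the last space or newline inside its window;
    if the window has none, the text is cut hard at `limit`.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + limit, len(text))
        if end < len(text):
            cut = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if cut > start:
                end = cut + 1  # keep the whitespace with the current chunk
        chunks.append(text[start:end])
        start = end
    return chunks
```

Texts that come back as `"split"` can be passed to `split_for_sync(text, limit=ASYNC_MAX_CHARS)` to produce async-sized tasks. Breaking at whitespace avoids cutting a word in half at a chunk boundary.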
When to pick MiniMax Speech
- Scaling multilingual IVR, chatbots, or conversational AI. Turbo at $60 per million characters supports high-volume voice agents economically when the team can integrate directly.
- Multilingual content pipelines. One vendor for 40 languages avoids per-market vendor sprawl.
- Voice cloning from reference clips. The current voice-cloning endpoint can rapidly reproduce a target timbre from uploaded mono or stereo audio.
- Cost-sensitive prototyping. Subscription, token-plan, and pay-as-you-go routes let teams choose predictable monthly credits or usage billing.
- Agent/MCP voice workflows. MiniMax provides official MCP server implementations for Python and JavaScript with speech/voice-cloning support.
When to pick something else
- Peak-quality audiobook and luxury narration: ElevenLabs. MiniMax may be cheaper, but ElevenLabs still has the creator polish, marketplace, and workflow maturity advantage.
- Curated community voice library: ElevenLabs and Cartesia have thousands of community-contributed voices. MiniMax’s 300+ system voices make a narrower catalog.
- Lowest-latency streaming for voice agents: Cartesia is tuned for this. MiniMax streams well, but Cartesia leads.
- Offline or self-hosted requirement: Kokoro, released under Apache 2.0, runs locally. MiniMax Speech is hosted only.
- Western vendor compliance posture: ElevenLabs, Cartesia, or Azure Speech. MiniMax is a China-based vendor.
Pricing
| Model / Plan | Price | Notes |
|---|---|---|
| Pay-as-you-go T2A Turbo | $60/M characters | Applies to speech-2.8-turbo, speech-2.6-turbo, and speech-02-turbo |
| Pay-as-you-go T2A HD | $100/M characters | Applies to speech-2.8-hd, speech-2.6-hd, and speech-02-hd |
| Rapid voice cloning | $1.50 per voice | Fee is charged on the first T2A use of the cloned voice, not on preview |
| Voice design | $3 per voice | Prompt-generated voice design |
| Starter sub | $5/mo | 100,000 credits |
| Standard sub | $30/mo | 300,000 credits |
| Pro sub | $99/mo | 1,100,000 credits |
| Scale sub | $249/mo | 3,300,000 credits |
| Business sub | $999/mo | 20,000,000 credits |
Prices verified 2026-06-12 via the MiniMax Audio Subscription docs and MiniMax pay-as-you-go pricing. Do not mix up Audio Subscription, Token Plan, and pay-as-you-go: they are different purchase routes with different limits.
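Here is a quick budgeting sketch that uses the list prices in the table above. The prices are hard-coded as recorded on this page and will drift, and the function names are hypothetical. This page does not say how subscription credits convert to characters. For that reason, the per-credit figure is only good for comparing tiers with each other, not for comparing against the pay-as-you-go per-character rate.

```python
from decimal import Decimal

# List prices as recorded on this page (verified 2026-06-12); re-check them
# against MiniMax's pay-as-you-go page before budgeting.
USD_PER_MILLION_CHARS = {
    "speech-2.8-turbo": Decimal("60"),
    "speech-2.6-turbo": Decimal("60"),
    "speech-02-turbo": Decimal("60"),
    "speech-2.8-hd": Decimal("100"),
    "speech-2.6-hd": Decimal("100"),
    "speech-02-hd": Decimal("100"),
}
RAPID_CLONE_FEE = Decimal("1.50")  # charged on a clone's first T2A use
VOICE_DESIGN_FEE = Decimal("3")

# Audio Subscription tiers: (monthly USD price, monthly credits).
SUBSCRIPTION_TIERS = {
    "Starter": (Decimal("5"), 100_000),
    "Standard": (Decimal("30"), 300_000),
    "Pro": (Decimal("99"), 1_100_000),
    "Scale": (Decimal("249"), 3_300_000),
    "Business": (Decimal("999"), 20_000_000),
}


def payg_cost(model_id: str, chars: int, first_use_clones: int = 0,
              designed_voices: int = 0) -> Decimal:
    """Estimated pay-as-you-go USD cost for one billing period."""
    if model_id not in USD_PER_MILLION_CHARS:
        raise ValueError(f"no pay-as-you-go price recorded for {model_id!r}")
    synthesis = USD_PER_MILLION_CHARS[model_id] * chars / 1_000_000
    return (synthesis
            + RAPID_CLONE_FEE * first_use_clones
            + VOICE_DESIGN_FEE * designed_voices)


def usd_per_1k_credits(tier: str) -> Decimal:
    """Subscription price per 1,000 credits, for comparing tiers."""
    price, credits = SUBSCRIPTION_TIERS[tier]
    return price * 1000 / credits
```

With these figures, 2 million Turbo characters come to $120, and 500,000 HD characters plus two clones used for the first time come to $53. Across subscriptions, Starter works out to $0.05 per 1,000 credits, Business to $0.04995, and Standard to $0.10, the highest of the five.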
Against the alternatives
| Dimension | MiniMax Speech 2.8 HD | ElevenLabs v3 | Cartesia Sonic | Kokoro |
|---|---|---|---|---|
| List usage price | $100/M chars for HD; $60/M chars for Turbo | Higher, plan/credit dependent | Usage-based | Free (self-host) |
| Languages | 40 | 32+ | 15+ | 9 |
| Voice cloning | 3-10s zero-shot | Best-in-class | Yes | No |
| Cross-lingual cloning | Yes | Yes | Limited | N/A |
| Real-time streaming | Yes | Yes | Strongest | No |
| Quality ceiling | High | Highest | High | Mid (narration-grade) |
| Voice library breadth | 300+ | 3,000+ | Large | 26 (v1.0) |
| Best viewed as | Cheapest hosted multilingual | Premium hosted | Streaming specialist | Offline-first |
Failure modes
- Quality ceiling and workflow maturity below ElevenLabs on critical creator work. MiniMax is strong on API economics, but ElevenLabs remains the safer default for polished creator narration, voice marketplace breadth, and non-developer production workflows.
- Voice library is narrower. 300+ voices against ElevenLabs’ thousands. Specific demographic or style gaps can force workarounds.
- Voice-clone lifecycle can surprise teams. Rapid clones are temporary unless used in T2A within 168 hours, and fees are charged when the clone is first synthesized through T2A.
- Ecosystem is thinner. Fewer SDKs, integrations, and community tutorials compared to ElevenLabs or Cartesia as of June 12, 2026.
- Peak-load latency spikes. Some reviews note occasional processing delays under heavy load. Base latency is competitive.
- China-based vendor. Enterprise compliance teams with US or EU data-residency requirements should use the private deployment option or choose a Western vendor.
- Model naming and plan surfaces are easy to confuse. Speech 2.8, Speech 2.6, and Speech-02 appear across different docs; Audio Subscription, Token Plan, and pay-as-you-go are separate purchase routes.
- Accent drift on non-native cloned voices. Cloning an English speaker into Mandarin output preserves timbre but can drift on native accent nuances.
Methodology
This page was rechecked by the aipedia.wiki editorial workflow on June 12, 2026 against the MiniMax T2A API overview, MiniMax T2A HTTP docs, MiniMax Voice Cloning docs, MiniMax Audio Subscription pricing, MiniMax pay-as-you-go pricing, and MiniMax’s March 2026 financial-results release. Scoring follows the four-dimension rubric at /about/scoring/ (Utility × Value × Moat × Longevity, unweighted average).
FAQ
How does MiniMax Speech pricing compare to ElevenLabs? MiniMax is positioned as the cheaper developer API lane. As of June 12, 2026, MiniMax pay-as-you-go lists T2A Turbo at $60 per million characters and HD at $100 per million characters, plus monthly Audio Subscription plans from $5 to $999. ElevenLabs retains a broader voice library, richer integrations, and a higher quality ceiling for premium production.
What is the difference between Speech 2.8 HD and Speech 2.8 Turbo? HD is the fidelity lane for voiceovers, audiobook-style narration, and polished output. Turbo is the speed/value lane for live apps, chatbots, gaming, IVR, and high-volume generation.
Does MiniMax Speech have a free tier? MiniMax has several purchase paths rather than one simple free tier. Token Plan pages include Speech 2.8 daily character allowances on Plus/Max plans, while Audio Subscription and pay-as-you-go are separate. Check the exact purchase route before assuming credits carry across products.
What languages does MiniMax Speech cover? The current API docs expose language boost options across Chinese, Cantonese, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans, and auto-detection.
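If an application lets users pick a language, it can map that choice onto a language-boost setting and fall back to auto-detection. This sketch rests on several assumptions. The names are copied from the list above, and `language_boost_value` is hypothetical. The exact strings the API's language boost parameter accepts, including the auto-detect value written here as `"auto"`, should be checked against the T2A HTTP docs.

```python
from typing import Optional

# Display names copied from this page's language list; the exact values the
# API's language boost field accepts are an assumption to check.
LANGUAGE_BOOST_NAMES = (
    "Chinese", "Cantonese", "English", "Arabic", "Russian", "Spanish",
    "French", "Portuguese", "German", "Turkish", "Dutch", "Ukrainian",
    "Vietnamese", "Indonesian", "Japanese", "Italian", "Korean", "Thai",
    "Polish", "Romanian", "Greek", "Czech", "Finnish", "Hindi",
    "Bulgarian", "Danish", "Hebrew", "Malay", "Persian", "Slovak",
    "Swedish", "Croatian", "Filipino", "Hungarian", "Norwegian",
    "Slovenian", "Catalan", "Nynorsk", "Tamil", "Afrikaans",
)
AUTO = "auto"  # assumed value for auto-detection

# Case-insensitive lookup table: "english" -> "English".
_BY_LOWER = {name.lower(): name for name in LANGUAGE_BOOST_NAMES}


def language_boost_value(requested: Optional[str]) -> str:
    """Map a user-facing language choice to a boost value, else auto-detect."""
    if requested is None:
        return AUTO
    return _BY_LOWER.get(requested.strip().lower(), AUTO)
```

Falling back to auto-detection keeps a request from failing on an unrecognised label, at the cost of hiding typos. A stricter application could raise instead.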
Can I clone a voice across languages? Yes, but treat it as a consent-sensitive production feature. The current voice-cloning API can rapidly reproduce a target timbre from uploaded reference audio, but clones are temporary unless used in T2A within 168 hours and should only be used with proper rights and consent.
Sources
- MiniMax T2A API overview: current models, T2A features, async generation, and voice cloning overview
- MiniMax T2A HTTP docs: supported model IDs, formats, streaming, and language boost options
- MiniMax Voice Cloning docs: rapid cloning lifecycle and supported models
- MiniMax Audio Subscription pricing: subscription tier rates, credits, voice slots, and RPM
- MiniMax pay-as-you-go pricing: per-character T2A pricing, voice cloning, and voice design fees
- MiniMax FY2025 results: corporate/source context for Speech 2.6 and voice usage scale
Related
- Category: AI Voice
- Parent company: MiniMax
- Compare: ElevenLabs · Cartesia · Kokoro