Voxtral


8/10 Strong

Active

Price: Open weights for eligible use; hosted TTS $0.016/1k chars; Transcribe 2 from $0.002/min

Best plan

Open weights for eligible use

Risk: Voxtral TTS open weights are published under CC BY-NC 4.0


Editorial · no paid placements

Should you use it?

Voxtral is no longer an STT-only buyer note. Mistral's current audio lineup includes Voxtral TTS v26.03 for speech generation, Voxtral Mini Transcribe 2 for batch STT, and Voxtral Realtime for live transcription/audio understanding. Pick it when you want a Mistral-native audio stack, low published hosted TTS pricing, or open-model experimentation. Skip it for a creator-first studio like ElevenLabs, a managed low-latency agent stack like Cartesia, or governed enterprise dubbing like Resemble AI.

Buy if Teams already building on Mistral AI and La Plateforme
Pick Open weights for eligible use; hosted TTS $0.016/1k chars; Transcribe 2 from $0.002/min
Skip if Creators who want a polished voiceover studio

Plan guidance

What to buy

Best plan Open weights for eligible use; hosted TTS $0.016/1k chars; Transcribe 2 from $0.002/min

Watch: Voxtral TTS open weights are published under CC BY-NC 4.0


Upgrade only if you need commercial production use: move from the open weights to the hosted API. Not for creators who want a polished voiceover studio.


Current pricing source: Mistral Voxtral TTS announcement

Fit

Use it for this, skip it for that

Best for

Teams already building on Mistral AI and La Plateforme
Developers who need both TTS and speech-to-text in one model ecosystem
Hosted TTS experiments where $0.016 per 1k characters is attractive
Transcription and realtime speech understanding workloads

Avoid if

Creators who want a polished voiceover studio
Teams that need commercial use of non-commercial open weights without using the hosted API
Enterprise dubbing pipelines with lip-sync, review, watermarking, and approval workflows
Voice-agent teams that have not benchmarked realtime latency end to end

Watch out: Voxtral TTS open weights are published under CC BY-NC 4.0, so commercial teams should use the hosted API or confirm license terms. It is not a polished creator studio, and production voice-agent buyers should benchmark latency against Cartesia, ElevenLabs, and Deepgram on real traffic.

Recent changes

Only what affects the decision

Jun 26, 2026
Voxtral TTS / Mini Transcribe 2
Official Mistral TTS announcement still lists Voxtral TTS at $0.016 per 1k characters and open weights under CC BY-NC 4.0; speech-to-text docs still list Voxtral Mini Transcribe 2 and...
Mistral Voxtral TTS announcement
Jun 5, 2026
Voxtral TTS / Mini Transcribe 2
Comparison refresh rechecked Voxtral against Descript and removed stale/fake creator-app claims. Voxtral remains a Mistral audio model family, not a Descript-style editor.
Mistral AI pricing
Jun 3, 2026
Voxtral TTS v26.03
Mistral pricing now lists Voxtral TTS v26.03 as an audio generation model with $0 input and $16/M output characters, equivalent to $0.016 per 1k characters.
Mistral AI pricing

Alternatives

Best swaps

ElevenLabs

The top-ranked AI voice platform in June 2026. Eleven v3 covers 70+ languages with expressive audio tags, Flash v2.5 hits ~75ms

$0-$990/month · 9.3/10

Whisper

OpenAI's open-weights speech-to-text baseline. MIT-licensed code and weights remain useful for self-hosted batch transcription,

Free self-host / OpenAI transcription API $0.003-$0.006 per minute; GPT-Realtime-Whisper $0.017 per minute · 9/10

Cartesia

Real-time voice stack for agents. Sonic-3.5 TTS and Ink-2 STT now form the default Line model pair for eligible voice agents, wi

$0-$239/month + credits · 8.5/10


Voxtral comparisons


Fish Audio / OpenAudio S1 + S2 vs Voxtral

June 2026 head-to-head of Fish Audio and Voxtral. Compare open-weight TTS, Mistral Voxtral TTS, transcription, realtime STT, pricing, and stack fit.

Proof and score math Verified Jun 26

Proof

Why this recommendation is trusted

Evidence Mistral Voxtral TTS announcement

Source: Registered source
Freshness: Current
Confidence: High confidence
Verified: Jun 26, 2026
Review: Jul 5, 2026
Volatility: Volatile

High-volatility evidence needs frequent review.

Editorial score

Unweighted average of 4 axes · confidence high

Utility 8/10

How much real work it can do for a competent operator, end to end.
Value 9/10

What you get for the dollar relative to the closest alternative.
Moat 7/10

How hard it would be for a competitor to replicate the underlying advantage.
Longevity 8/10

How likely the product is to still be best-in-class 24 months out.
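The headline score is the unweighted mean of the four axes above. A minimal sketch of that arithmetic, assuming each axis is scored out of 10 and the result is rounded to one decimal place (the rounding rule is an assumption; the page itself shows only "8/10"):

```python
# Axis scores as listed on this page (each out of 10).
AXES = {"Utility": 8, "Value": 9, "Moat": 7, "Longevity": 8}


def editorial_score(axes: dict) -> float:
    """Unweighted average of the axis scores; one-decimal rounding is an assumption."""
    return round(sum(axes.values()) / len(axes), 1)


print(editorial_score(AXES))  # (8 + 9 + 7 + 8) / 4 = 8.0
```

Because the average is unweighted, a one-point change on any single axis moves the headline score by 0.25.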

Verified facts

Best For Teams already using Mistral that want speech generation, transcription, realtime audio understanding, lower hosted TTS unit pricing, or open-model experimentation in the same ecosystem.
high Drifts 2026-06-26 Mistral Voxtral TTS announcement
Pricing Anchor Mistral pricing lists Voxtral TTS v26.03 at $0.016 per 1k characters and Voxtral Mini Transcribe 2 from $0.002 per minute; check Mistral pricing before production rollout because audio model pricing is volatile.
high Volatile 2026-06-26 Mistral AI pricing
Watch Out For Voxtral TTS open weights are published under CC BY-NC 4.0, so commercial teams should use the hosted API or confirm license terms. It is not a polished creator studio, and production voice-agent buyers should benchmark latency against Cartesia, ElevenLabs, and Deepgram on real traffic.
high Volatile 2026-06-26 Mistral Voxtral TTS announcement
API Available Yes. Mistral exposes hosted Voxtral TTS and transcription APIs through La Plateforme; Voxtral Mini Transcribe 2 and Voxtral Realtime are documented in the speech-to-text API docs.
high Volatile 2026-06-26 Mistral audio docs
Model Family Voxtral is now Mistral AI's broader audio family: Voxtral TTS v26.03 for speech generation, Voxtral Mini Transcribe 2 for batch transcription, and Voxtral Realtime for live speech-to-text/audio understanding.
high Volatile 2026-06-26 Voxtral TTS model card

Full review notes Long-form details, FAQ, and source history

Voxtral is Mistral AI’s audio family. It now spans text-to-speech, speech-to-text, and audio understanding inside the Mistral ecosystem.

The June 2026 correction is important: older AiPedia copy treated Voxtral as STT-only. That was accurate for the original July 2025 launch and the May 2026 Transcribe/Realtime docs, but it is no longer enough. Mistral now publishes Voxtral TTS v26.03 as a text-to-speech model, alongside Voxtral Mini Transcribe 2 and Voxtral Realtime for transcription and live audio understanding.

System Verdict

Pick Voxtral if you already build on Mistral and want audio in the same stack. The hosted API now covers speech generation and transcription, while the open-model posture makes Voxtral more interesting for experimentation than closed voice platforms.

Skip it if you want the easiest creator workflow. ElevenLabs is still the safer default for polished narration, cloning, dubbing, and non-developer voice work. Cartesia remains cleaner for managed, low-latency voice agents. Resemble AI is stronger when localization, watermarking, approval workflow, and deepfake detection are procurement requirements.

Which path to pay for: for commercial production, use the hosted API, with Voxtral TTS v26.03 at $0.016 per 1k characters and Mini Transcribe 2 from $0.002/min. The open TTS weights are for eligible use under CC BY-NC 4.0. Production buyers should re-check the pricing page before rollout.
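The path rule reduces to two questions: is the use commercial, and must the model run on your own hardware? A minimal sketch of that routing, assuming only the two paths this page describes (the hosted API on La Plateforme, or the CC BY-NC 4.0 open TTS weights). The `Path` enum and `choose_tts_path` helper are hypothetical and encode this page's guidance, not the license text itself:

```python
from enum import Enum


class Path(Enum):
    # Hypothetical labels for the two paths this page describes, plus a review gate.
    HOSTED_API = "hosted API (La Plateforme)"
    OPEN_WEIGHTS = "open TTS weights (CC BY-NC 4.0)"
    LICENSE_REVIEW = "open weights only after confirming commercial license terms"


def choose_tts_path(commercial: bool, must_self_host: bool) -> Path:
    """Route a team to a Voxtral TTS path using this page's guidance (hypothetical helper)."""
    if not commercial:
        # Research, evaluation, and local testing fit the non-commercial license.
        return Path.OPEN_WEIGHTS if must_self_host else Path.HOSTED_API
    if must_self_host:
        # Commercial use of NC-licensed weights is the license-mismatch risk: confirm terms first.
        return Path.LICENSE_REVIEW
    # Default commercial path per this page.
    return Path.HOSTED_API


print(choose_tts_path(commercial=True, must_self_host=True).value)
```

Routing commercial self-hosting to a review step, rather than to a yes or no, mirrors the page's "use the hosted API or confirm license terms" advice.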

Key Facts


Family	Voxtral by Mistral AI
TTS model	Voxtral TTS v26.03
STT models	Voxtral Mini Transcribe 2 and Voxtral Realtime
TTS pricing anchor	$0.016 per 1k characters on Mistral pricing
Transcription pricing anchor	Mini Transcribe 2 from $0.002/min on Mistral pricing
Open TTS weights	Announced under CC BY-NC 4.0
Realtime STT	Voxtral Realtime for live transcription/audio understanding
Hosted path	La Plateforme API
Best fit	Mistral-native speech generation, transcription, and audio understanding
Not a full creator studio	No ElevenLabs-style voice library, Studio workflow, or dubbing UI

Every data point above was verified against vendor sources on 2026-06-26. See Sources.

What It Actually Is

Voxtral is an audio model family, not a consumer voiceover app. The hosted API is the practical commercial path for most teams. The open-weight angle is the strategic draw for researchers, local testers, and teams that need to evaluate model behavior more deeply before committing.

Voxtral TTS v26.03 turns text into speech. Mistral’s model card positions it as a text-to-speech model and the pricing page lists the hosted rate at $16 per million output characters. Voxtral Mini Transcribe 2 and Voxtral Realtime turn audio into text and structured understanding, covering the input side of a voice-agent loop.

The stronger buyer case is ecosystem consolidation: one Mistral account, one model provider, one API surface for text models plus audio generation and transcription.

When To Pick Voxtral

You already use Mistral. It keeps text, TTS, STT, and audio understanding closer to the same provider and billing system.
Hosted TTS unit price matters. $0.016 per 1k characters is a strong published rate for API-based TTS experiments.
You need STT and TTS together. Voxtral covers both sides of the voice loop, though production teams should still benchmark it against their current TTS or STT vendor.
You want open-model experimentation. Voxtral’s open weights are useful for research, evaluation, and architecture exploration.
You are building audio understanding. Transcribe 2 and Realtime are a better fit than creator TTS tools when the primary job is understanding incoming speech.

When To Pick Something Else

Creator narration and voice cloning: ElevenLabs is still the cleaner default for voice library, cloning, dubbing, Studio, and commercial creator workflow.
Managed low-latency agents: Cartesia has the stronger voice-agent product surface with Sonic, Ink-Whisper, and Line.
Open-weight expressive TTS: Fish Audio remains a better first test when the question is self-hosted expressive speech quality.
Enterprise voice governance: Resemble AI adds watermarking, deepfake detection, localization, deployment options, and approval workflow.
Meeting transcription app: Otter, Fireflies, Fathom, and MeetGeek are user-facing meeting products; Voxtral is an API/model family.

Pricing

Access	Cost	Notes
Voxtral TTS v26.03 API	$0.016 per 1k characters	Mistral pricing lists $0 input and $16/M output characters
Voxtral Mini Transcribe 2 API	From $0.002/min	Current transcription anchor on Mistral pricing
Voxtral Realtime	Check current Mistral pricing/docs	Live transcription/audio understanding path
Open TTS weights	Free for eligible use	Mistral announcement says CC BY-NC 4.0; commercial teams need hosted API or license review
Enterprise	Custom	Use Mistral procurement for volume, private deployment, or commercial-license questions

Prices verified 2026-06-26 via Mistral pricing, Voxtral TTS model card, Mistral Voxtral TTS announcement, and Mistral speech-to-text docs.
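The two hosted anchors combine into a rough monthly estimate. A minimal sketch, assuming the rates published on 2026-06-26 ($16 per million output characters for Voxtral TTS v26.03, and the $0.002/min "from" rate for Mini Transcribe 2, which makes the transcription figure a floor). Voxtral Realtime is excluded because this page lists no rate for it, and the function names are hypothetical:

```python
from decimal import ROUND_HALF_UP, Decimal

# Published anchors as of 2026-06-26; re-check Mistral pricing before relying on them.
TTS_USD_PER_MILLION_CHARS = Decimal("16")          # Voxtral TTS v26.03 output characters
TRANSCRIBE2_USD_PER_MIN_FLOOR = Decimal("0.002")   # Mini Transcribe 2 "from" rate


def per_1k_chars(usd_per_million: Decimal) -> Decimal:
    """Convert a per-million-character rate into a per-1k-character rate."""
    return usd_per_million / 1000


def estimate_monthly_usd(tts_chars: int, transcribe_minutes: int) -> Decimal:
    """Hosted TTS cost plus a lower-bound Transcribe 2 cost, rounded to cents."""
    tts_cost = Decimal(tts_chars) * TTS_USD_PER_MILLION_CHARS / Decimal(1_000_000)
    stt_cost = Decimal(transcribe_minutes) * TRANSCRIBE2_USD_PER_MIN_FLOOR
    return (tts_cost + stt_cost).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(per_1k_chars(TTS_USD_PER_MILLION_CHARS))   # 0.016
print(estimate_monthly_usd(2_500_000, 10_000))   # 40.00 TTS + 20.00 STT = 60.00
```

`Decimal` keeps sub-cent unit prices exact, which avoids binary floating-point drift when multiplying small per-unit rates by large volumes.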

Against The Alternatives

	Voxtral	Cartesia	ElevenLabs	Fish Audio
Best viewed as	Mistral audio model family	Managed voice-agent stack	Creator and platform voice suite	Open-weight expressive TTS
TTS	Yes, Voxtral TTS v26.03	Yes, Sonic	Yes, Eleven v3/Flash	Yes, OpenAudio S1/S2 Pro
STT	Yes, Transcribe 2/Realtime	Yes, Ink-Whisper	Yes, Scribe	Yes, transcribe-1
Open weights	Yes, with license caveats	No	No	Yes, MIT
Creator UI	Limited	Limited	Strong	Basic
Voice-agent posture	Useful input/output model stack	Strongest managed agent path	Broad platform with agents	Model/deployment path
Best buyer	Mistral-stack developers	Agent builders	Creators and audio teams	Open/self-host teams

Failure Modes

License mismatch risk. Voxtral TTS open weights are CC BY-NC 4.0. Commercial buyers should use the hosted API or secure the right license instead of assuming open equals production-safe.
Not a creator studio. There is no ElevenLabs-style creator workflow, large voice library, Studio surface, or polished dubbing pipeline.
Realtime still needs testing. Do not assume a model card or vendor claim means the full phone path is fast enough. Test mic input, STT, the LLM, and TTS output together.
Language and voice quality vary. Evaluate your own scripts, languages, accents, reference voices, and target device speakers.
Young product surface. Community wrappers, benchmarks, and production playbooks are thinner than Whisper, Deepgram, ElevenLabs, or Cartesia.
Pricing can move. Audio pricing is volatile. Re-check Mistral pricing before committing budget or publishing public claims.
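Benchmarking the full path means timing every hop, not just the model. A minimal sketch of such a harness, assuming you wrap your real mic capture, STT (for example Voxtral Realtime), LLM, and TTS calls as plain functions. The stage names, the `_stub` sleeps, and the 800 ms budget are hypothetical placeholders, not measured Voxtral figures:

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def time_pipeline(stages: List[Stage], payload: Any) -> Dict[str, float]:
    """Run each stage in order, feeding its output forward, and record milliseconds per stage."""
    timings: Dict[str, float] = {}
    for name, stage in stages:
        start = time.perf_counter()
        payload = stage(payload)
        timings[name] = (time.perf_counter() - start) * 1000.0
    timings["total"] = sum(timings.values())
    return timings


def p95(samples: List[float]) -> float:
    """Nearest-rank 95th percentile of a list of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = -(-95 * len(ordered) // 100)  # ceil(0.95 * n) using integer math
    return ordered[rank - 1]


def _stub(delay_s: float, label: str) -> Callable[[Any], Any]:
    """Hypothetical stand-in for a real stage; replace with actual mic/STT/LLM/TTS calls."""
    def run(payload: Any) -> Any:
        time.sleep(delay_s)
        return f"{payload}->{label}"
    return run


STAGES: List[Stage] = [
    ("mic_capture", _stub(0.005, "audio")),
    ("stt", _stub(0.010, "text")),
    ("llm", _stub(0.020, "reply")),
    ("tts", _stub(0.010, "speech")),
]

if __name__ == "__main__":
    BUDGET_MS = 800.0  # hypothetical end-to-end budget; set your own target
    totals = [time_pipeline(STAGES, "turn")["total"] for _ in range(20)]
    worst = p95(totals)
    print(f"p95 total: {worst:.1f} ms, within budget: {worst <= BUDGET_MS}")
```

Judging against the 95th percentile rather than the average reflects that callers notice slow turns, and per-stage timings show which hop to swap when the total misses the budget.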

Recent Changes

2026-06-05: Descript comparison refresh removed stale/fake claims from the comparison context and reconfirmed Voxtral as Mistral’s audio model family for TTS, Mini Transcribe 2, and Realtime rather than a creator editing app.
2026-06-03: Page corrected from the older STT-only framing. Voxtral now covers TTS via Voxtral TTS v26.03 as well as STT via Voxtral Mini Transcribe 2 and Realtime.
2026-06-03: Mistral pricing check added hosted TTS at $0.016 per 1k characters and Mini Transcribe 2 from $0.002/min.
2026-05-13: AiPedia previously corrected Voxtral away from a fake TTS-vs-TTS framing and toward STT-only. That correction is now superseded by Mistral’s newer Voxtral TTS release.
2025-07-15: Original Voxtral launch positioned the family around speech understanding, transcription, and open weights.

Methodology

This page was produced by the aipedia.wiki editorial pipeline, an automated system that ingests vendor documentation, verifies pricing and model details against primary sources, and generates the editorial analysis you are reading. No individual human wrote this review. Scoring follows the four-dimension rubric at /about/scoring/ (Utility, Value, Moat, Longevity; unweighted average). Last verified 2026-06-26 against Mistral pricing, Voxtral TTS model card, Mistral Voxtral TTS announcement, Mistral speech-to-text docs, and Mistral changelog.

FAQ

Does Voxtral do text-to-speech now? Yes. Mistral now publishes Voxtral TTS v26.03 as a text-to-speech model. Older pages that described Voxtral as STT-only are stale.

Is Voxtral free? There are open weights for eligible use, but license terms matter. Voxtral TTS open weights are announced under CC BY-NC 4.0, so commercial teams should use the hosted API or confirm licensing. Hosted API use is paid.

How much does Voxtral TTS cost? Mistral pricing lists Voxtral TTS v26.03 at $16 per million output characters, equivalent to $0.016 per 1k characters, verified 2026-06-26.

Does Voxtral handle transcription? Yes. Voxtral Mini Transcribe 2 handles batch transcription and Voxtral Realtime handles live transcription/audio understanding.

How does Voxtral compare to Cartesia? Cartesia is stronger for managed, low-latency production voice agents. Voxtral is stronger when the team wants Mistral-native TTS/STT, lower hosted TTS unit pricing, and open-model experimentation.

Sources

Mistral pricing: Voxtral TTS and Mini Transcribe 2 pricing anchors
Voxtral TTS model card: model identifier, TTS positioning, hosted model details
Mistral Voxtral TTS announcement: open weights and release context
Mistral speech-to-text docs: Voxtral Mini Transcribe 2 and Realtime STT API docs
Mistral model overview: current model lineup
Original Voxtral launch: original speech-understanding launch context

Category: AI Voice / Speech
Comparisons: Fish Audio vs Voxtral



Embed this score on your site Free. Links back.

HTML

<a href="https://aipedia.wiki/tools/voxtral/" target="_blank" rel="noopener"><img src="https://aipedia.wiki/badges/voxtral.svg" alt="Voxtral on aipedia.wiki" width="260" height="72" /></a>

Markdown

[![Voxtral on aipedia.wiki](https://aipedia.wiki/badges/voxtral.svg)](https://aipedia.wiki/tools/voxtral/)

Badge value auto-updates if the editorial score changes. Attribution via the link is required.

Cite this page For journalists, researchers, and bloggers

News writers

According to aipedia.wiki Editorial at aipedia.wiki (https://aipedia.wiki/tools/voxtral/)

APA

aipedia.wiki Editorial. (2026). Voxtral: Editorial Review. aipedia.wiki. Retrieved July 2, 2026, from https://aipedia.wiki/tools/voxtral/

MLA 9

aipedia.wiki Editorial. "Voxtral: Editorial Review." aipedia.wiki, 2026, https://aipedia.wiki/tools/voxtral/. Accessed July 2, 2026.

Chicago

aipedia.wiki Editorial. 2026. "Voxtral: Editorial Review." aipedia.wiki. https://aipedia.wiki/tools/voxtral/.

BibTeX

@misc{voxtral-editorial-review-2026,
  author = {{aipedia.wiki Editorial}},
  title = {Voxtral: Editorial Review},
  year = {2026},
  publisher = {aipedia.wiki},
  url = {https://aipedia.wiki/tools/voxtral/},
  note = {Accessed: 2026-07-02}
}

Spotted an error or want to share your experience with Voxtral?

Every tool page is re-verified on a recurring cycle, and corrections land faster when readers flag them directly. If you spot a stale fact, a missing capability, or have used Voxtral and want to share what worked or didn't, the editorial desk reviews every message sent through this form.

Email editorial@aipedia.wiki
