Fish Audio / OpenAudio S1 + S2

8.5/10 Strong

Active

$0-$75/month

Editorial · no paid placements

Should you use it?

Fish Audio is the open-weight TTS leader as of June 2026. OpenAudio S2 Pro ranked first in Fish Audio's own blind preference testing, S1 remains the fast default, and Fish Audio announced S2.1 Pro API access on June 23, 2026. Pick it for self-hosted voice agents or high-volume API workloads; skip it for enterprise dubbing (use Resemble AI) or for real-time voice agents where Cartesia's sub-100ms latency matters more than voice quality.

Buy if: Open-source TTS with self-hosting
Pick: $0-$75/month
Skip if: Teams wanting a polished consumer UI

Plan guidance

What to buy

Best plan $0-$75/month

Watch: Voice cloning and synthetic speech create consent...

Price range $0-$75/month

Free developer API access announced

Not for: teams wanting a polished consumer UI

Current pricing source: Fish Audio S2.1 Pro API announcement

Fit

Use it for this, skip it for that

Best for

Open-source TTS with self-hosting
Expressive narration and character voices
Multilingual output across 80+ languages
High-volume API workloads at low cost

Avoid if

Teams wanting a polished consumer UI
Enterprise dubbing pipelines with lip-sync
Workflows that need built-in deepfake detection

Watch out: Voice cloning and synthetic speech create consent, rights, and disclosure risk. Confirm licensing and speaker authorization before publishing generated voices.

Recent changes

Only what affects the decision

Jun 25, 2026
S2.1 Pro API
Fish Audio's June 23, 2026 announcement says S2.1 Pro is open to developers through free API access; production buyers should still verify account limits and current API pricing docs
Fish Audio S2.1 Pro API announcement
Jun 23, 2026
Plus / Pro / API
June 23 refresh rechecked the plan page and API pricing...
Fish Audio plans and pricing
Jun 2, 2026
Pro
Verified Pro at $75/mo, 1,620 minutes (~27 hours). Max tier $749/mo confirmed at 6,250 minutes for sustained agency workloads. API docs list s1 and s2-pro at $15 per 1M UTF-8 bytes
Fish Audio plans and pricing

Alternatives

Best swaps

ElevenLabs

The top-ranked AI voice platform in June 2026. Eleven v3 covers 70+ languages with expressive audio tags, Flash v2.5 hits ~75ms

$0-$990/month · 9.3/10

Whisper

OpenAI's open-weights speech-to-text baseline. MIT-licensed code and weights remain useful for self-hosted batch transcription,

Free self-host / OpenAI transcription API $0.003-$0.006 per minute; GPT-Realtime-Whisper $0.017 per minute · 9/10

Cartesia

Real-time voice stack for agents. Sonic-3.5 TTS and Ink-2 STT now form the default Line model pair for eligible voice agents, wi

$0-$239/month + credits · 8.5/10

Build comparison

Fish Audio / OpenAudio S1 + S2 comparisons

Fish Audio / OpenAudio S1 + S2 vs Resemble AI

June 2026 head-to-head of Fish Audio / OpenAudio S1 + S2 and Resemble AI. Compare open-weight TTS control, voice cloning, governance, pricing, and enterprise fit.

Fish Audio / OpenAudio S1 + S2 vs Voxtral

June 2026 head-to-head of Fish Audio and Voxtral. Compare open-weight TTS, Mistral Voxtral TTS, transcription, realtime STT, pricing, and stack fit.

Proof and score math Verified Jun 25

Proof

Why this recommendation is trusted

Evidence Fish Audio official site

Source: Registered source
Freshness: Current
Confidence: Medium confidence
Verified: Jun 25, 2026
Review: Sep 5, 2026
Volatility: Volatile

High-volatility evidence needs frequent review.

Editorial score

Unweighted average of 4 axes · confidence high

Utility 9/10

How much real work it can do for a competent operator, end to end.
Value 10/10

What you get for the dollar relative to the closest alternative.
Moat 7/10

How hard it would be for a competitor to replicate the underlying advantage.
Longevity 8/10

How likely the product is to still be best-in-class 24 months out.
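
The headline score can be reproduced from the four axis values listed here. The sketch below assumes "unweighted average" means a plain arithmetic mean rounded to one decimal; the function name is illustrative and not part of any published scoring tool.

```python
from statistics import mean

# Axis scores as listed in this review.
axis_scores = {"Utility": 9, "Value": 10, "Moat": 7, "Longevity": 8}


def editorial_score(scores):
    """Unweighted arithmetic mean of the axis scores, rounded to one decimal."""
    return round(mean(scores.values()), 1)


print(editorial_score(axis_scores))  # 8.5
```

Because every axis carries the same weight, a one-point change on any single axis moves the headline score by 0.25.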

Verified facts

Best For: Voice teams that want expressive text-to-speech, voice cloning, or speech generation without starting from a purely enterprise voice stack.
high · Drifts · 2026-06-25 · Fish Audio official site
Pricing Anchor: Plus $11/mo for ~200 minutes, Pro $75/mo for ~27 hours, and Max $749/mo for ~104 hours. API pricing is pay-as-you-go at $15 per 1M UTF-8 bytes for s1 and s2-pro, and ASR transcribe-1 is $0.36 per audio hour.
high · Volatile · 2026-06-25 · Fish Audio API pricing and rate limits
Watch Out For: Voice cloning and synthetic speech create consent, rights, and disclosure risk. Confirm licensing and speaker authorization before publishing generated voices.
high · Drifts · 2026-06-25 · Fish Audio official site
Model Surface: Fish Audio is useful when the evaluation is voice quality and expressive generation, not only transcript accuracy or call-center automation.
medium · Volatile · 2026-06-25 · Fish Audio official site
Workflow Surface: Best evaluated with your own scripts, languages, and target delivery channel because voice quality varies by speaker, emotion, and post-processing needs.
medium · Drifts · 2026-06-25 · Fish Audio official site

Open Source Recent activity

fishaudio/fish-speech

Last commit: 3 weeks ago

Full review notes Long-form details, FAQ, and source history

Fish Audio ships two current models. OpenAudio S1 is the fast default; S2 Pro is the expressive flagship trained on 10M+ hours across 80+ languages. Both are MIT-licensed for self-hosting and available via the Fish Audio cloud.

S2 Pro ranked first in Fish Audio’s own 2026 blind-provider comparison. Artificial Analysis benchmarks agree: Fish Audio’s family currently leads on aggregate TTS quality ELO.

System Verdict

Pick Fish Audio if you need top-tier TTS quality without ElevenLabs pricing. S2 Pro is the strongest open-weight model in 2026, and self-hosting on a consumer GPU removes per-minute fees, leaving only the cost of the GPU itself. S1 covers the fast, low-latency default; S2 Pro covers expressive narration and character work.

Skip it if the workflow is enterprise dubbing with lip-sync (use Resemble AI), if sub-100ms streaming is the hard constraint (use Cartesia), or if a no-code consumer UI matters more than raw quality (Speechify for reading, ElevenLabs for creator workflows).

Who pays which tier: Free for testing (7 minutes S2). Plus $11/mo for creators running ~200 minutes. Pro $75/mo for sustained 27-hour workloads. Max $749/mo for agency-scale 104-hour workloads. API at $15 per 1M UTF-8 bytes for developers. Self-hosters pay only GPU cost.

Key Facts


Flagship model	OpenAudio S2 Pro (dual-autoregressive, RL-aligned)
Fast model	OpenAudio S1
Language coverage	80+ languages including English, Chinese, Japanese, Korean, Spanish, French, German, Arabic, Hindi
License	MIT for open weights; weights on GitHub and Hugging Face
Self-hosting	Consumer GPUs with 8GB+ VRAM
Cloud free tier	7 minutes S2 generation, 8K credits/mo
Cloud Plus	$11/mo, ~200 minutes S2, 250K credits
Cloud Pro	$75/mo, ~1,620 minutes (27 hours) S2, 2M credits, 3 team seats
Cloud Max	$749/mo, ~6,250 minutes (104 hours) S2, 25M credits, 10 team seats
API pricing	$15 per 1M UTF-8 bytes for s1 and s2-pro · transcribe-1 ASR at $0.36/audio hour
Blind-test rank (2026)	S2 Pro #1, S1 above every third-party provider

Every data point above was verified against vendor sources on 2026-06-25. See Sources.

What it actually is

A single TTS stack served three ways. Weights on Hugging Face for self-hosters, a cloud platform at fish.audio for creators, and a REST API for developers.

S1 handles the default case: fast, low-latency, acceptable quality for most production voice work. S2 Pro handles expressive narration, multilingual output, and character voices where naturalness matters.

The moat is model quality plus license freedom. S2 Pro beats every commercial third-party provider in Fish Audio’s 2026 blind tests, and MIT weights mean no vendor lock-in. Artificial Analysis tracks Fish Audio’s family at the top of the current TTS ELO leaderboard.

When to pick Fish Audio

You want the strongest open-weight TTS available in 2026. S2 Pro tops Fish Audio’s published blind-test rankings and Artificial Analysis’ ELO board.
Self-hosting saves real money. High-volume inference carries no per-request fee after GPU setup, leaving only hardware and power costs.
Multilingual coverage matters. 80+ languages on S2 Pro beats Voxtral’s 9 and matches ElevenLabs’ breadth.
You need MIT-licensed weights. Forkable, fine-tunable, no training-data restrictions on commercial use.
Expressive or character voices are the target. S2 Pro handles emotion and accent better than Cartesia or Voxtral in preference tests.

When to pick something else

Enterprise dubbing with lip-sync: Resemble AI ships a full Localize pipeline and deepfake detection layer Fish Audio does not match.
Sub-100ms streaming voice agents: Cartesia Sonic 3 hits 40-90ms time-to-first-audio; Fish Audio’s cloud latency lands higher.
Cheapest commercial API: Voxtral can still be cheaper for some text lengths and bundles STT, but compare exact units because Fish Audio now prices TTS by UTF-8 bytes.
Polished consumer creator UI: ElevenLabs still wins on voice library breadth and creator workflow polish.
Document-reading for personal use: Speechify solves the consumption case, not the production case.
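
Byte-based pricing means the same number of characters can cost different amounts depending on the script. The sketch below estimates API spend from the $15 per 1M UTF-8 bytes rate quoted in this review. It assumes billing tracks the raw UTF-8 length of the input text with no minimums or rounding, which is not confirmed here; the function name is illustrative.

```python
API_USD_PER_MILLION_BYTES = 15.0  # s1 and s2-pro rate quoted in this review


def tts_cost_usd(text: str) -> float:
    """Estimated API cost for one request, assuming billing on UTF-8 byte length."""
    n_bytes = len(text.encode("utf-8"))
    return n_bytes * API_USD_PER_MILLION_BYTES / 1_000_000


english = "Hello, world"  # 12 characters, 12 bytes
chinese = "你好，世界"      # 5 characters, 15 bytes (3 bytes per character)

for sample in (english, chinese):
    # characters, UTF-8 bytes, estimated cost in USD
    print(len(sample), len(sample.encode("utf-8")), tts_cost_usd(sample))
```

When comparing against a per-character or per-minute competitor, run the estimate on a representative script in each target language rather than on English alone.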

Pricing

Plan	Price	Included	Notes
Self-hosted	$0	Unlimited	GPU with 8GB+ VRAM required
Free (cloud)	$0	7 min S2 / 8K credits	Testing only, non-commercial
Plus	$11/mo	~200 min S2 / 250K credits	Best fit for most creators, 1 seat
Pro	$75/mo	~27 hours S2 / 2M credits	Sustained production, 3 team seats
Max	$749/mo	~104 hours S2 / 25M credits	Agency-scale, 10 team seats
API	$15 / 1M UTF-8 bytes	Pay-as-you-go	s1 and s2-pro; ASR transcribe-1 is $0.36/audio hour

Prices verified 2026-06-25 via Fish Audio plan page and Fish Audio API pricing docs.
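
To compare the paid tiers on a like-for-like basis, the listed figures can be reduced to a price per listed minute and a price per 1M credits. This is a rough sketch using only the numbers in the table above; it assumes the listed minutes and credits are the full monthly allowance, and the names are illustrative.

```python
# (monthly price in USD, listed S2 minutes, listed credits) from the table above
PLANS = {
    "Plus": (11, 200, 250_000),
    "Pro": (75, 1_620, 2_000_000),
    "Max": (749, 6_250, 25_000_000),
}


def effective_rates(price, minutes, credits):
    """Return (USD per listed minute, USD per 1M credits)."""
    return price / minutes, price / credits * 1_000_000


for name, plan in PLANS.items():
    per_minute, per_million_credits = effective_rates(*plan)
    print(f"{name}: ${per_minute:.3f}/min, ${per_million_credits:.2f} per 1M credits")
```

Run on the table's figures, the per-minute and per-credit views do not rank the tiers identically, so check which unit your workload actually consumes before choosing a plan.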

Against the alternatives

	Fish Audio S2 Pro	ElevenLabs v3	Voxtral	Cartesia Sonic 3
Blind-test quality	#1 in 2026	Strong, second on aggregate	Wins vs ElevenLabs Flash v2.5	Strong, tuned for real-time
Time-to-first-audio	Low, not sub-100ms	200-400ms streaming	~70ms multilingual	40-90ms
Open weights	MIT	No	CC BY-NC 4.0	No
Languages	80+	70+	9	25+
Commercial API	$15/1M UTF-8 bytes	Usage/credit-based	Usage/credit-based	Credit-based
Best viewed as	Open-source quality leader	Creator platform default	Cheap multilingual API	Real-time agent specialist

Failure modes

Self-hosting requires ops. GPU management, model updates, and inference tuning fall on the deployer. Teams without GPU operations experience should stick to the cloud or API.
Consumer UI trails ElevenLabs. The fish.audio dashboard covers the basics. It is not a full creator studio.
Smaller voice library than ElevenLabs. Stock voices are limited; voice cloning fills the gap but needs clean input audio.
Enterprise dubbing is not the product. No lip-sync, no automatic translation pipeline, no deepfake detection layer. Resemble ships that stack instead.
Community tooling still catching up. Third-party plugins and wrappers exist but are thinner than ElevenLabs’ ecosystem.
Credit math on the cloud plans. S1 and S2 consume credits at different rates. Heavy S2 users should price against the API or self-hosting before choosing Pro.
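
One way to price Pro against the API is to estimate monthly API spend for the same number of minutes. The sketch below does that with the rates quoted in this review. The bytes-per-minute figure is a hypothetical placeholder, not a Fish Audio number, so measure it on your own scripts before relying on the result.

```python
API_USD_PER_MILLION_BYTES = 15.0  # API rate quoted in this review
PRO_USD_PER_MONTH = 75.0          # Pro plan price quoted in this review

# ASSUMPTION (not from Fish Audio): average UTF-8 bytes of input text per
# minute of generated speech. Replace with a value measured on real scripts.
ASSUMED_BYTES_PER_MINUTE = 900


def api_cost_usd(minutes, bytes_per_minute=ASSUMED_BYTES_PER_MINUTE):
    """Estimated monthly API spend for a given number of generated minutes."""
    return minutes * bytes_per_minute * API_USD_PER_MILLION_BYTES / 1_000_000


def break_even_minutes(plan_usd, bytes_per_minute=ASSUMED_BYTES_PER_MINUTE):
    """Minutes per month at which estimated API spend equals the plan price."""
    usd_per_minute = bytes_per_minute * API_USD_PER_MILLION_BYTES / 1_000_000
    return plan_usd / usd_per_minute


print(api_cost_usd(1_620))                    # API estimate for Pro's listed minutes
print(break_even_minutes(PRO_USD_PER_MONTH))  # minutes where API spend reaches $75
```

The result is only as good as the bytes-per-minute estimate, which varies with language, speaking rate, and how much of the text is non-ASCII. Self-hosting can be compared the same way by substituting a monthly GPU cost for the plan price.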

Recent changes

2026-06-25: Rechecked plan page, API pricing docs, GitHub model claims, and Fish Audio’s June 23 S2.1 Pro API announcement. The page now flags S2.1 Pro API access as a new developer surface while keeping production API billing caveats.
2026-06-23: Plan and API pricing refreshed. Plus $11, Pro $75, Max $749, and API pricing at $15 per 1M UTF-8 bytes for s1 and s2-pro remain the current pricing anchors.
2026-06-02: Pricing reconfirmed live. Plus $11, Pro $75, Max $749. The Pro tier covers ~1,620 generation minutes (27 hours) per month with three team seats; the Max tier unlocks ~6,250 minutes (104 hours) and ten team seats for sustained agency or platform workloads. API pricing docs list s1 and s2-pro at $15 per 1M UTF-8 bytes and transcribe-1 ASR at $0.36 per audio hour. Free tier remains 7 minutes plus 8K credits/mo, non-commercial.
2026-04-17: S2 Pro confirmed as flagship in Artificial Analysis’ aggregate TTS ELO leaderboard.

Methodology

This page was produced by the aipedia.wiki editorial pipeline, an automated system that ingests vendor documentation, verifies pricing and model details against primary sources, and generates the editorial analysis you are reading. No individual human wrote this review. Scoring follows the four-dimension rubric at /about/scoring/ (Utility, Value, Moat, Longevity). Last verified 2026-06-25 against Fish Audio plan page, Fish Audio API pricing docs, OpenAudio S2 page, Fish Audio 2026 provider comparison, and Artificial Analysis TTS leaderboard.

FAQ

Is Fish Audio free? Yes for self-hosting. OpenAudio S1 and S2 weights are MIT-licensed and run on consumer GPUs with 8GB+ VRAM. The cloud free tier gives 7 minutes of S2 generation for testing (Fish Audio blog).

What is the difference between S1 and S2? S1 is the fast default at low latency. S2 Pro is the expressive flagship trained on 10M+ hours across 80+ languages, and it ranked first in Fish Audio’s 2026 blind-provider comparison (S2 page).

How does Fish Audio compare to ElevenLabs? Fish Audio beats ElevenLabs on aggregate blind preference in 2026 Artificial Analysis benchmarks and the Fish Audio provider comparison. ElevenLabs still wins on creator-tool polish and voice library breadth.

Can I self-host for commercial use? Yes. MIT licensing permits commercial use of open weights with no royalty or training-data restrictions (GitHub).

Does Fish Audio support voice cloning? Yes. Both S1 and S2 support cloning from short reference samples. Quality improves with longer, cleaner reference audio.

Sources

Fish Audio homepage: product positioning, current models
Fish Audio plan page: Free, Plus, Pro, Max tiers; credits and minutes per tier
Fish Audio API pricing docs: API units for s1, s2-pro, and transcribe-1
OpenAudio S2 page: S2 Pro architecture and training data
Fish Audio 2026 blind-provider comparison: S2 Pro #1 ranking
Artificial Analysis: Fish Audio family: aggregate TTS ELO
Fish Speech GitHub: MIT weights, self-hosting instructions

Category: AI Voice / TTS
Comparisons: Fish Audio vs Voxtral, Fish Audio vs Resemble AI

Embed this score on your site Free. Links back.

HTML

<a href="https://aipedia.wiki/tools/fish-audio/" target="_blank" rel="noopener"><img src="https://aipedia.wiki/badges/fish-audio.svg" alt="Fish Audio / OpenAudio S1 + S2 on aipedia.wiki" width="260" height="72" /></a>

Markdown

[![Fish Audio / OpenAudio S1 + S2 on aipedia.wiki](https://aipedia.wiki/badges/fish-audio.svg)](https://aipedia.wiki/tools/fish-audio/)

Badge value auto-updates if the editorial score changes. Attribution via the link is required.

Cite this page For journalists, researchers, and bloggers

News writers

According to aipedia.wiki Editorial at aipedia.wiki (https://aipedia.wiki/tools/fish-audio/)

APA

aipedia.wiki Editorial. (2026). Fish Audio / OpenAudio S1 + S2: Editorial Review. aipedia.wiki. Retrieved July 2, 2026, from https://aipedia.wiki/tools/fish-audio/

MLA 9

aipedia.wiki Editorial. "Fish Audio / OpenAudio S1 + S2: Editorial Review." aipedia.wiki, 2026, https://aipedia.wiki/tools/fish-audio/. Accessed July 2, 2026.

Chicago

aipedia.wiki Editorial. 2026. "Fish Audio / OpenAudio S1 + S2: Editorial Review." aipedia.wiki. https://aipedia.wiki/tools/fish-audio/.

BibTeX

@misc{fish-audio-openaudio-s1-s2-editorial-rev-2026,
  author = {{aipedia.wiki Editorial}},
  title = {Fish Audio / OpenAudio S1 + S2: Editorial Review},
  year = {2026},
  publisher = {aipedia.wiki},
  url = {https://aipedia.wiki/tools/fish-audio/},
  note = {Accessed: 2026-07-02}
}

Spotted an error or want to share your experience with Fish Audio / OpenAudio S1 + S2?

Every tool page is re-verified on a recurring cycle, and corrections land faster when readers flag them directly. If you spot a stale fact, a missing capability, or have used Fish Audio / OpenAudio S1 + S2 and want to share what worked or didn't, the editorial desk reviews every message sent through this form.

Email editorial@aipedia.wiki
