7.5/10 Useful
Active

Free (open-source)

Editorial · no paid placements

The call

Kokoro is a free, Apache 2.0 text-to-speech model at 82M parameters. It runs locally on CPU or GPU with a ~300MB download, no API key, no per-character fees. V1.0 covers 54 voices across 8 languages. Best for offline or high-volume work; skip for voice cloning.

  • Buy if: Offline and local text-to-speech workflows
  • Pick: Free (open-source)
  • Skip if: Voice cloning, emotional direction, or real-time voice agents

Editorial score

Unweighted average of 4 axes · confidence high

  • Utility 8/10

    How much real work it can do for a competent operator, end to end.

  • Value 10/10

    What you get for the dollar relative to the closest alternative.

  • Moat 5/10

    How hard it would be for a competitor to replicate the underlying advantage.

  • Longevity 7/10

    How likely the product is to still be best-in-class 24 months out.
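
The headline 7.5/10 can be reproduced from the four axis scores. A minimal sketch of the unweighted average, using the values listed above (the function name `editorial_score` is illustrative, not part of any published tooling):

```python
from statistics import mean
from typing import Dict

# Axis scores as listed in the rubric above.
axis_scores = {"Utility": 8, "Value": 10, "Moat": 5, "Longevity": 7}

def editorial_score(scores: Dict[str, int]) -> float:
    """Unweighted average of the axis scores, rounded to one decimal."""
    return round(mean(list(scores.values())), 1)

print(editorial_score(axis_scores))  # 7.5
```

Because the average is unweighted, the Moat score of 5 pulls the total down as much as the Value score of 10 lifts it.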

Key facts

  1. Best For: Developers experimenting with lightweight open TTS models and local/offline voice synthesis workflows.
    Confidence high · Volatile · verified 2026-05-13 · Kokoro model card
  2. Pricing Anchor: Kokoro is distributed as an open model; costs come from inference hardware/hosting and any downstream service wrapper rather than a vendor SaaS plan.
    Confidence high · Drifts · verified 2026-05-13 · Kokoro GitHub repository
  3. Watch Out For: Before commercial use, review license, voice rights, quality across languages, hallucinated pronunciations, model provenance, and abuse/safety controls.
    Confidence high · Volatile · verified 2026-05-13 · Kokoro model card
  4. Open Source Or Local: The Hugging Face model card and repository are the authoritative sources for model files, license notes, examples, and project activity.
    Confidence high · Volatile · verified 2026-05-13 · Kokoro README
  5. Workflow Surface: The Hugging Face Space is useful for quick evaluation, but production usage should verify local inference, license, voices, and latency separately.
    Confidence high · Volatile · verified 2026-05-13 · Kokoro TTS Hugging Face Space

An open-weight text-to-speech model, Kokoro placed on the TTS Arena leaderboard in January 2026 above much larger models like XTTS v2 (467M) and MetaVoice (1.2B).

Apache 2.0 licensed. No API key. No usage caps. No network calls after the initial model download.

System Verdict

Pick Kokoro if the use case is offline, high-volume, or privacy-constrained English TTS with a fixed voice. The download is ~300MB, runs on a laptop, and costs nothing past electricity. Community ONNX builds ship in 88MB-310MB size variants for mobile and browser deployment.

Skip it if the job needs voice cloning, fine-grained emotion control, or real-time streaming. ElevenLabs keeps the quality ceiling and the voice-library breadth. Cartesia owns low-latency conversational use cases. MiniMax Speech undercuts ElevenLabs on price for multilingual workloads that still want a hosted API.

Kokoro’s moat is size-efficiency, not features. The 82M parameter count means laptop-local inference at commercial-grade quality for a narrow slice of jobs.

Key Facts

Model size | 82M parameters (~300MB download)
License | Apache 2.0 (commercial use permitted)
Architecture | Modified StyleTTS 2
Voices (v1.0) | 54 voices across 8 languages
Languages (v1.0) | English (US + UK), Spanish, French, Hindi, Italian, Japanese, Mandarin Chinese, Brazilian Portuguese
Inference | CPU and CUDA GPU; Apple Silicon via ONNX
Deployment formats | PyTorch, ONNX (fp32 310MB, fp16 169MB, int8 88MB)
Hosted API cost | Under $1 per 1M input characters via third-party providers
Released | November 2024; v1.0 early 2026
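
The three ONNX sizes map naturally onto a deployment size budget. The following is a small sketch of that choice, assuming the sizes quoted above and assuming that the largest build that fits is the preferred one (higher precision); `pick_variant` and `ONNX_VARIANTS` are hypothetical names for illustration, not part of any Kokoro package:

```python
from typing import Optional

# Sizes in MB, taken from the deployment-formats row above.
ONNX_VARIANTS = {"fp32": 310, "fp16": 169, "int8": 88}

def pick_variant(budget_mb: int) -> Optional[str]:
    """Return the largest variant that fits the budget, or None if none fit.

    Assumption: a larger file means higher numeric precision, so the
    largest build that fits is treated as the best choice.
    """
    fitting = {name: size for name, size in ONNX_VARIANTS.items() if size <= budget_mb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)

print(pick_variant(200))  # fp16
print(pick_variant(100))  # int8
```

A browser or mobile target with a tight download budget would land on int8, while a desktop app with room to spare could take fp32. Audio quality differences between the variants should be checked by ear before committing.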

What it actually is

A small neural TTS model that turns text into audio locally. The architecture is a modified StyleTTS 2 trained on permissive, non-copyrighted audio with IPA phoneme labels.

The Python package (pip install kokoro) wraps inference with a minimal API. ONNX builds target mobile, browser, and non-Python runtimes. A Gradio demo ships for no-code local testing.

The moat is size. At 82M parameters Kokoro takes under 300MB on disk and runs in real time on CPU. Competing open models at comparable quality (XTTS v2, Tortoise) are several times larger (XTTS v2, at 467M, has about 5.7x the parameters) and need a GPU for acceptable latency.

When to pick Kokoro

  • Self-hosted AI stacks that must stay offline. Pair with a local LLM for end-to-end air-gapped audio pipelines.
  • High-volume narration where per-character fees hurt. Audiobooks, podcasts, subtitles, game VO at scale.
  • Privacy-sensitive text (medical, legal, financial). No outbound API call means no data egress.
  • Edge and mobile deployments. The int8 ONNX build is 88MB. Fits on a phone.
  • Research and reproducibility. Fixed weights and deterministic inference avoid the drift introduced by hosted-model upgrades.
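
For the high-volume narration case, a long manuscript usually gets split into manageable pieces before synthesis. Below is a rough sketch of sentence-bounded chunking using only the standard library. The `chunk_text` helper and the 400-character default are illustrative assumptions, not a documented Kokoro limit or API:

```python
import re
from typing import List

def chunk_text(text: str, max_chars: int = 400) -> List[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own chunk.
    The 400-character default is an arbitrary illustrative value.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

print(chunk_text("One. Two. Three.", max_chars=9))  # ['One. Two.', 'Three.']
```

Each chunk can then be passed to the synthesis step in turn, which keeps memory use flat and makes it easy to resume a long batch after a failure.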

When to pick something else

  • Voice cloning from a reference clip: ElevenLabs, Fish Audio, or MiniMax Speech. Kokoro ships fixed voices only.
  • Fine-grained emotional control: ElevenLabs v3 or MiniMax Speech-02. Kokoro’s prosody controls stay basic.
  • Real-time streaming for conversational agents: Cartesia is built for this. Kokoro generates full audio before playback.
  • Languages beyond the v1.0 set of 8: ElevenLabs and MiniMax cover 30+ languages with native prosody.
  • Studio production UI with takes and timeline editing: Murf or ElevenLabs Studio. Kokoro is code-first.

Pricing

Path | Cost
Self-hosted model | Free (Apache 2.0)
Own hardware | Electricity only
Hosted API (third-party) | Under $1 per 1M input characters (Together AI, others)
Commercial use | Permitted under Apache 2.0 without royalty

Reverified 2026-05-13 via the Kokoro-82M Hugging Face repo and ONNX community builds. Self-hosted inference is free; hosted APIs price per million characters.
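
To put the hosted rate in concrete terms, here is a back-of-the-envelope sketch. It treats the "under $1 per 1M input characters" figure as an upper bound, so the result is a ceiling rather than a quote; `hosted_cost_usd` is a hypothetical helper name:

```python
def hosted_cost_usd(characters: int, usd_per_million: float = 1.0) -> float:
    """Upper-bound hosted cost, assuming a flat per-million-character rate.

    The default of 1.0 is the ceiling implied by "under $1 per 1M characters";
    actual provider pricing may be lower and can change.
    """
    if characters < 0:
        raise ValueError("characters must be non-negative")
    return characters / 1_000_000 * usd_per_million

print(hosted_cost_usd(10_000_000))  # 10.0
```

At 10M characters the hosted path stays under about $10 on this assumption. Self-hosting removes the per-character charge entirely, leaving hardware and electricity as the only costs.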

Against the alternatives

Feature | Kokoro (82M) | XTTS v2 (~467M) | ElevenLabs (hosted)
License | Apache 2.0 | CPML (non-commercial by default) | Proprietary
Parameter count | 82M | 467M | Not disclosed
Voice cloning | No | Yes (instant) | Yes (best-in-class)
Languages | 8 (v1.0) | 17 | 32+
Real-time streaming | No | Limited | Yes
Emotion control | Basic | Basic | Fine-grained
Cost at 10M chars | Electricity | Electricity | ~$300+ on paid tier
Best viewed as | Small, offline-first | Mid-size clone-capable | Hosted quality ceiling

Failure modes

  • No voice cloning. Fixed pre-trained voices only. Custom-voice work requires a different model.
  • Prosody is basic. No fine-grained emotion sliders. Tone is controlled mainly by text wording and punctuation.
  • No streaming. Full audio generates before playback. Latency is not viable for real-time agent loops.
  • English quality leads; other languages lag. The 8-language v1.0 list is functional but native-speaker critique can show gaps against specialist models.
  • No hosted first-party API. Third-party providers (Together, Replicate) exist, but there is no vendor SLA.
  • CPU inference runs at roughly real-time speed; GPU is 10-20x faster. Long-document batches on CPU get slow.
  • Community-driven release cadence. Version bumps depend on hexgrad’s time. Update frequency is irregular.
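
The CPU-versus-GPU point is easier to judge with numbers. Here is a simple sketch of wall-clock time for a batch, assuming CPU runs at 1x real time and GPU at 10-20x as stated above. Real throughput depends on hardware and text, and `synthesis_hours` is an illustrative name:

```python
def synthesis_hours(audio_hours: float, speedup: float = 1.0) -> float:
    """Wall-clock hours needed to synthesize audio_hours of speech.

    speedup is the multiple of real time: 1.0 for the CPU figure above,
    10.0 to 20.0 for the quoted GPU range.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_hours / speedup

book = 10.0  # hours of finished audio, e.g. one audiobook
print(synthesis_hours(book))        # 10.0 (CPU at real time)
print(synthesis_hours(book, 10.0))  # 1.0  (GPU, low end of the range)
print(synthesis_hours(book, 20.0))  # 0.5  (GPU, high end of the range)
```

On these assumptions a ten-hour audiobook ties up a CPU for most of a working day but finishes in about an hour or less on a GPU.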

Methodology

This page was produced by the aipedia.wiki editorial pipeline, an automated system that ingests vendor documentation, verifies claims against primary sources, and generates the editorial analysis shown here. No individual human wrote this review. Scoring follows the four-dimension rubric at /about/scoring/ (Utility, Value, Moat, Longevity, unweighted average). Last verified 2026-05-13 against the Kokoro-82M Hugging Face repo, VOICES.md, hexgrad GitHub, and onnx-community Kokoro-82M-v1.0-ONNX builds.

FAQ

Is Kokoro free for commercial use? Yes. The model is Apache 2.0 licensed, which allows commercial use without royalties (Hugging Face).

How does Kokoro compare to ElevenLabs? Kokoro matches ElevenLabs on fixed-voice English narration quality in blind TTS Arena tests. ElevenLabs still wins on voice cloning, emotion sliders, real-time streaming, and language breadth. Kokoro wins on cost (free vs per-character) and privacy (local vs hosted).

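How do I run Kokoro? Install with pip install kokoro soundfile. Basic inference:

from kokoro import KPipeline
import soundfile as sf
pipeline = KPipeline(lang_code='a')  # 'a' selects American English
# The pipeline yields (graphemes, phonemes, audio) for each text segment
for i, (gs, ps, audio) in enumerate(pipeline("Your text here.", voice='af_heart')):
    sf.write(f'{i}.wav', audio, 24000)  # Kokoro outputs 24 kHz audio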

ONNX builds exist for deployment outside Python (onnx-community).

How many voices and languages does Kokoro support? V1.0 ships 54 voices across 8 languages: English (US, UK), Spanish, French, Hindi, Italian, Japanese, Mandarin Chinese, and Brazilian Portuguese. Voices come in US female, US male, UK female, UK male, and regional variants. See the VOICES.md reference for the full list.

Can Kokoro clone my voice? No. Kokoro supports fixed voices only. For zero-shot voice cloning from a short reference clip, use ElevenLabs, Fish Audio, or MiniMax Speech.
