Verified Apr 2026 · #2 in Voice · Editorial only, no paid placements

Whisper

Active

OpenAI's open-weights speech-to-text model. 99 languages, MIT license for self-host, and a $0.006/minute API. Underpins most third-party transcription products.

Best plan: Free self-host / API $0.006 per minute (free + paid plans)
Best for: Self-hosted multilingual transcription
Watch: Real-time streaming transcription (batch-first architecture); check fit before switching
Pricing: Free self-host / API $0.006 per minute
Launched: 2022

Decision badges Readiness signals
Active product · Free tier · Public repo listed · Verified this month · Monthly review cycle · Strong editorial score
Fact ledger Verified fields
Company: openai
Category: Voice
Pricing model: Free tier
Price range: Free self-host / API $0.006 per minute
Status: Active
Last verified: Apr 23, 2026
Pricing anchor: API, $0.006/min (Whisper-1 API, verified unchanged)
Best for: Speech, voice, transcription, or audio-agent workflows
Watch out for: Non-Tier-1 canonical profile; verify current pricing, usage limits, data policy, and integration details before procurement
Change timeline What moved recently
  1. Verified
    Core pricing and product facts checked Apr 23, 2026 | Monthly cadence
  2. Updated
    Editorial page changed Apr 23, 2026
  3. Price
    GPT-4o Transcribe - $0.006/min Mar 20, 2025 | OpenAI shipped GPT-4o Transcribe + Mini Transcribe as modern successors; Whisper-1 remains available and MIT weights unchanged
  4. Price
    API - $0.006/min Apr 17, 2026 | Whisper-1 API, verified unchanged
Knowledge graph Adjacent context
Company openai
Category Voice
Best for
  • Self-hosted multilingual transcription
  • Batch processing long recordings locally
  • Building transcription into products via API
  • Accented or noisy audio, where it beats most commercial APIs
  • Subtitle generation across 99 languages
Not ideal for
  • Real-time streaming transcription (batch-first architecture)
  • Speaker diarization out of the box (use GPT-4o Transcribe or third-party wrappers)
  • Live consumer apps that need word-level latency under 1 second

OpenAI’s open-weights speech-to-text model. Shipped September 2022 as a 1.5B-parameter transformer trained on 680,000 hours of multilingual audio. Weights remain under the MIT license. OpenAI’s hosted whisper-1 API runs at $0.006 per minute.

As of March 2025, OpenAI added two newer hosted transcription models on the same endpoint: GPT-4o Transcribe (same $0.006/min, improved accuracy on low-resource languages, optional speaker diarization) and GPT-4o Mini Transcribe ($0.003/min, budget tier). Whisper-1 is now the legacy model but stays available and stays the reference for self-hosted deployments.
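
Because all three hosted models share one endpoint, switching between them is a one-string change. A minimal sketch, assuming the official `openai` Python package (v1-style client) and an `OPENAI_API_KEY` in the environment; the helper names `build_request` and `transcribe_file` are illustrative and not part of OpenAI's SDK:

```python
from pathlib import Path
from typing import Optional

# Model IDs as named on this page; all are served from the same
# /v1/audio/transcriptions endpoint.
HOSTED_MODELS = {"whisper-1", "gpt-4o-transcribe", "gpt-4o-mini-transcribe"}


def build_request(model: str = "whisper-1", language: Optional[str] = None) -> dict:
    """Assemble keyword arguments for a transcription call (hypothetical helper)."""
    if model not in HOSTED_MODELS:
        raise ValueError(f"unknown hosted model: {model}")
    params = {"model": model}
    if language:
        params["language"] = language  # optional language hint, e.g. "de"
    return params


def transcribe_file(path: str, **kwargs) -> str:
    """Send one audio file to the hosted API and return the transcript text."""
    # Imported lazily so this module loads even without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with Path(path).open("rb") as audio:
        result = client.audio.transcriptions.create(file=audio, **build_request(**kwargs))
    return result.text
```

Calling `transcribe_file("meeting.mp3", model="gpt-4o-mini-transcribe")` would route the same file to the budget tier; the file name is a placeholder.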

Recent developments

  • April 2026: Self-hosted Whisper remains the dominant open-weights baseline. Projects like faster-whisper (CTranslate2 runtime) and whisper.cpp (C++ port) deliver 5-10x speedups on consumer hardware; Apple Silicon inference is routinely faster-than-realtime.
  • March 2025: OpenAI shipped GPT-4o Transcribe and Mini Transcribe as hosted successors on the same /v1/audio/transcriptions endpoint. Whisper-1 stays callable by model ID; MIT weights are unchanged.

System Verdict

Pick Whisper if you need multilingual transcription with the option to self-host, or a $0.006/minute API baseline. Self-hosted under the MIT license gives unlimited offline volume; the hosted API covers teams that do not want to operate GPUs. Accent and noise robustness beat most commercial APIs on published evaluations.

Skip it for real-time streaming, word-accurate timestamps, or built-in speaker diarization. Whisper is batch-first: segment-level timestamps only, and no native diarization. GPT-4o Transcribe with Diarization fits there. For sub-second live transcription, AssemblyAI and Deepgram lead.

Who pays which tier: Self-host free for researchers, privacy-first teams, and batch jobs. OpenAI API at $0.006/min for teams that do not want to run GPUs. GPT-4o Mini Transcribe at $0.003/min for cost-sensitive production. Third-party wrappers (Replicate, Hugging Face Inference Endpoints) offer managed Whisper hosting at comparable rates.

Key Facts

Model: Whisper-1 (1.5B parameters, transformer encoder-decoder)
License: MIT (weights and reference code)
Successor models on same API: GPT-4o Transcribe, GPT-4o Mini Transcribe
Languages: 99
API pricing: $0.006/min (Whisper-1, GPT-4o Transcribe) · $0.003/min (Mini)
Self-host cost: Free (MIT) plus GPU compute
Max file size (API): 25MB per request; split long files client-side
Formats accepted: mp3, mp4, mpeg, mpga, m4a, wav, webm, flac, ogg
Timestamps: Segment-level (word-level via Whisper-Timestamped fork)
Speaker diarization: No native; available via GPT-4o Transcribe or third-party (pyannote, whisperx)
Real-time streaming: No native; batch-first architecture

Every data point above verified on 2026-04-17 via openai.com and OpenAI API pricing.
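
The 25MB cap means long recordings have to be split before upload. A rough planning sketch, assuming a constant-bitrate file and treating the cap conservatively as 25,000,000 bytes; `plan_chunks` and its `headroom` parameter are hypothetical, and the actual cutting (for example with an audio tool such as ffmpeg) is not shown:

```python
import math

# Conservative reading of the 25MB cap (decimal megabytes); an assumption.
API_LIMIT_BYTES = 25_000_000


def plan_chunks(duration_s: float, bitrate_kbps: float,
                limit_bytes: int = API_LIMIT_BYTES, headroom: float = 0.9):
    """Return (start, end) offsets in seconds so each chunk stays under the cap."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    # Leave headroom for container overhead and bitrate fluctuation.
    max_chunk_s = math.floor(limit_bytes * headroom / bytes_per_second)
    if max_chunk_s <= 0:
        raise ValueError("bitrate too high for the size limit")
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks
```

At 128 kbps each chunk can hold roughly 23 minutes of audio under these assumptions, so a one-hour recording becomes three uploads. Cutting at silences rather than at fixed offsets avoids splitting words across chunks.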

What it actually is

A single transformer model that takes audio in and emits text plus segment-level timestamps. Shipped with five size variants (tiny, base, small, medium, large); the large checkpoint has since been revised up to large-v3. Large-v3 is the usual production default for self-hosting; smaller variants run on CPU or mobile hardware with tradeoffs on accuracy.
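
Segment-level output maps directly onto subtitle files. A small sketch that turns segments into SubRip (SRT) text, assuming each segment is a dict with `start`, `end` (seconds), and `text` keys, which mirrors the shape the reference implementation returns; the function names are illustrative:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm as used by SRT files."""
    ms_total = int(round(seconds * 1000))
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render an iterable of segment dicts as numbered SRT cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```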

The moats compound:

  • Open weights. MIT license. Downloadable from OpenAI’s GitHub or Hugging Face. Runnable anywhere.
  • Multilingual depth. 99 languages trained together, not a patchwork of per-language models.
  • Community runtimes. faster-whisper, whisper.cpp, whisperx, mlx-whisper, and others deliver the speed gains the reference implementation lacks.
  • Baseline status. Most transcription tools released since 2022 benchmark against Whisper, and many wrap it.
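
As one illustration of the community runtimes, here is a hedged sketch of local transcription with faster-whisper, assuming the package is installed (`pip install faster-whisper`) and the weights can be downloaded on first use. `transcribe_local` and `segment_to_dict` are hypothetical wrapper names; the CPU and int8 settings are one reasonable configuration, not a recommendation from the project:

```python
def segment_to_dict(segment) -> dict:
    """Copy the fields we need from a runtime segment object into a plain dict."""
    return {"start": segment.start, "end": segment.end, "text": segment.text.strip()}


def transcribe_local(path: str, model_size: str = "large-v3"):
    """Transcribe one file on the local machine; returns (language, segments)."""
    # Imported lazily so this module loads even without faster-whisper installed.
    from faster_whisper import WhisperModel

    # int8 quantization on CPU trades a little accuracy for speed and memory.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # vad_filter skips non-speech audio before decoding.
    segments, info = model.transcribe(path, vad_filter=True)
    # `segments` is a generator; decoding happens as it is consumed.
    return info.language, [segment_to_dict(s) for s in segments]
```

Nothing leaves the machine in this path, which is the property the self-hosting use cases on this page depend on.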

When to pick Whisper

  • Self-hosting is a hard requirement. Air-gapped environments, healthcare audio where PHI cannot leave the perimeter, and confidential legal transcription.
  • Batch processing beats real-time. Podcast archives, meeting recordings, video subtitling pipelines.
  • Multilingual audio. 99 languages in one model. No per-language routing logic.
  • Building a transcription product. MIT license lets you ship commercial products without royalty.
  • Accents and noise matter. Field recordings, mobile audio, conference rooms with poor acoustics.

When to pick something else

  • Real-time streaming transcription: AssemblyAI or Deepgram. Whisper’s batch-first design fights against sub-second latency.
  • Speaker diarization bundled: GPT-4o Transcribe with Diarization on the same OpenAI endpoint, or whisperx as a self-host wrapper.
  • Word-level timestamps: Whisper-Timestamped or whisperx forks. Base Whisper emits segment-level only.
  • Podcast recording + editing: Descript wraps transcription with a transcript-first editor.
  • Translation alongside transcription: Whisper’s translate task covers English targets only. For other targets, pair transcription with a separate translation model.
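
The English-only translate task implies a simple routing rule. A sketch of that decision, where `plan_pipeline` and the step labels are hypothetical names used only for illustration:

```python
def plan_pipeline(target_language: str) -> list:
    """Return the ordered processing steps for a desired output language."""
    if target_language.lower() in {"en", "english"}:
        # Whisper's built-in translate task emits English text directly,
        # e.g. model.transcribe(path, task="translate") in the reference code.
        return ["whisper:translate"]
    # Any other target: transcribe in the source language first, then pass
    # the text to a separate translation model (not shown here).
    return ["whisper:transcribe", f"translate-text:{target_language}"]
```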

Pricing

OpenAI API (via /v1/audio/transcriptions):

| Model | Price | Notes |
| --- | --- | --- |
| whisper-1 | $0.006/min | Legacy Whisper-1; MIT self-host equivalent is free |
| gpt-4o-transcribe | $0.006/min | Modern successor, improved low-resource language accuracy |
| gpt-4o-transcribe with diarization | $0.006/min | Speaker labels in output |
| gpt-4o-mini-transcribe | $0.003/min | Cost-sensitive tier |

Self-host weights at github.com/openai/whisper and on Hugging Face. Managed Whisper hosting via Replicate or Hugging Face Inference Endpoints runs $0.003-$0.01/minute depending on model size and batching.

Prices verified 2026-04-17 via OpenAI API pricing.
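
For budgeting, the per-minute rates translate into a one-line calculation. A minimal sketch using the prices listed above and assuming audio is billed pro rata by duration (exact rounding rules are not stated on this page); `estimate_cost` is an illustrative helper:

```python
from decimal import Decimal

# Per-minute rates in USD, as listed on this page.
PRICE_PER_MINUTE = {
    "whisper-1": Decimal("0.006"),
    "gpt-4o-transcribe": Decimal("0.006"),
    "gpt-4o-mini-transcribe": Decimal("0.003"),
}


def estimate_cost(audio_minutes, model: str = "whisper-1") -> Decimal:
    """Estimate the hosted API cost in USD for a given amount of audio."""
    # Decimal avoids binary floating-point drift in money arithmetic.
    return PRICE_PER_MINUTE[model] * Decimal(str(audio_minutes))
```

Under these assumptions, 1,000 minutes of audio costs $6 on whisper-1 or gpt-4o-transcribe and $3 on the mini tier.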

Against the alternatives

| | Whisper (self-host) | OpenAI API whisper-1 | AssemblyAI | Deepgram |
| --- | --- | --- | --- | --- |
| Price | Free + GPU | $0.006/min | $0.37/hour (~$0.0062/min) | $0.0043/min |
| Open weights | MIT | No | No | No |
| Languages | 99 | 99 | 99 | 50+ |
| Real-time | No | No | Yes | Yes (lowest latency) |
| Diarization | Via wrappers | GPT-4o Transcribe tier | Native | Native |
| Best viewed as | Open-source baseline | Managed Whisper | Real-time specialist | Low-latency streaming |

Failure modes

  • Hallucinated text on silence. Whisper is prone to generating plausible-sounding text when the audio contains long silence or non-speech. Mitigations: VAD preprocessing (Silero VAD), chunk with overlap, or move to GPT-4o Transcribe which handles this better.
  • No word-level timestamps in base model. Use whisperx or Whisper-Timestamped forks for word-accurate alignment.
  • No speaker diarization natively. Add pyannote, whisperx, or switch to GPT-4o Transcribe.
  • 25MB API file cap. Long recordings need client-side chunking.
  • Large-v3 is compute-heavy. Real-time on CPU needs small or tiny variants with quality tradeoffs; use faster-whisper + quantization for 5-10x speedup.
  • English bias. Accuracy is strongest on English, weakest on low-resource languages. Vendor-reported multilingual averages blend high- and low-resource languages, so check per-language results for the languages you need.
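
To make the silence mitigation concrete, here is a deliberately simple energy gate that finds the regions worth sending to the model. It is a stand-in for a real VAD such as Silero VAD, not an implementation of it; the function names, frame size, and threshold are assumptions, and samples are expected as floats in the range -1.0 to 1.0:

```python
import math


def frame_rms(frame) -> float:
    """Root-mean-square energy of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def speech_regions(samples, sample_rate: int, frame_ms: int = 30, threshold: float = 0.01):
    """Return (start_s, end_s) spans whose frame energy reaches the threshold."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    regions = []
    region_start = None
    for offset in range(0, len(samples), frame_len):
        frame = samples[offset:offset + frame_len]
        is_speech = frame_rms(frame) >= threshold
        if is_speech and region_start is None:
            region_start = offset  # a loud region begins
        elif not is_speech and region_start is not None:
            regions.append((region_start / sample_rate, offset / sample_rate))
            region_start = None  # region closed by a quiet frame
    if region_start is not None:
        # Audio ended while still inside a loud region.
        regions.append((region_start / sample_rate, len(samples) / sample_rate))
    return regions
```

Transcribing only the returned spans keeps long silences away from the decoder. An energy gate cannot tell speech from loud noise, which is why a trained VAD is the usual choice in production.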

Methodology

This page was produced by the aipedia.wiki editorial pipeline, an automated system that ingests vendor documentation, verifies pricing and model details against primary sources, and generates the editorial analysis you are reading. No individual human wrote this review. Scoring follows the four-dimension rubric at /about/scoring/. Last verified 2026-04-23 against github.com/openai/whisper, OpenAI Whisper research post, and OpenAI API pricing.

FAQ

Is Whisper free?

Yes, when self-hosted. The model weights and reference code are under the MIT license. Downloadable from OpenAI’s GitHub or Hugging Face. You pay only for compute. The OpenAI-hosted API costs $0.006 per minute of audio.

How accurate is Whisper?

On English, Whisper large-v3 sits near the top of published leaderboards, within 1-2 WER (word error rate) points of the best commercial APIs. On multilingual audio, accents, and noisy recordings, independent evaluations often rank it first. GPT-4o Transcribe improves on low-resource languages specifically.
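
For readers unfamiliar with the metric: WER is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words, so a "1-2 point" gap means one or two more errors per hundred words. A minimal sketch with lowercase-and-split normalization only; published leaderboards typically apply heavier text normalization, so numbers from this function will not match them exactly:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping one DP row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

For example, dropping one word from a six-word reference gives a WER of 1/6, or about 16.7%.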

What's the difference between Whisper-1 and GPT-4o Transcribe?

Whisper-1 is the 2022 open-weights model, MIT license, widely self-hosted. GPT-4o Transcribe (March 2025) is OpenAI’s hosted successor on the same API endpoint with improved accuracy and optional speaker diarization. Same $0.006/min pricing. Whisper-1 stays callable by model ID; GPT-4o Transcribe is the default recommendation for new builds on the hosted API.

Can Whisper do real-time streaming transcription?

Not natively. The base model is batch-first: it expects complete audio chunks. For real-time, use AssemblyAI or Deepgram, which are architected for low-latency streaming.

Does Whisper do speaker diarization?

Not natively. Options: (1) wrap it with pyannote.audio or whisperx for open-source diarization, (2) switch to GPT-4o Transcribe with Diarization on the OpenAI API, or (3) use a commercial tool that wraps Whisper (Descript, AssemblyAI) for built-in speaker labels.
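
Option (1) comes down to merging two timelines: transcript segments from Whisper and speaker turns from a diarizer. A simplified sketch of that merge, assigning each segment the speaker whose turn overlaps it most. The dict shapes and function names are assumptions for illustration and do not reflect the actual output types of pyannote.audio or whisperx:

```python
def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each transcript segment with the most-overlapping speaker turn."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        # Segments with no overlapping turn keep speaker=None.
        labeled.append({**seg, "speaker": best_speaker})
    return labeled
```

Segment-level assignment mislabels a segment that spans a speaker change, which is one reason wrappers align at the word level before attaching speakers.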

What languages does Whisper support?

99 languages in a single multilingual model. English and major European languages hit the highest accuracy tiers. Low-resource languages (Swahili, Bengali, Tamil, Khmer) trail commercial specialists but remain usable. GPT-4o Transcribe closes some of the low-resource gap.
