Llama

Llama 4 Maverick remains the strongest open-weight LLM to evaluate for commercial self-hosting, VPC, and model-diversification...

8.5/10 Strong

Active

Price: $0 weights; hosted Scout from $0.11/M input on Groq, Maverick $0.27/$0.85 on Together

Best plan

$0 weights

Risk: Open weights do not eliminate operational burden...


Editorial · no paid placements

Should you use it?

Llama 4 Maverick remains the strongest open-weight LLM to evaluate for commercial self-hosting, VPC, and model-diversification work. Scout is the long-context lane and the current Groq fast-inference option at $0.11/M input and $0.34/M output, while Together still lists Maverick at $0.27/M input and $0.85/M output. Groq's broader announcement also lists Maverick pricing, but provider model cards and pricing pages should be treated as the source of truth before buying. Skip Llama for peak closed-model reasoning or bundled image/video generation.

Buy if: you need self-hosted or VPC deployment
Pick: $0 weights; hosted Scout from $0.11/M input on Groq, Maverick $0.27/$0.85 on Together
Skip if: you need peak reasoning versus current closed frontier models

Plan guidance

What to buy

Best plan: $0 weights; hosted Scout from $0.11/M input on Groq, Maverick $0.27/$0.85 on Together

Watch: Open weights do not eliminate operational burden...

Price range: $0 weights; hosted Scout from $0.11/M input on Groq, Maverick $0.27/$0.85 on Together

Scout $0.11 / $0.34 per 1M tokens on Groq; Maverick $0.27 / $0.85 on Together, with provider-specific Maverick...

Upgrade only if: you do not need peak reasoning versus current closed frontier models

Current pricing source: Groq Llama 4 Scout model card

Fit

Use it for this, skip it for that

Best for

Self-hosted or VPC deployment
Cost-sensitive API workloads
Long-context retrieval (Scout 10M tokens)
Fine-tuning and LoRA adapters

Avoid if

Peak reasoning versus current closed frontier models
Consumer chat with image generation
EU-based entities restricted by license
Organizations over 700M MAU

Watch out: Open weights do not eliminate operational burden; benchmark quality, safety filters, data rights, hosting cost, and license restrictions before standardizing.

Recent changes

Only what affects the decision

Jun 25, 2026: Hosted Scout and Maverick
Rechecked Meta's Llama 4 announcement, Groq Scout announcement/model-card posture, and Together...
Source: Groq Llama 4 Scout model card

Jun 23, 2026: Hosted Scout and Maverick
Rechecked Meta's Llama 4 announcement, Groq Scout and Maverick model docs, and Together...
Source: Groq Llama 4 Scout model card

Jun 8, 2026: Hosted Scout (Groq) / Maverick (Together)
June refresh: Groq's live public model card is Scout, while Together continues to list Maverick. Treat provider-specific Maverick availability and pricing as volatile.
Source: Groq Llama 4 Scout model card

Alternatives

Best swaps

ChatGPT

OpenAI's flagship AI assistant, with GPT-5 models, image generation, Codex coding agent, voice, and agent mode across web, mobil

$0-$200/month · 9.5/10

Claude

Anthropic's AI assistant. Strongest on long-context reasoning, agentic coding, and long-form writing.

$0-$200/month · 9.3/10

Ollama

Local open-model runtime plus optional Ollama Cloud inference. Free local runtime; Cloud Pro $20/mo or $200/yr; Max $100/mo; Tea

$0 local / $20-$100/mo cloud · 9/10


Proof and score math Verified Jun 25

Proof

Why this recommendation is trusted

Evidence Meta Llama official site

Source: Registered source
Freshness: Current
Confidence: High confidence
Verified: Jun 25, 2026
Review: Sep 8, 2026
Volatility: Volatile

High-volatility evidence needs frequent review.

Editorial score

Unweighted average of 4 axes · confidence high

Utility 8/10
How much real work it can do for a competent operator, end to end.

Value 10/10
What you get for the dollar relative to the closest alternative.

Moat 7/10
How hard it would be for a competitor to replicate the underlying advantage.

Longevity 9/10
How likely the product is to still be best-in-class 24 months out.
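
The headline score can be reproduced from the four axis scores. The sketch below assumes the "unweighted average" is a plain arithmetic mean with no rounding rule beyond one decimal place; the names `AXIS_SCORES` and `editorial_score` are illustrative and do not come from the site's scoring pipeline.

```python
from statistics import mean

# Axis scores as listed in the editorial score card on this page.
AXIS_SCORES = {"Utility": 8, "Value": 10, "Moat": 7, "Longevity": 9}


def editorial_score(scores: dict[str, float]) -> float:
    """Unweighted (arithmetic) mean of the axis scores."""
    return mean(list(scores.values()))


if __name__ == "__main__":
    print(f"{editorial_score(AXIS_SCORES):.1f}/10")
```

The four axes sum to 34, so the mean is 8.5, which matches the 8.5/10 shown at the top of the page.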

Verified facts

Best For: Teams that want Meta open-weight language models for self-hosting, fine-tuning, privacy-sensitive deployments, and model-provider diversification.
high · Drifts · 2026-06-25 · Meta Llama official site
Pricing Anchor: Llama model weights are downloadable under Meta's license, but real cost comes from inference hosting, GPUs, fine-tuning, vendor APIs, and compliance work.
high · Drifts · 2026-06-25 · Llama downloads
Watch Out For: Open weights do not eliminate operational burden; benchmark quality, safety filters, data rights, hosting cost, and license restrictions before standardizing.
high · Drifts · 2026-06-25 · Llama model documentation
Model Control: Model cards and prompt-format docs are the source of truth for variants, context behavior, tool-use formats, and deployment assumptions.
high · Drifts · 2026-06-25 · Llama model documentation
Open Source: Meta's GitHub utilities are useful for model-adjacent tooling, examples, and release artifacts, but license terms still need separate review.
high · Drifts · 2026-06-25 · Meta Llama GitHub repository

Open Source: Slowing

meta-llama/llama-models

Last commit: 4 months ago

Full review notes Long-form details, FAQ, and source history

Meta’s open-weight LLM family. Llama 4 Maverick (400B total, 17B active parameters, mixture-of-experts, 1M context) is the current flagship. Scout (109B total, 17B active, 10M context) fits on a single H100 and owns the long-context tier. Behemoth (2T total, 288B active) remains an internal teacher model; Meta has not publicly released it.

May 5, 2026 competitive note: Google released MTP drafters to make Gemma 4 inference up to 3x faster. For Llama buyers, the watch item is not only model quality but latency: official speculative-decoding assets can make Gemma more practical on local and workstation hardware.

April 2, 2026 competitive note: Google released Gemma 4 under Apache 2.0. Apache licensing is strictly more permissive than Meta’s Llama 4 Community License (which caps at 700M monthly active users). For self-hosters with concerns about the Llama license, Gemma 4 is the closest drop-in alternative at comparable small-to-mid scale.

Weights ship free under the Llama 4 Community License. Hosted inference varies by provider. Groq’s live Llama 4 Scout model card lists $0.11 per million input tokens. Groq also maintains Maverick docs, but provider-specific availability, context, and pricing should be checked live before a purchase order.

System Verdict

Pick Llama if you need an open-weight frontier LLM you can self-host, fine-tune, or run inside a VPC. Meta’s vendor-reported benchmarks put Maverick ahead of older closed-model baselines, and Scout’s 10M-token context outruns most closed assistants for long-document retrieval. Hosted pricing is also among the cheapest in the frontier tier.

Skip it if you need peak closed-model reasoning or bundled multimodal output. Claude leads for careful long-form analysis and code reasoning, ChatGPT remains the broadest finished assistant with image features, and Gemini is the Google-native media/workspace lane. Llama provides model control, not a complete consumer workspace.

Who pays which tier: Groq for speed-sensitive APIs, Together or Fireworks for Maverick and fine-tuning workflows, and AWS Bedrock or Azure for compliance-heavy enterprise deployments. EU-based entities should read the license carefully before committing.

Key Facts


Flagship model	Llama 4 Maverick (400B total, 17B active, 128 experts, 1M context)
Long-context model	Llama 4 Scout (109B total, 17B active, 16 experts, 10M context)
Internal teacher	Llama 4 Behemoth (~2T total, 288B active) · not publicly released
Released	April 5, 2025 (Scout + Maverick)
License	Llama 4 Community License · free commercial use under 700M MAU
Multimodal	Native text + image input (vision) on Scout and Maverick
Hosted providers	Groq · Together · Fireworks · DeepInfra · Replicate · Hugging Face · AWS Bedrock · Azure · Google Vertex · Databricks · SambaNova · Snowflake
Cheapest hosted path verified this pass	Groq Scout at $0.11 / $0.34 per 1M tokens; Together Maverick at $0.27 / $0.85
Consumer UI	Meta AI at meta.ai (free, ad-adjacent)
Fine-tuning	Full weights, LoRA, and QLoRA supported across providers

Every data point above was verified against vendor sources on 2026-06-25. See Sources.

What it actually is

One open-weight model family published by Meta and distributed free under a custom community license. Developers download weights from llama.com or Hugging Face and run inference anywhere: on-prem GPUs, cloud VMs, VPC-isolated endpoints, or managed APIs.

Maverick is a mixture-of-experts model: it activates 17B parameters per token out of a 400B total pool, giving frontier-class quality at a fraction of dense-model compute cost. Scout activates the same 17B per token but draws from a 109B total pool and supports a 10M-token context, the longest shipping context window in any released model.
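
To make the active-versus-total split concrete, the sketch below computes the share of parameters used per token for each model, using the parameter counts from the Key Facts table. It is a rough illustration only: it assumes per-token compute scales with active parameters and ignores routing overhead, attention cost, and memory, which still depends on the total parameter count. The names `MODELS` and `active_fraction` are illustrative.

```python
# Parameter counts in billions, taken from the Key Facts table on this page.
MODELS = {
    "Maverick": {"total_b": 400, "active_b": 17},
    "Scout": {"total_b": 109, "active_b": 17},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the total parameter pool that is active for a single token."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


if __name__ == "__main__":
    for name, params in MODELS.items():
        share = active_fraction(params["total_b"], params["active_b"])
        print(f"{name}: {share:.2%} of parameters active per token")
```

Maverick uses roughly one parameter in twenty-four per token and Scout roughly one in six, which is why both can be priced well below a dense model of the same total size.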

The moats: weights are actually free, the Community License permits commercial use for almost every company, and the hosted ecosystem (Groq’s LPU hardware for Scout, Together’s Maverick/fine-tune infra, AWS Bedrock’s enterprise SLAs) keeps the family cheaper and more controllable than most closed frontier deployments. Behemoth’s role as a 2T-parameter teacher improves the smaller models through codistillation without ever shipping to the public.

The weaknesses: no native image generation, no video, no consumer app with the reach of ChatGPT or Gemini. The license carves out EU-based entities and companies over 700M monthly active users. Reasoning and agentic coding can still trail the best closed frontier models in buyer tests.

When to pick Llama

You need full data sovereignty or VPC deployment. Run weights inside your own network. No vendor sees your tokens. Closed frontier models cannot match this.
You fine-tune on proprietary data. Full weights plus LoRA and QLoRA adapters across Together, Fireworks, and AWS Bedrock. Closed models offer narrower fine-tune access at higher prices.
Your workload is API cost-sensitive. Groq Scout at $0.11 / $0.34 and Together Maverick at $0.27 / $0.85 can undercut closed frontier APIs for many high-volume tasks, but provider availability and quantization need live checks.
You need very long context, up to 10M tokens. Scout is the Llama-family long-context lane, but hosted providers may expose shorter public context windows than the raw model specification. Check provider limits before promising a 10M-token workflow.
You build multilingual or global products. Llama 4 was trained on 200+ languages and ships with stronger non-English performance than most closed models at equivalent size.
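
The long-context caveat above is easy to check mechanically before committing to a provider. The sketch below is a minimal pre-flight check, assuming the context window must hold both the prompt and the requested output tokens. The limits are copied from this page's pricing table rather than fetched live, and the names `PROVIDER_CONTEXT`, `fits`, and `usable_providers` are hypothetical.

```python
# Context limits in tokens, as reported in this page's pricing table.
# Snapshot values only; confirm against the live provider page.
PROVIDER_CONTEXT = {
    "Groq (Scout)": 128_000,
    "Together AI (Scout)": 10_000_000,
    "Together AI (Maverick)": 1_000_000,
}


def fits(prompt_tokens: int, max_output_tokens: int, context_limit: int) -> bool:
    """True if prompt plus requested output fits inside the context window."""
    return prompt_tokens + max_output_tokens <= context_limit


def usable_providers(
    prompt_tokens: int, max_output_tokens: int, limits: dict[str, int]
) -> list[str]:
    """Names of providers whose listed context window can hold the request."""
    return [
        name
        for name, limit in limits.items()
        if fits(prompt_tokens, max_output_tokens, limit)
    ]


if __name__ == "__main__":
    # Hypothetical job: a 2M-token document bundle with a 4K-token answer.
    print(usable_providers(2_000_000, 4_000, PROVIDER_CONTEXT))
```

With these listed limits, a 2M-token job only fits the 10M-token Scout endpoint, even though Scout on Groq is the same underlying model.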

When to pick something else

Careful reasoning or long-form writing: Claude. Leads when careful drafting, long analysis, critique, and coding judgment matter more than open-weight control.
Image generation bundled with chat: ChatGPT with GPT Image 2 or Gemini with Imagen 4. Llama has no image output.
Video generation: Gemini with Veo 3. Llama has none.
Fully permissive Apache-style license: Mistral AI (Mistral-Small and Pixtral) or DeepSeek V3.2. Llama’s Community License restricts EU entities and 700M+ MAU orgs.
Chinese-market or local-deployment open weights: Qwen or GLM. Better Mandarin performance and fewer geopolitical frictions.

Pricing

Llama weights are free. Costs come from hosted inference or your own compute. Representative hosted pricing via Together AI and Groq’s live Scout model card, verified 2026-06-25.

Access path	Input ($/1M tok)	Output ($/1M tok)	Context	Who it’s for
Self-hosted (own GPUs)	$0	$0	Full	Teams with H100/MI300 clusters
Meta AI (meta.ai)	Free	Free	Capped	Consumer chat, casual use
Groq (Scout)	$0.11	$0.34	128K on Groq’s public model card	Speed-first multimodal API workloads
DeepInfra FP8 (Maverick)	$0.15	$0.50	1M	Cheapest hosted Maverick input
Together AI (Maverick)	$0.27	$0.85	1M	Fine-tune + inference combo
Fireworks (Maverick)	$0.40	$1.20	1M	Production SLAs, fine-tune
Together AI (Scout)	$0.08	$0.30	10M	Long-context retrieval
AWS Bedrock / Azure	Custom	Custom	1M	Enterprise compliance, BAAs

Prices verified 2026-06-25 via Together AI pricing, Groq Llama 4 Scout model card, Groq Llama 4 Maverick model card, and Groq model deprecation docs. Provider-specific Maverick rates can move quickly, so check the live provider page before quoting a purchase order.
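
A quick way to compare the rows above is to price a fixed monthly token volume at each rate. The sketch below does that with rates copied from the table; they are a snapshot from this page, not live data. The workload size is a made-up example, and `Rate`, `RATES`, and `monthly_cost` are illustrative names rather than any provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Per-million-token prices in USD."""

    input_per_m: float
    output_per_m: float


# Rates copied from the pricing table on this page (snapshot, not live).
RATES = {
    "Groq (Scout)": Rate(0.11, 0.34),
    "DeepInfra FP8 (Maverick)": Rate(0.15, 0.50),
    "Together AI (Maverick)": Rate(0.27, 0.85),
    "Fireworks (Maverick)": Rate(0.40, 1.20),
}


def monthly_cost(rate: Rate, input_tokens: int, output_tokens: int) -> float:
    """USD cost of the given token volume at the given per-million rates."""
    input_cost = (input_tokens / 1_000_000) * rate.input_per_m
    output_cost = (output_tokens / 1_000_000) * rate.output_per_m
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 500M input and 100M output tokens per month.
    for name, rate in RATES.items():
        cost = monthly_cost(rate, 500_000_000, 100_000_000)
        print(f"{name}: ${cost:,.2f}")
```

For that example volume, Groq Scout comes to about $89 and Together Maverick to about $220 per month. The gap reflects both the provider and the model, so it is not a like-for-like quality comparison.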

Against the alternatives

	Llama 4 Maverick	DeepSeek V3.2	Mistral Large 2
License	Llama Community (700M MAU cap)	MIT (fully permissive)	Mistral Research (non-commercial)
Context window	1M tokens	128K	128K
Cheapest hosted in / out	$0.15 / $0.50	$0.14 / $0.28	$2.00 / $6.00
Multimodal input	Text + image	Text only	Text + image (Pixtral sibling)
Self-host weights	Yes	Yes	Yes (research only)
Vendor-reported coding	Strong	Strongest open-weight	Mid
Best viewed as	Open-weight default	Cheapest frontier API	Enterprise EU alternative

Failure modes

License is not Apache. The Llama 4 Community License excludes EU-based entities from the license grant and requires a separate license for companies over 700M monthly active users. Read the terms before shipping to those markets.
No native image or video output. Llama is text-plus-vision-input only. Workflows needing image or video generation need a second tool.
Behemoth is not public. Meta’s 2T-parameter model remains an internal teacher. Benchmarks citing Behemoth performance do not reflect anything you can actually use.
Quality lag vs closed frontier. Vendor benchmarks make Maverick look competitive with older closed baselines, but third-party leaderboards and buyer tests can still favor current Claude, OpenAI, and Gemini models on the hardest reasoning and agent tasks.
Hosted provider variance. Same model, different providers, different quality: FP8 quantized endpoints (DeepInfra, Azure) run cheaper but sacrifice some output quality vs full-precision. Benchmark your specific workload before committing.
No first-party consumer UI competitive with ChatGPT. Meta AI at meta.ai is ad-adjacent, feature-thin, and not positioned as a daily-driver assistant.
Self-hosting is expensive. A single H100 runs Scout; Maverick needs multi-GPU setups. If you lack cluster access, hosted APIs are cheaper than building infrastructure.
Fine-tune licensing inherits upstream. Derivatives of Llama must carry the Community License terms. You cannot relicense a fine-tuned Llama under Apache or MIT.
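
The self-hosting point can be sanity-checked with back-of-envelope memory arithmetic. The sketch below estimates weight storage only, assuming an 80 GB H100 and that every expert must be resident in GPU memory, so the total parameter count drives the footprint rather than the 17B active count. It ignores KV cache, activations, and framework overhead, so real deployments need more headroom. All names are illustrative.

```python
import math

# Approximate storage per parameter at common precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

# Assumption: the 80 GB H100 variant.
GPU_MEMORY_GB = 80.0

# Total parameters in billions, from the Key Facts table on this page.
TOTAL_PARAMS_B = {"Scout": 109, "Maverick": 400}


def weight_memory_gb(total_params_b: float, precision: str) -> float:
    """Weight footprint in GB (1 GB = 1e9 bytes), weights only."""
    return total_params_b * BYTES_PER_PARAM[precision]


def min_gpus_for_weights(
    total_params_b: float, precision: str, gpu_memory_gb: float = GPU_MEMORY_GB
) -> int:
    """Lower bound on GPU count needed just to hold the weights."""
    return math.ceil(weight_memory_gb(total_params_b, precision) / gpu_memory_gb)


if __name__ == "__main__":
    for model, params_b in TOTAL_PARAMS_B.items():
        for precision in BYTES_PER_PARAM:
            gb = weight_memory_gb(params_b, precision)
            gpus = min_gpus_for_weights(params_b, precision)
            print(f"{model} {precision}: {gb:.1f} GB weights, at least {gpus} GPU(s)")
```

Under these assumptions Scout's weights fit on one 80 GB card only at 4-bit precision (54.5 GB), while Maverick needs several GPUs at any of the listed precisions.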

Methodology

This page was produced by the aipedia.wiki editorial pipeline, an automated system that ingests vendor documentation, verifies pricing and model details against primary sources, and generates the editorial analysis you are reading. No individual human wrote this review. Scoring follows the four-dimension rubric at /about/scoring/ (Utility, Value, Moat, and Longevity, combined as an unweighted average). Last verified 2026-06-25 against the Llama 4 announcement, Llama 4 Community License, Together AI pricing, Groq Llama 4 Scout model card, Groq Llama 4 Maverick model card, and Groq model deprecation docs.

FAQ

Is Llama free? Yes. Weights are free under the Llama 4 Community License. Self-hosting costs your compute, deployment work, and compliance review. Hosted APIs bill per token; the June 25, 2026 check found Groq Scout at $0.11/M input and Together Maverick at $0.27/M input. Meta AI at meta.ai is free for consumer chat.

Can I use Llama commercially? Yes for almost all companies. The Community License grants commercial use to any organization under 700M monthly active users. Companies above that threshold (Google, Microsoft, Apple scale) need a separate Meta license. EU-based entities are explicitly carved out of some license provisions. Read the Llama 4 Community License.

What is the current Llama flagship? Llama 4 Maverick: 400B total parameters, 17B active, 128 experts, 1M token context. It remains the strongest production-ready Llama model as of the June 25, 2026 check. Scout (109B / 10M raw-model context, with shorter hosted context on some providers) wins for long-document jobs. Behemoth (2T) is still an internal teacher model and has not shipped.

How does Llama compare to Claude? Current Claude models remain the safer pick for careful long-form reasoning, writing, and agentic coding judgment. Llama wins on model control, self-hosting, and provider competition. Use Claude for peak assistant quality; use Llama when data control, open weights, cost, or model diversification is the purchase reason.

Which hosted provider should I pick? Groq for Scout speed and low token rates. Together for Maverick and fine-tune workflows. Fireworks for production SLAs. AWS Bedrock or Azure for enterprise compliance with BAAs and SOC 2. Recheck the live provider model page because model availability can change faster than Meta’s base model release cycle.

Sources

Llama 4 announcement (ai.meta.com): Official Scout, Maverick, and Behemoth specifications
Llama 4 Community License: License terms, 700M MAU threshold, EU carve-out
Llama 4 Maverick on Hugging Face: Canonical weight distribution
Together AI pricing: Hosted Maverick rates and fine-tuning rows
Groq Llama 4 Scout model card: Scout pricing, context, speed, and capabilities
Groq Llama 4 Maverick model card: Maverick architecture, public model-card status, and provider-specific context
Groq model deprecation docs: Provider-specific model churn context

Category: AI Chatbots · AI Coding
Alternatives: Claude · ChatGPT · Gemini · DeepSeek · Mistral AI · Qwen · GLM


Cite this page For journalists, researchers, and bloggers

News writers

According to aipedia.wiki Editorial at aipedia.wiki (https://aipedia.wiki/tools/llama/)

APA

aipedia.wiki Editorial. (2026). Llama: Editorial Review. aipedia.wiki. Retrieved July 2, 2026, from https://aipedia.wiki/tools/llama/

MLA 9

aipedia.wiki Editorial. "Llama: Editorial Review." aipedia.wiki, 2026, https://aipedia.wiki/tools/llama/. Accessed July 2, 2026.

Chicago

aipedia.wiki Editorial. 2026. "Llama: Editorial Review." aipedia.wiki. https://aipedia.wiki/tools/llama/.

BibTeX

@misc{llama-editorial-review-2026,
  author = {{aipedia.wiki Editorial}},
  title = {Llama: Editorial Review},
  year = {2026},
  publisher = {aipedia.wiki},
  url = {https://aipedia.wiki/tools/llama/},
  note = {Accessed: 2026-07-02}
}

Spotted an error or want to share your experience with Llama?

Every tool page is re-verified on a recurring cycle, and corrections land faster when readers flag them directly. If you spot a stale fact, a missing capability, or have used Llama and want to share what worked or didn't, the editorial desk reviews every message sent through this form.

Email editorial@aipedia.wiki
