Reference desk: shared language for tools, agents, infrastructure, and business pages.

AI Glossary

Definitions for the model, agent, business, and infrastructure terms used across aipedia.wiki.

32 terms tracked
5 reference lanes
Verified April 2026

A

Affiliate Marketing

Business terms

Affiliate marketing is earning commission by promoting third-party products or services, with compensation typically tied to sales, clicks, or conversions. In the AI tools ecosystem, this model creates financial incentives that can influence product recommendations and editorial objectivity. AI tool review platforms frequently rely on affiliate revenue from major AI vendors and SaaS suites, making disclosure of these relationships essential for reader trust. See also: Sponsored Content, Disclosure Requirements, Bias in AI Reviews

Agentic AI

Agent systems

Agentic AI is an autonomous artificial intelligence system that accomplishes specific goals by reasoning, planning, and executing multi-step actions across tools and systems without continuous human intervention. This capability enables AI to operate proactively in complex, dynamic environments rather than simply responding to prompts or generating content. Claude Opus 4.7 with Computer Use, Gemini 3.1 Pro agents, and GPT-5.5-class OpenAI agents demonstrate agentic capabilities by autonomously breaking down tasks, making contextual decisions, and coordinating across multiple specialized agents to reach defined outcomes. See also: Multi-agent, Workflow Automation, Large Language Model, Autonomous Agent
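
A minimal sketch of the plan-act-observe loop at the core of such systems; call_llm and run_tool are hypothetical stand-ins for a model API and a tool executor, not a real framework.

```python
# Minimal agent loop sketch: the model plans an action, a tool executes it,
# and the observation is fed back until the model declares the goal done.
# call_llm and run_tool are hypothetical stand-ins, not a real framework API.

def call_llm(history: list[str]) -> str:
    """Placeholder for a chat-model call that returns the next action."""
    raise NotImplementedError

def run_tool(action: str) -> str:
    """Placeholder that executes an action (search, code, click) and returns output."""
    raise NotImplementedError

def agent_loop(goal: str, max_steps: int = 10) -> str:
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        action = call_llm(history)          # reason and plan the next step
        if action.startswith("DONE:"):      # model signals the goal is reached
            return action.removeprefix("DONE:").strip()
        observation = run_tool(action)      # execute the step against a tool
        history.append(f"Action: {action}\nObservation: {observation}")
    return "stopped: step budget exhausted"
```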

API

Build terms

An API (Application Programming Interface) is a set of rules and protocols that enables software applications to communicate, exchange data, and access features from other systems. APIs let developers integrate AI services into apps and workflows by sending programmatic requests and receiving structured responses. The OpenAI API processes prompts to GPT-5.5-class models; the Claude API handles queries to Claude Opus 4.7. See also: SDK, Tokens, Workflow Automation
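
A minimal sketch of such a programmatic request, assuming an OpenAI-style chat completions endpoint; the model id is a placeholder taken from this glossary's examples.

```python
import os
import requests

# Assumes an OpenAI-style chat completions endpoint and response shape.
resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-5.5",  # placeholder id from this glossary's examples
        "messages": [{"role": "user", "content": "Summarize RAG in one sentence."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```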

ARR

Business terms

Annual Recurring Revenue (ARR) is the normalized annual value of predictable subscription revenue from contracts, excluding one-time fees and overages. ARR gauges financial health and growth potential for SaaS companies, including AI tools, enabling accurate forecasting and investor evaluation. For example, ChatGPT reportedly reached $4B ARR by 2026. See also: SaaS, MRR
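
The normalization itself is simple arithmetic: annualize the recurring portion of revenue and exclude one-time fees, as in this toy sketch with made-up numbers.

```python
# Toy ARR calculation with made-up numbers: annualize recurring revenue only.
monthly_recurring = 250_000      # MRR from active subscriptions, USD
one_time_fees = 40_000           # setup fees and overages: excluded from ARR
arr = monthly_recurring * 12
print(f"ARR = ${arr:,}")         # ARR = $3,000,000
```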

C

Computer Use

Agent systems

Computer Use is a capability in agentic AI systems that enables models to interact directly with computer interfaces by clicking buttons, typing text, and navigating screens. This extends AI agents beyond APIs to control visual UIs and legacy software for desktop automation. Claude Opus 4.7 demonstrates Computer Use by operating browsers and applications through screen observation and mouse actions. See also: Agentic AI, Multi-agent
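
A minimal sketch of the screen-level actions such an agent emits, using the pyautogui library; the coordinates and text are illustrative, and a real agent would derive them from screenshots.

```python
import pyautogui

# Screen-level actions of the kind a Computer Use agent issues.
# Coordinates and text are illustrative; an agent chooses them
# by analyzing screenshots of the current UI state.
pyautogui.screenshot("state.png")                   # observe the screen
pyautogui.click(640, 400)                           # click a target position
pyautogui.write("quarterly report", interval=0.05)  # type into the focused field
pyautogui.press("enter")
```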

Context Window

Build terms

Context window is the maximum number of tokens a large language model processes at once, including prompts and conversation history, acting as its working memory. Larger windows enable handling of extended documents and sustained dialogues. As of 2026, Claude Opus 4.7 supports 1M tokens, Gemini 3.1 Pro supports very long multimodal contexts, and GPT-5.5-class OpenAI API models support million-token-scale contexts, separate from ChatGPT plan limits. See also: Tokens, LLM
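
A minimal sketch of keeping a conversation inside a fixed token budget by dropping the oldest turns first; count_tokens is a hypothetical helper (in practice you would use a real tokenizer, as in the Tokens entry below).

```python
# Keep a conversation inside the context window by dropping the oldest turns.
# count_tokens is a hypothetical helper; see the tiktoken example under Tokens.

def count_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic: ~4 characters per English token

def fit_to_window(turns: list[str], budget: int = 1_000_000) -> list[str]:
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):          # newest turns are most relevant
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))
```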

E

Embedding

Build terms

Embedding is a numerical vector representation of text, images, audio, or other data that captures semantic meaning and relationships in multidimensional space. This enables machines to quantify similarity between data points by measuring vector proximity, powering semantic search and AI applications. For example, embeddings for "dog" and "puppy" cluster closely in vector space, while "dog" and "refrigerator" remain distant. See also: Vector Database, RAG
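
A minimal sketch of measuring vector proximity with cosine similarity; the three toy vectors are made up to mirror the dog/puppy/refrigerator example.

```python
import numpy as np

# Cosine similarity: vectors pointing the same way score near 1.0.
# The toy vectors are invented to mimic the dog/puppy/refrigerator example.
def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

dog = np.array([0.90, 0.20, 0.10])
puppy = np.array([0.85, 0.25, 0.12])
fridge = np.array([0.05, 0.10, 0.95])

print(cosine(dog, puppy))   # high: semantically close
print(cosine(dog, fridge))  # low: semantically distant
```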

F

Fine-tuning

Model terms

Fine-tuning is the process of adapting a pre-trained foundation model by further training it on a task-specific dataset to improve performance on targeted applications. Fine-tuning leverages existing model knowledge to achieve superior results with less data and compute than training from scratch. For example, fine-tuning a current GPT-5 family model on company support tickets can improve customer-service response accuracy. See also: LoRA, Foundation Model, Prompt Engineering
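
A minimal sketch of preparing chat-formatted training examples as JSONL, the shape used by OpenAI-style fine-tuning endpoints; the ticket content is invented.

```python
import json

# Chat-formatted fine-tuning examples written as JSONL, one example per line,
# the shape used by OpenAI-style fine-tuning endpoints. Ticket text is made up.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a support agent for AcmeCo."},
        {"role": "user", "content": "My invoice shows a duplicate charge."},
        {"role": "assistant", "content": "Sorry about that. I've flagged the duplicate for refund; it will be reversed within 3-5 business days."},
    ]},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```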

Foundation Model

Model terms

A foundation model is a large AI model trained on broad data using self-supervision at scale that adapts to a wide range of downstream tasks. These models form the base for specialized applications, enabling faster and cost-effective development. Examples include GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro. See also: LLM, Fine-tuning

G

GEO

Business terms

Generative Engine Optimization (GEO) is the practice of structuring content so AI systems like ChatGPT, Claude Opus 4.7, and Gemini 3.1 Pro cite it in generated responses. This shifts visibility from search rankings to direct inclusion in AI-generated answers, making brand representation dependent on LLM synthesis rather than click-through traffic. Content optimization for GEO emphasizes clear structure, authoritative citations, comprehensive topic coverage, and natural language that LLMs can easily extract and reference, distinguishing it fundamentally from traditional SEO's focus on keyword ranking and backlinks. See also: SEO, Answer Engine Optimization, Large Language Model, AI Overviews

H

Hallucination

Trust and media

Hallucination is a response generated by an AI model that contains false or misleading information presented confidently as fact. This undermines reliability in critical applications like healthcare, law, and education, where accuracy determines outcomes. For example, a model might claim a current GPT release won two Nobel Prizes, though it won none. See also: RAG, LLM

I

Inference

Build terms

Inference is the execution phase where a trained AI model analyzes new data to produce predictions, decisions, or generated outputs without learning anything new. This is where AI delivers real-world value, transforming learned patterns into actionable results at scale. When you send a prompt to Claude Opus 4.7 and receive a response, or when a GPT-5.5-class model generates text, that computational process is inference. Inference differs fundamentally from training: it requires only a forward pass through the model rather than parameter updates, making individual predictions far less computationally demanding than model development. Inference costs represent what users pay for API usage and depend on model size, input/output token length, and underlying hardware. Optimization techniques, including model quantization, prompt caching, and deploying smaller specialized models, have become critical for reducing inference expenses in production environments. See also: Training, Tokens, Latency, API, Quantization
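
A minimal sketch of the per-request cost arithmetic; the per-million-token rates are assumptions, since real prices vary by model and vendor.

```python
# Per-request inference cost: tokens in and out, priced per million tokens.
# The rates below are assumptions; real prices vary by model and vendor.
PRICE_IN_PER_M = 3.00    # USD per 1M input tokens (assumed)
PRICE_OUT_PER_M = 15.00  # USD per 1M output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * PRICE_IN_PER_M
            + output_tokens * PRICE_OUT_PER_M) / 1_000_000

print(f"${request_cost(12_000, 800):.4f}")  # a long prompt, short answer
```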

L

Latency

Build terms

Latency is the time delay between when an AI system receives an input and generates the corresponding output. This metric directly impacts user experience, with low latency enabling real-time interactions in conversational interfaces and autonomous systems. In Claude Opus 4.7 and GPT-5.5-class models, latency stems from data preprocessing, mathematical computations, data transfer between processing units, and postprocessing, with larger models typically exhibiting higher latency due to increased computational overhead. Reducing latency typically involves model compression, optimized inference code, hardware acceleration, and lower-precision numerical formats. Streaming decreases perceived latency by delivering tokens incrementally rather than waiting for complete generation. See also: Inference, TTS, API, Model Compression
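
A minimal sketch of measuring time-to-first-token on a streaming response with the OpenAI Python SDK; the model id is a placeholder from this glossary's examples.

```python
import time
from openai import OpenAI

# Measure time-to-first-token (perceived latency) on a streaming response,
# using the OpenAI Python SDK's streaming interface.
client = OpenAI()  # reads OPENAI_API_KEY from the environment
start = time.perf_counter()
stream = client.chat.completions.create(
    model="gpt-5.5",  # placeholder id from this glossary's examples
    messages=[{"role": "user", "content": "Explain latency in one paragraph."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(f"first token after {time.perf_counter() - start:.2f}s")
        break
```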

LLM

Model terms

A large language model (LLM) is a deep learning neural network trained on vast text datasets to understand, generate, and process human-like natural language. LLMs underpin modern AI tools by enabling text generation, summarization, translation, and reasoning at scale. Examples include GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro. See also: Foundation Model, Tokens, AI Writing Category

LoRA

Model terms

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that freezes pre-trained model weights and injects trainable low-rank decomposition matrices into Transformer layers. It reduces compute and memory needs, enabling smaller teams to customize large models without full retraining. LoRA customizes open-source models like Llama 4 and DeepSeek V3.2 for specific tasks. See also: Fine-tuning, Open Source vs Closed Source
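
A minimal sketch using Hugging Face's peft library; the base model id and target module names are illustrative and vary by architecture.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Attach low-rank adapters to a frozen base model with Hugging Face peft.
# The model id and target_modules are illustrative; names vary by architecture.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-4")  # placeholder id
config = LoraConfig(
    r=8,                                   # rank of the injected matrices
    lora_alpha=16,                         # scaling factor for adapter outputs
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # a small fraction of the base weights
```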

M

MoE (Mixture of Experts)

Model terms

MoE (Mixture of Experts) is a machine learning architecture that divides a neural network into specialized sub-networks called experts, with a gating network activating only relevant experts per input for efficiency. This selective activation scales models to billions of parameters while reducing compute costs during training and inference. Mixtral 2 and Grok 4.20 deploy MoE layers to match dense model performance at lower inference expense. See also: LLM, Inference
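
A toy numpy sketch of top-2 gating: a router scores all experts, and only the two best run per input. Shapes and weights are made up, not a production MoE layer.

```python
import numpy as np

# Toy top-2 gating: score all experts, run only the best two per input,
# and mix their outputs by the normalized gate weights.
rng = np.random.default_rng(0)
n_experts, d = 8, 16
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # toy expert weights
router = rng.normal(size=(d, n_experts))                        # gating network

def moe_forward(x: np.ndarray, k: int = 2) -> np.ndarray:
    scores = x @ router                        # one score per expert
    top = np.argsort(scores)[-k:]              # indices of the k best experts
    gates = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over top-k
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

print(moe_forward(rng.normal(size=d)).shape)  # (16,), same as a dense layer
```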

Multi-agent

Agent systems

A multi-agent system is a computational architecture of multiple autonomous AI agents that interact in a shared environment to achieve complex goals difficult for a single agent. Multi-agent systems divide tasks among specialized agents for superior efficiency, scalability, and resilience in production workflows. CrewAI 2 can orchestrate a research agent using an OpenAI model, a writing agent with Claude Opus 4.7, and a review agent for report generation. See also: Agentic AI, Workflow Automation
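
A minimal sketch of a sequential pipeline in this style; call_llm is a hypothetical stand-in for any chat-model API, and the roles mirror the research/writing/review example above.

```python
# Sequential multi-agent pipeline: each role is one LLM call with its own
# instructions, and each agent's output becomes the next agent's input.
# call_llm is a hypothetical stand-in for any chat-model API.

def call_llm(system: str, user: str) -> str:
    raise NotImplementedError

def run_pipeline(topic: str) -> str:
    notes = call_llm("You are a research agent. Gather key facts.", topic)
    draft = call_llm("You are a writing agent. Draft a report.", notes)
    return call_llm("You are a review agent. Fix errors and tighten prose.", draft)
```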

N

No-code/Low-code

Agent systems

No-code and low-code platforms enable building applications using visual drag-and-drop interfaces and pre-built components with minimal or no hand-coding required. They accelerate development for developers and non-technical users, enabling rapid creation of custom software without deep programming expertise. Bubble supports no-code web apps, while Retool provides low-code dashboards integrated with OpenAI APIs. See also: Workflow Automation, n8n, Bubble

O

Open Source vs Closed Source

Model terms

Open Source vs Closed Source in AI distinguishes models with publicly available weights, architecture, code, and data from proprietary models where these elements remain confidential and accessible only via API or fee. Open source enables self-hosting, fine-tuning, inspection, and privacy; closed source typically offers stronger out-of-the-box performance, managed updates, security, and easier integration. Examples include open source Llama 4, DeepSeek V3.2, and Mixtral 2 versus closed source GPT-5.5 and Claude Opus 4.7. See also: Mistral, LoRA, Fine-tuning

P

Prompt Engineering

Model terms

Prompt engineering is the process of designing and refining natural language prompts to guide generative AI models, particularly large language models, toward producing accurate and desired outputs. Prompt engineering optimizes AI performance without model retraining, enabling precise control over responses through techniques like few-shot prompting and chain-of-thought reasoning. For example, Claude Opus 4.7 generates step-by-step solutions when prompted with "Think step by step" for complex math problems. See also: LLM, Fine-tuning, Tokens
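
A minimal sketch of few-shot prompt construction with a chain-of-thought instruction; the reviews and labels are invented.

```python
# Few-shot prompt construction: two worked examples plus a chain-of-thought
# instruction steer the model toward the desired format. Contents are made up.
FEW_SHOT = """Classify the sentiment. Think step by step, then answer.

Review: "Setup took five minutes and it just works." -> positive
Review: "Crashes every time I export." -> negative
Review: "{review}" ->"""

prompt = FEW_SHOT.format(review="The UI is dated but the results are excellent.")
print(prompt)  # send this as the user message to any chat model
```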

R

RAG

Build terms

Retrieval-Augmented Generation (RAG) is a technique that enables large language models to retrieve relevant information from external knowledge bases before generating responses. RAG grounds outputs in current, domain-specific data to produce accurate responses without retraining the model. For example, Claude Opus 4.7 uses RAG to query a company vector database for employee HR policies during a leave inquiry. See also: Embedding, Vector Database, Hallucination
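
A minimal sketch of the retrieve-then-generate flow; embed, vector_search, and call_llm are hypothetical stand-ins for an embedding model, a vector database query, and a chat model.

```python
# Retrieve-then-generate: embed the question, fetch the closest passages,
# and ground the model's answer in them. embed, vector_search, and call_llm
# are hypothetical stand-ins for real embedding, vector DB, and chat APIs.

def embed(text: str) -> list[float]: ...
def vector_search(vector: list[float], top_k: int) -> list[str]: ...
def call_llm(prompt: str) -> str: ...

def answer(question: str) -> str:
    passages = vector_search(embed(question), top_k=3)
    context = "\n\n".join(passages)
    return call_llm(
        f"Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```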

Reasoning Models

Model terms

Reasoning models are large language models trained to perform multi-step logical reasoning, breaking complex problems into chain-of-thought steps for superior accuracy on math, coding, and planning tasks. They enable reliable solutions to challenges beyond standard LLMs' pattern-matching capabilities. Examples include Claude Opus 4.7 Reasoning and Gemini 3.1 Pro Think. See also: LLM, Prompt Engineering

S

SaaS

Business terms

SaaS (Software as a Service) is a cloud computing model where providers host and deliver applications over the internet on a subscription basis, managing all infrastructure and updates. This model enables AI tool users to access compute-intensive services without local installation or maintenance costs. Examples include ChatGPT Plus with GPT-5.5 access and Claude Pro with Claude Opus 4.7 via browser apps. See also: ARR, API, MaaS

SDK

Build terms

A Software Development Kit (SDK) is a collection of tools, libraries, and documentation that simplifies building applications with an API by wrapping calls in language-specific functions. SDKs accelerate development and reduce errors for developers integrating AI services. Examples include the Anthropic Python SDK for Claude Opus 4.7 and the OpenAI Node SDK for GPT-5.5-class models; the Claude Agent SDK adds frameworks for autonomous AI agents. See also: API, Agentic AI
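
A minimal sketch with the Anthropic Python SDK; the model id follows this glossary's examples and may not match what the SDK actually accepts.

```python
import anthropic

# The SDK wraps the raw HTTP API in typed, language-native calls.
# The model id follows this glossary's examples; real ids may differ.
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
message = client.messages.create(
    model="claude-opus-4-7",  # placeholder id from this glossary's examples
    max_tokens=512,
    messages=[{"role": "user", "content": "What does an SDK add over a raw API?"}],
)
print(message.content[0].text)
```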

SEO

Business terms

Search Engine Optimization (SEO) is the practice of improving websites and web pages to increase visibility and organic traffic in unpaid search engine results pages (SERPs). SEO drives targeted users searching for information, products, or services, boosting engagement, brand awareness, and conversions without paid ads. Surfer SEO automates keyword research and on-page analysis for higher rankings. See also: GEO, Surfer SEO

T

Test-Time Compute

Model terms

Test-Time Compute allocates additional computational resources during model inference to enhance output quality through techniques like multiple sampling, search, or iterative refinement. This scales performance on complex tasks by trading inference time and hardware for superior accuracy and reasoning. Examples include GPT-5.5-class models allocating extra reasoning tokens and Claude Opus 4.7 using search-like deliberation patterns. See also: Inference, Reasoning Models
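
A minimal sketch of one such technique, self-consistency sampling: spend extra inference on several answers and keep the majority; call_llm is a hypothetical stand-in for a sampled model call.

```python
from collections import Counter

# Self-consistency, one test-time-compute technique: spend extra inference
# on N samples and return the majority answer. call_llm is a hypothetical
# stand-in for a chat-model call with nonzero sampling temperature.

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def majority_answer(prompt: str, n_samples: int = 9) -> str:
    answers = [call_llm(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```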

Tokens

Build terms

Tokens are the discrete units of text that large language models break down and process, representing words, subwords, punctuation, or character combinations. Token count directly determines both computational cost and the maximum input length a model can accept within its context window. For common English text, roughly 750 words equal 1,000 tokens, making token estimation essential for API budgeting and prompt design. See also: Context Window, Tokenization, LLM, API
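
A minimal sketch of token counting with the tiktoken library; cl100k_base is one widely used encoding, and exact counts differ across models.

```python
import tiktoken

# Count tokens with tiktoken. cl100k_base is one widely used encoding;
# exact counts differ across models and tokenizers.
enc = tiktoken.get_encoding("cl100k_base")
text = "Tokens are the discrete units of text that language models process."
tokens = enc.encode(text)
print(len(tokens))         # token count for budgeting prompts
print(enc.decode(tokens))  # round-trips back to the original text
```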

TTS (Text-to-Speech)

Trust and media

TTS (Text-to-Speech) converts written text into spoken audio using speech synthesis technology. Modern AI TTS enables scalable voice content creation for accessibility, audiobooks, virtual assistants, and customer service. ElevenLabs v3 and OpenAI TTS-2 produce human-like speech with emotion and natural pacing. See also: Voice Cloning, ElevenLabs, Voxtral

V

Vector Database

Build terms

A vector database stores, indexes, and queries high-dimensional vector embeddings representing unstructured data like text, images, or audio for efficient similarity search. Vector databases enable low-latency semantic retrieval essential for RAG systems and generative AI applications. Pinecone stores embeddings from OpenAI, Anthropic-compatible, or open-weight embedding models for querying relevant passages in enterprise RAG pipelines. See also: Embedding, RAG
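
A toy numpy sketch of the core query a vector database answers: nearest neighbors by cosine similarity over stored embeddings. Real systems add approximate indexes such as HNSW to stay fast at millions of vectors; the data here is made up.

```python
import numpy as np

# In-memory nearest-neighbor search over normalized embeddings: the core
# query a vector database answers. Documents and vectors are made up.
docs = ["leave policy", "expense rules", "onboarding checklist"]
vectors = np.random.default_rng(1).normal(size=(3, 8))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # unit vectors

def query(q: np.ndarray, top_k: int = 2) -> list[str]:
    scores = vectors @ (q / np.linalg.norm(q))  # cosine similarity
    return [docs[i] for i in np.argsort(scores)[::-1][:top_k]]

print(query(np.random.default_rng(2).normal(size=8)))
```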

Vibe Coding

Agent systems

Vibe coding is a software development practice where developers describe tasks in natural language prompts to AI large language models, which generate, refine, and debug code. It accelerates prototyping and experimentation by shifting focus from manual coding to guiding AI outputs. Andrej Karpathy coined the term in February 2025, exemplified by using Claude Opus 4.7 or Cursor 2 to build MVPs from conversational descriptions. See also: Agentic Engineering, Software 2.0, AI Coding Category

Voice Cloning

Trust and media

Voice cloning replicates a specific person's voice using AI trained on audio samples to synthesize realistic speech matching their tone, accent, and inflections. This technology enables scalable content creation and accessibility tools while posing risks of fraud and deepfakes without consent safeguards. ElevenLabs PVC v2 clones voices from 30 minutes of audio, while instant cloning methods work from seconds-long clips. See also: TTS, ElevenLabs

W

Workflow Automation

Agent systems

Workflow automation uses software to execute multi-step business processes automatically based on triggers, rules, actions, and logic, minimizing human intervention. It enables faster operations, reduces errors, and frees teams for high-value work. For example, Zapier can call an OpenAI model to generate social posts, then schedule them via Make. See also: No-code/Low-code, Agentic AI, n8n
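
A minimal sketch of the trigger-rule-action shape these platforms encode visually; the three functions are hypothetical stand-ins for connector steps.

```python
# Trigger -> rule -> action: the shape no-code automation platforms encode
# visually. new_blog_posts, generate_social_post, and schedule_post are
# hypothetical stand-ins for the connector steps a tool like Zapier provides.

def new_blog_posts() -> list[str]: ...
def generate_social_post(title: str) -> str: ...
def schedule_post(text: str) -> None: ...

def run_workflow() -> None:
    for title in new_blog_posts():          # trigger: a new post appears
        if "draft" in title.lower():        # rule: skip unpublished drafts
            continue
        schedule_post(generate_social_post(title))  # actions
```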