Developer reference · 2026

The AI Developer Stack // Complete Landscape

Every tool, layer by layer — with explanations, examples, real links, and decision guides

9 layers · 50+ tools
1. Foundation models: OpenAI · Anthropic · Google · Meta · Mistral · DeepSeek
2. SDKs: openai · anthropic · google-generativeai · mistralai
3. Model abstraction: LiteLLM · OpenRouter · AWS Bedrock · Azure OpenAI · Vertex AI
4. Orchestration: LangChain · LlamaIndex · Haystack · DSPy
5. Agent frameworks: LangGraph · AutoGen · CrewAI · OpenAI Agents · Claude Code · Smolagents
6. Memory & storage: Pinecone · Weaviate · Qdrant · pgvector · Mem0 · Zep
7. Tooling & MCP: MCP · Function calling · Toolhouse
8. Observability & evals: Langfuse · LangSmith · Braintrust · Helicone · Arize · Traceloop
9. App platforms: Vercel AI SDK · Streamlit · Gradio · Flowise · Dify
1
Foundation models — the brain
The actual AI. Everything else is infrastructure around these.
Foundation models are large neural networks trained on massive datasets. They understand and generate text (and more). You don't build these — you call them via API. Choosing a model is like choosing a database engine: the right one depends on your cost, latency, capability, and privacy requirements. You interact with them through layers 2–3.
Models & providers
GPT-4o / o3 · openai.com
OpenAI's flagship models. GPT-4o is fast and multimodal (text, image, and audio). o3 is a reasoning model that excels at multi-step problems and complex math. Frequently the reference point other models are benchmarked against.
Hosted
Claude 3.5 / Claude 4 · anthropic.com
Anthropic's models. Exceptional at long documents, nuanced reasoning, code, and following complex instructions. Large context window (200K tokens). Strong safety properties.
Hosted
Gemini
Google DeepMind's model. Best-in-class multimodal (video, audio, image). 1M+ token context. Tightly integrated with the Google ecosystem (Search, Drive, Workspace).
Hosted
Llama 4
Meta's open-weight models. You download and run them yourself: no API fees and full control over your data. Run them on your own GPU, or locally via Ollama. Well suited to fine-tuning on your own data.
Open source · Self-hostable
Mistral / Mixtral · mistral.ai
European AI lab with efficient, fast, open-weight models. Mixtral uses a "mixture of experts" architecture: each token is routed to only a few expert sub-networks, so it is very capable at a fraction of the compute cost of a dense model of the same size. A GDPR-friendly European provider.
Open source · API available
DeepSeek R2 · deepseek.com
Chinese lab with exceptional open-weight models. R2 rivals GPT-4 on coding and math benchmarks at a fraction of the cost. Its hosted API raises data-privacy concerns because data is processed and stored in China; self-hosting the open weights avoids this.
Open source
Qwen
Alibaba's open-weight model family. Strong multilingual performance, especially in Chinese. Available in many sizes, from 0.5B to 72B parameters.
Open source
Ollama
Not a model, but a tool for running open models locally. One command: ollama run llama4. Local means free, private, and offline. The easiest way to run Llama, Mistral, or DeepSeek on your own machine.
Open source · Free
How to choose a model
Cost sensitive? Use Llama 4 or Mistral locally via Ollama (free) · or GPT-4o-mini / Claude Haiku for cheap API calls
Best quality? Claude Opus 4, GPT-4o, or o3 for complex tasks
Data privacy required? Self-host Llama 4 or Mistral — your data never leaves your servers
Long documents? Claude (200K+ context) or Gemini (1M+ context)
Multimodal (images/video/audio)? Gemini 2.0 or GPT-4o
Fine-tuning on your data? Open-weight models: Llama 4, Mistral, Qwen
When you'd use this layer directly
Chatbot or assistant · Text classification · Document summarization · Code generation · Translation · Data extraction from text
2
SDKs — direct API access
Official thin wrappers. You talk directly to one provider.
SDKs are official client libraries that handle authentication, request formatting, retries, streaming, and error handling for a specific provider. They're thin — they don't add logic, just make the HTTP calls clean. Most developers start here before graduating to orchestration frameworks when their needs grow.
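To show what "handles retries" involves, here is a minimal sketch of exponential backoff with jitter around an arbitrary API call. It is not the code of any particular SDK: `RateLimitError` and `with_retries` are hypothetical names, and production clients typically also honor server hints such as a `Retry-After` header.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's retryable error (e.g. HTTP 429)."""


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RateLimitError with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each time (base, 2x, 4x, ...) plus up to 10% random jitter
            delay = base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the function testable. The jitter spreads retries out so that many clients hitting the same limit don't all retry in lockstep.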
Official SDKs
openai (Python)
The most widely used AI SDK. Its request format has become a de facto standard: many other providers and tools expose OpenAI-compatible endpoints. pip install openai
Official
openai (JS/TS)
Node.js and browser-compatible. Full TypeScript types. npm install openai. Supports streaming with async iterators.
Official
anthropic (Python)
Clean SDK for Claude models. Supports streaming, vision, tool use, and document processing. pip install anthropic
Official
anthropic (JS/TS)
Node.js SDK for Claude. npm install @anthropic-ai/sdk. Works in edge runtimes (Cloudflare Workers, Vercel Edge).
Official
google-generativeai
Python SDK for Gemini. Handles multimodal inputs natively. pip install google-generativeai. Google now recommends its newer unified SDK, google-genai (pip install google-genai), for new projects.
Official
mistralai
SDK for Mistral and Mixtral models. Its chat interface closely mirrors OpenAI's, so migration is easy. pip install mistralai
Official
Code example — basic chat completion
# Anthropic (reads the API key from the ANTHROPIC_API_KEY environment variable)
import anthropic

client = anthropic.Anthropic()
msg = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,  # required by the Messages API
    messages=[{"role": "user", "content": "Explain RAG in one paragraph"}],
)
print(msg.content[0].text)

# OpenAI: nearly identical interface (reads OPENAI_API_KEY)
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain RAG in one paragraph"}],
)
print(resp.choices[0].message.content)
Difference from layer 3
Layer 2 SDKs are provider-specific — if you switch from OpenAI to Anthropic you rewrite your code. Layer 3 (model abstraction) gives you a unified interface so you can swap providers with one config change. Start with a layer 2 SDK, upgrade to layer 3 when you need flexibility.
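The provider-specific difference shows up even when reading responses: the two APIs return differently shaped JSON. A tiny adapter, the kind of normalization layer 3 tools do for you across many providers, might look like this. `extract_text` is a hypothetical helper operating on raw response bodies as plain dicts, not part of either SDK.

```python
from typing import Any


def extract_text(provider: str, response: dict[str, Any]) -> str:
    """Return the assistant's text from a raw JSON response body."""
    if provider == "anthropic":
        # Messages API: a list of content blocks; join the text blocks
        return "".join(
            block["text"] for block in response["content"] if block.get("type") == "text"
        )
    if provider == "openai":
        # Chat Completions API: the first choice's message content
        return response["choices"][0]["message"]["content"]
    raise ValueError(f"unknown provider: {provider}")
```

Every new provider means another branch here, which is exactly the maintenance burden a layer 3 abstraction takes off your hands.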
Use this layer when
Simple scripts and prototypes · You're committed to one provider · Learning the API · Minimal dependencies needed
3
Model abstraction — any model, one SDK
Provider-agnostic routing, cost optimization, and failover.
These tools sit between your code and the model providers. They normalize different API formats into one consistent interface, letting you switch models without code changes. They also add cost tracking, fallback logic, caching, and rate limit handling. Critical for production systems where you need reliability and cost control.
Tools
LiteLLM
The Swiss army knife of model abstraction. Supports 100+ models with an OpenAI-compatible interface. Add its proxy server to get centralized logging, budgets, and rate limits across your whole team. pip install litellm
Open source · Proxy available
OpenRouter
API gateway as a service. Routes to 200+ models from one API key. For a given model it can route across several upstream providers, by default favoring lower prices, and fall back when one is unavailable. Useful when you don't want to manage multiple API keys. Pay-as-you-go.
Hosted
AWS Bedrock
AWS-managed foundation model service. Access Claude, Llama, Mistral, and Titan through AWS IAM. Best for teams already in AWS: traffic can stay on private networking via VPC endpoints (PrivateLink), with enterprise compliance and easy integration with S3 and Lambda.
AWS managed
Azure OpenAI
OpenAI models hosted on Microsoft Azure. Enterprise SLA, HIPAA/SOC2 compliance, private networking, no data used for training. Ideal for regulated industries (healthcare, finance, government).
Azure managed
Vertex AI
Google Cloud's ML platform. Run Gemini and open-weight models. Tight integration with BigQuery and Google Workspace. Good for teams already on GCP.
GCP managed
Together AI · together.ai
Fast inference for open-source models (Llama, Mistral, FLUX). Excellent for fine-tuned model deployment. Often cheaper than calling closed models.
Hosted
LiteLLM code example — switch providers with one line
import litellm

# Call any model with the same code
messages = [{"role": "user", "content": "Hello!"}]
response = litellm.completion(
    model="gpt-4o",  # swap to "claude-sonnet-4-20250514" or "ollama/llama4"
    messages=messages,
)
print(response.choices[0].message.content)  # OpenAI-style response for every provider

# Fallback: try GPT-4o, then Claude, then Mistral if earlier calls fail
response = litellm.completion(
    model="gpt-4o",
    messages=messages,
    fallbacks=["claude-sonnet-4-20250514", "mistral/mistral-large-latest"],
)
When to use vs layer 2
Use layer 2 SDK if: you're prototyping, one provider is enough, minimal deps matter.
Use layer 3 abstraction if: you want to A/B test models, need failover, manage costs across providers, or your team uses multiple models.
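Conceptually, a fallback list like LiteLLM's `fallbacks` amounts to trying providers in order until one succeeds. A minimal provider-agnostic sketch, assuming each provider is wrapped as a plain function from prompt to text (the stub providers in any usage are hypothetical stand-ins for real API calls); this is not LiteLLM's implementation.

```python
from typing import Callable, Sequence

# A "provider" here is any function that takes a prompt and returns text
Provider = Callable[[str], str]


def complete_with_fallbacks(
    prompt: str, providers: Sequence[tuple[str, Provider]]
) -> tuple[str, str]:
    """Try each (name, provider) in order; return (name, text) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # real code should catch only provider/network errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the text lets you log which model actually answered, which matters for cost tracking and debugging.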
4
Orchestration — chains & pipelines
Connect LLMs with data sources, tools, and multi-step logic.
Orchestration frameworks help you build complex LLM workflows where each step feeds into the next. The canonical use case is RAG (Retrieval-Augmented Generation): embed user documents → store in vector DB → at query time, retrieve relevant chunks → inject into prompt → get grounded answer. These frameworks handle the plumbing so you write business logic, not boilerplate.
Tools
LangChain
The most popular framework, with a massive ecosystem of hundreds of integrations (databases, APIs, document loaders). Steep learning curve but a huge community. Best for RAG, chains, and document Q&A. Python and JS.
Open source
LlamaIndex
Data-first framework. Better than LangChain for indexing complex document structures (PDFs, databases, APIs). Its data connectors and query engines are more mature. Use when your app is primarily about querying your own data.
Open source
Haystack
Production-grade pipelines combining traditional search (BM25, Elasticsearch) with LLMs. Best when you need hybrid retrieval (keyword plus semantic search). More battle-tested in enterprise search scenarios.
Open source
DSPy
A different paradigm: instead of writing prompts, you declare what you want (inputs and outputs) and DSPy optimizes the prompts automatically. It treats prompts like model weights to be tuned. Best for teams doing systematic prompt engineering.
Open source
Structured outputs from LLMs using Pydantic models. Define a schema, get back validated Python objects. Incredibly useful for data extraction and classification tasks.
Open source
Semantic Kernel
Microsoft's orchestration SDK. C#, Python, and Java. Best for enterprise .NET shops integrating AI into existing Microsoft stack (Azure, Office 365).
Open source
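The structured-output approach from this list can be sketched with only the standard library: ask the model for JSON, then parse and validate it before trusting it. This uses a dataclass plus hand-written checks as a stand-in for a Pydantic model; `Ticket`, `parse_ticket`, and the allowed categories are hypothetical.

```python
import json
from dataclasses import dataclass

ALLOWED_CATEGORIES = {"billing", "bug", "feature"}


@dataclass
class Ticket:
    category: str  # one of ALLOWED_CATEGORIES
    priority: int  # 1 (low) to 5 (urgent)


def parse_ticket(raw: str) -> Ticket:
    """Parse the model's JSON output and validate it before trusting it."""
    data = json.loads(raw)  # json.JSONDecodeError (a ValueError) on malformed output
    if not isinstance(data, dict) or set(data) != {"category", "priority"}:
        raise ValueError(f"expected exactly 'category' and 'priority', got {data!r}")
    if data["category"] not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown category: {data['category']!r}")
    priority = data["priority"]
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(priority, bool) or not isinstance(priority, int) or not 1 <= priority <= 5:
        raise ValueError(f"priority must be an integer from 1 to 5, got {priority!r}")
    return Ticket(category=data["category"], priority=priority)
```

On a validation error, a common pattern is to send the error message back to the model and ask it to try again.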
RAG architecture (the main use case)
1. Index time: Load documents → chunk text → embed with model → store vectors in DB (layer 6)
2. Query time: Embed user question → search DB for similar chunks → inject chunks into prompt → LLM answers grounded in your data

Result: The LLM "knows" your private documents without fine-tuning. Cost: roughly $0.001 per query with a small model; more with larger models or more retrieved context.
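The two phases can be sketched end to end in plain Python. To keep it self-contained, retrieval here ranks chunks by shared words, a deliberately crude stand-in for the embedding search a real system would use (layer 6). `chunk`, `retrieve`, and `build_prompt` are hypothetical helpers, and the chunk size and overlap are arbitrary.

```python
import re


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by words shared with the query (a crude stand-in for vector search)."""
    q = _words(query)
    return sorted(chunks, key=lambda c: len(q & _words(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Inject the retrieved chunks into the prompt so the answer is grounded in them."""
    sources = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return (
        "Answer using only the sources below. If they don't contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
```

A real pipeline would embed both the chunks and the question and rank by vector similarity instead of word overlap; the chunking and prompt-assembly steps stay the same.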
LangChain vs LlamaIndex
LangChain = better for complex chains, agent logic, wide integrations, and general LLM apps.
LlamaIndex = better for pure data indexing & retrieval tasks — it has richer document parsing, query routing, and evaluation tools.
Many production apps use both together.
Use cases
Customer support bot on your docs · Internal knowledge base Q&A · Contract / legal document analysis · Multi-step data extraction pipeline · Code search across a codebase
5
Agent frameworks — plan, act, loop
AI that autonomously reasons, uses tools, and keeps going until the task is done.
Agents go beyond pipelines: instead of a fixed sequence of steps, the LLM decides what to do next. It can call tools (search web, run code, read files, call APIs), observe results, and plan the next action — looping until the task is complete. This is where AI stops being a Q&A system and starts being an autonomous worker.
Frameworks
LangGraph
Models agent logic as a graph with nodes (actions) and edges (transitions). Supports cycles (loops), human-in-the-loop checkpoints, and persistent state. By the LangChain team. Best for complex, stateful agents with branching logic.
Open source
AutoGen
Microsoft's multi-agent framework. Agents talk to each other to solve tasks; for example, a "Planner" agent delegates to "Coder" and "Reviewer" agents. Excellent for tasks that benefit from adversarial debate between agents.
Open source
CrewAI
Role-based agent teams. Define agents with roles (Researcher, Writer, Editor) and tasks. Agents collaborate sequentially or in parallel. Best for content pipelines and research workflows.
Open source
OpenAI Agents SDK
OpenAI's official framework. Key concept: "handoffs" — one agent hands a conversation to a more specialized agent. Built-in tracing, guardrails, and tool use. Works best with OpenAI models but open to others.
Official · Open source
Claude Code
Anthropic's CLI coding agent. Runs in your terminal with access to your codebase. Can read and write files, run tests, execute commands, and open PRs. Built for large-scale autonomous coding tasks.
By Anthropic
smolagents
Hugging Face's minimal agent library. Agents write Python code to call tools (instead of JSON). Very lightweight: a good starting point for understanding agent mechanics without framework overhead.
Open source
Agent framework built on Pydantic for strong typing and validation. Dependency injection for tools. Growing fast — clean, Pythonic API. Good for production agents where type safety matters.
Open source
AgentKit (Coinbase)
Agents that can interact with blockchains and crypto wallets. Niche but important for Web3 use cases — autonomous DeFi agents, NFT management, cross-chain operations.
Open source
Orchestration (layer 4) vs Agents (layer 5)
Orchestration (layer 4) = predetermined steps. You define the flow. LLM fills in the blanks.
Example: always retrieve → always summarize → always format

Agents (layer 5) = the LLM decides the steps. It can loop, branch, retry, call different tools.
Example: "research this topic" → agent decides to search, then read papers, then write summary, then fact-check
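The difference becomes concrete in a bare-bones agent loop. In this sketch the `llm` argument is any function that reads the history and returns either a tool call or a final answer; in a real agent that decision comes from a model's tool-calling output. `run_agent` and the two toy tools are hypothetical, and the step limit guards against a loop that never terminates.

```python
from typing import Callable

# Toy tools the agent may call (hypothetical examples): name -> function(input) -> output
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda q: f"3 results for {q!r}",
    "calculator": lambda expr: str(sum(int(x) for x in expr.split("+"))),
}

# The "model": sees the history, returns {"tool": ..., "input": ...} or {"final": ...}
Policy = Callable[[list[str]], dict]


def run_agent(llm: Policy, task: str, max_steps: int = 5) -> str:
    """Plan-act-observe loop: the model picks each next step until it gives a final answer."""
    history = [f"TASK: {task}"]
    for _ in range(max_steps):
        action = llm(history)  # plan: the model decides what to do next
        if "final" in action:
            return action["final"]
        tool = TOOLS.get(action["tool"])
        # act, then observe: an unknown tool becomes an error the model can react to
        result = tool(action["input"]) if tool else f"error: no tool named {action['tool']!r}"
        history.append(f"{action['tool']}({action['input']}) -> {result}")
    return "stopped: step limit reached"
```

An orchestration chain would hard-code the sequence of tool calls; here the sequence emerges from whatever `llm` returns at each step.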
Use cases
Autonomous research assistant · Code review & refactoring bot · Customer support that takes actions · DevOps automation agent · Data analysis with Python code execution · Web browsing and data scraping
6
Memory & storage — beyond the context window
Give your AI a long-term memory and the ability to search knowledge semantically.
LLMs have a fixed context window — they forget everything between sessions. Vector databases solve this by storing text as mathematical embeddings (lists of numbers that capture meaning), enabling semantic search: "find content similar in meaning to this query." This is the storage backbone of RAG systems and the key to giving agents long-term memory.
Vector databases
Pinecone
The most popular managed vector DB. Easy to start, scales to billions of vectors, fast filtered search. Pay-as-you-go. No infrastructure to manage. Best choice for most production apps that don't need self-hosting.
Managed
Weaviate
Open-source, self-hostable. Multimodal (store images, audio, and text together). Built-in vectorization modules connect your OpenAI/Cohere embedding model directly. Strong GraphQL API.
Open source · Cloud available
Qdrant
Rust-based, extremely fast and memory-efficient. Rich filtering (combine vector search with exact field filters). WASM client runs in the browser. A growing favorite for performance-critical apps.
Open source · Cloud available
Chroma
The easiest vector DB to get started with. Runs in-memory or persisted locally. pip install chromadb and you're running. Best for prototyping and small-scale RAG apps.
Open source · Local / free
pgvector
Adds vector search to your existing Postgres database. No new infra — if you already have Postgres, you get semantic search. Not as fast as dedicated vector DBs at massive scale, but perfect for most apps.
Open source · Free
Open-source, cloud-native vector DB built for scale. Handles billions of vectors. More complex to operate than Qdrant/Weaviate but built for enterprise-scale use cases.
Open source
Agent memory layers
Mem0
Intelligent memory layer for AI apps. Automatically extracts facts from conversations ("user prefers Python", "team uses AWS") and surfaces them in future sessions. Think of it as the LLM's personal notepad.
Open source
Zep
Conversation memory with automatic summarization and entity extraction. Stores long conversation history efficiently, extracts facts about users, and makes past context searchable. Built for production agent apps.
Open source · Cloud available
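The conversation-summarization idea behind these memory layers can be sketched as a bounded buffer: recent turns stay verbatim and older ones are folded into a running summary. `SummaryMemory` is a hypothetical class, not Zep's or Mem0's API; in practice `summarize` would be an LLM call, while a plain string join is enough to exercise the logic.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SummaryMemory:
    """Keep the last `max_turns` turns verbatim; fold older turns into a running summary."""

    summarize: Callable[[str, list[str]], str]  # (old_summary, evicted_turns) -> new summary
    max_turns: int = 4
    summary: str = ""
    turns: list[str] = field(default_factory=list)

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        if len(self.turns) > self.max_turns:
            evicted = self.turns[: -self.max_turns]
            self.turns = self.turns[-self.max_turns :]
            self.summary = self.summarize(self.summary, evicted)

    def context(self) -> str:
        """Text to prepend to the next prompt: the summary first, then recent turns."""
        parts = [f"Summary so far: {self.summary}"] if self.summary else []
        return "\n".join(parts + self.turns)
```

The prompt stays bounded no matter how long the conversation runs, at the cost of losing detail from older turns.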
How vector search works
1. Embed: Convert text → vector: "The cat sat on the mat" → [0.23, -0.11, 0.87, ...] (e.g. 1,536 numbers with OpenAI's text-embedding-3-small)
2. Store: Save vectors + original text in DB
3. Search: Embed query → find closest vectors (cosine similarity) → return original texts

Why it's powerful: "automobile" and "car" are near each other in vector space, even though the words differ. Traditional keyword search would miss this.
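Step 3 is easy to see in miniature. This sketch scores every stored vector by cosine similarity and keeps the best k, the brute-force search that vector databases replace with approximate indexes at scale. The 3-dimensional vectors in the usage example are made up for illustration; real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Brute-force nearest neighbours: score every stored vector, return the k best texts."""
    return sorted(store, key=lambda text: cosine(query, store[text]), reverse=True)[:k]


store = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.2, 0.95],
}
print(top_k([0.88, 0.12, 0.02], store))  # "car" and "automobile" outrank "banana"
```

Because only direction matters, two texts with similar meaning score close to 1 even if their vectors differ in length.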
How to choose
Prototype → Chroma (local, instant) · Already have Postgres → pgvector · Production, no ops → Pinecone · Self-host, high perf → Qdrant · Agent memory → Mem0 or Zep
7
Tooling & MCP — external integrations
Standardized protocols for LLMs to call tools, APIs, and services.
For an agent to be useful, it needs to interact with the real world — search the web, read files, call APIs, query databases. "Tooling" is how LLMs do this. Function calling (built into most model APIs) lets you define tools as JSON schemas. MCP is Anthropic's open standard that takes this further — a universal protocol so any AI can use any tool, like USB-C for AI integrations.
Protocols & standards
MCP
Model Context Protocol: Anthropic's open standard, now widely adopted. You define a "server" that exposes tools (functions) and resources (data). Any MCP-compatible AI client (Claude, Cursor, Zed) can then use your server. A large and growing ecosystem of community-built MCP servers.
By Anthropic · Open standard
Function calling · OpenAI docs
Native tool use built into OpenAI and Anthropic APIs. Define tools as JSON schemas, the model decides when to call them and fills in parameters. Simpler than MCP but not standardized — each provider has slightly different syntax.
Native API feature
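To make "define tools as JSON schemas" concrete, the sketch below declares one tool in the OpenAI Chat Completions `tools` format and executes a tool call shaped like the ones the API returns (shown as a plain dict rather than the SDK's objects). The `get_weather` tool and its output are hypothetical. Anthropic's API uses a different shape (`input_schema` in the definition, `tool_use` content blocks in the response), which is the lack of standardization noted above.

```python
import json

# Tool definition in the OpenAI Chat Completions "tools" format
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def get_weather(city: str) -> str:
    return f"21°C in {city}"  # hypothetical stub; a real tool would call a weather API


REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call: dict) -> dict:
    """Execute one tool call from the model and build the 'tool' message to send back."""
    fn = REGISTRY[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])  # arguments arrive as a JSON string
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": fn(**args)}
```

You pass `[WEATHER_TOOL]` as `tools=` in the request, and when the response contains tool calls, append each `run_tool_call(...)` result to the conversation and call the model again.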
Popular MCP servers (community-built)
Filesystem MCP
Read/write files on your local filesystem. Essential for coding agents that need to edit source files.
Open source
GitHub MCP
Search repos, read issues, create PRs, manage branches — full GitHub API via MCP.
Open source
Brave Search MCP
Real-time web search via Brave Search API. Gives agents access to current information.
Open source
Postgres MCP
Query a Postgres database in natural language: the model writes SQL, and the reference server executes it inside a read-only transaction.
Open source
Toolhouse
Managed tool registry: deploy pre-built tools (web search, code execution, email, calendar) with one line of code. Pay per call. The fastest way to give agents capabilities without building tooling yourself.
Hosted
150+ pre-built integrations (Slack, Gmail, GitHub, Salesforce, Jira, HubSpot...) exposed as tools for agents. Handles OAuth and authentication automatically.
Hosted
MCP vs function calling
Function calling = defined per-request in your code, works with one model's API, not shareable.
MCP = define tools once in a server, any compatible client/AI can use them, reusable across teams and apps.

Think: function calling is a local script, MCP is a published API.
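Under the hood, MCP messages are JSON-RPC 2.0. The sketch below handles the two tool methods, `tools/list` and `tools/call`, to show the message shapes a client exchanges with a server. It is an in-process illustration only: a real server also performs the `initialize` handshake and runs over a transport such as stdio, and in practice you would build on an official MCP SDK rather than hand-rolling this. The `add` tool is hypothetical.

```python
import json

# One tool, described the way an MCP server advertises tools (note the "inputSchema" key)
MCP_TOOLS = {
    "add": {
        "description": "Add two integers.",
        "inputSchema": {
            "type": "object",
            "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
            "required": ["a", "b"],
        },
        "run": lambda args: str(args["a"] + args["b"]),
    }
}


def handle(raw: str) -> str:
    """Answer one JSON-RPC 2.0 request for the tools/list or tools/call method."""
    req = json.loads(raw)
    reply = {"jsonrpc": "2.0", "id": req.get("id")}
    if req["method"] == "tools/list":
        reply["result"] = {"tools": [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in MCP_TOOLS.items()
        ]}
    elif req["method"] == "tools/call" and req["params"]["name"] in MCP_TOOLS:
        text = MCP_TOOLS[req["params"]["name"]]["run"](req["params"].get("arguments", {}))
        # Tool results come back as a list of content items
        reply["result"] = {"content": [{"type": "text", "text": text}], "isError": False}
    elif req["method"] == "tools/call":
        reply["error"] = {"code": -32602, "message": f"Unknown tool: {req['params']['name']}"}
    else:
        reply["error"] = {"code": -32601, "message": "Method not found"}
    return json.dumps(reply)
```

Because any client speaks these same two methods, the tool is written once and discovered at runtime, which is the "published API" half of the comparison above.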
8
Observability & evals — monitor & debug
Trace every LLM call. Evaluate quality. Track costs. Catch regressions.
LLM apps are non-deterministic — the same prompt can produce different outputs. Without observability, debugging is guesswork. These tools record every call (prompts, outputs, latency, cost, model used), let you score outputs for quality, run automated evaluations, and alert you when something degrades in production. Critical before going live.
Tracing & evaluation platforms
Langfuse ⭐ · langfuse.com
Open-source, self-hostable alternative to LangSmith. Tracing, prompt management (version prompts like code), user feedback collection, cost tracking by model/user/session, dataset management, and automated evals. The best choice if you care about data privacy or want to avoid vendor lock-in.
Open source · Cloud available
LangSmith
Observability platform by LangChain. Tightly integrated: setting a couple of environment variables traces your whole LangChain app. Dataset management, regression testing, and human annotation tools. Best if your team is already on LangChain.
Hosted (SaaS)
Braintrust
Eval-first platform. Best-in-class for running systematic evaluations: define test datasets and scoring functions (LLM-as-judge, exact match, semantic similarity), then track scores over time. Good for teams doing rigorous model/prompt comparison.
Hosted
Helicone
Proxy-based: change one URL (your API base URL) and get full logging. Zero-code integration, response caching (saves cost), rate limiting, and prompt templates. Simplest setup of any observability tool.
Open source · Cloud available
Arize / Phoenix · phoenix.arize.com
Broader ML observability. Phoenix is the open-source version — runs locally for offline evaluation. Arize is the enterprise cloud platform. Good when you have a mix of traditional ML and LLM models to monitor.
Phoenix: open source · Arize: hosted
Traceloop / OpenLLMetry · traceloop.com
OpenTelemetry-based LLM tracing. If your team uses OTel for existing services, this extends it to LLM calls. Standards-based approach — works with any OTel-compatible backend (Datadog, Jaeger, Grafana).
Open source
Confident AI (DeepEval) · confident-ai.com
Open-source evaluation framework (DeepEval) with 14+ built-in metrics: hallucination detection, answer relevancy, faithfulness, toxicity, bias. Run evals in CI like unit tests.
Open source
Weights & Biases (Weave) · wandb.ai
W&B's LLM tracing product. If your team already uses W&B for ML experiment tracking, Weave extends it to LLM calls and evals. Unified view of training and inference.
Hosted
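The "run evals in CI like unit tests" workflow these platforms support reduces to three parts: a dataset of inputs with expected outputs, a scoring function, and an aggregate score you can gate on. A minimal sketch with an exact-match scorer; `exact_match` and `run_eval` are hypothetical helpers, not any platform's API, and an LLM-as-judge or semantic-similarity scorer would plug into the same `scorer` slot.

```python
from typing import Callable


def exact_match(output: str, expected: str) -> float:
    """1.0 if the case- and whitespace-normalized strings match, else 0.0."""
    return float(output.strip().lower() == expected.strip().lower())


def run_eval(
    model: Callable[[str], str],
    dataset: list[tuple[str, str]],
    scorer: Callable[[str, str], float] = exact_match,
) -> float:
    """Average score of `model` over (input, expected) pairs."""
    scores = [scorer(model(inp), expected) for inp, expected in dataset]
    return sum(scores) / len(scores)
```

In CI, a line like `assert run_eval(my_model, DATASET) >= 0.9` fails the build when a prompt or model change causes a regression.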
What to track in production
Latency — p50/p95/p99 per model and endpoint
Cost — token usage, spend per user/feature/day
Quality — human ratings, automated eval scores, thumbs up/down
Errors — rate limit hits, model refusals, timeouts
Prompts — version control which prompt produced which output
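Most of these metrics can be captured by wrapping each model call. A minimal tracing sketch, assuming a hypothetical `traced` decorator, an in-memory `TRACES` list instead of a real backend, and a single made-up price per 1K tokens (real providers usually price input and output tokens differently). Platforms like Langfuse or Helicone do this, and much more, for you.

```python
import functools
import math
import time
from typing import Callable

TRACES: list[dict] = []  # stand-in for sending spans to an observability backend


def traced(model: str, usd_per_1k_tokens: float):
    """Decorator that records latency, token usage, and estimated cost of each call."""
    def decorator(fn: Callable[[str], tuple[str, int]]) -> Callable[[str], str]:
        @functools.wraps(fn)
        def wrapper(prompt: str) -> str:
            start = time.perf_counter()
            text, tokens = fn(prompt)  # the wrapped call returns (output, tokens used)
            TRACES.append({
                "model": model,
                "prompt": prompt,  # keep prompt/output pairs for debugging
                "output": text,
                "latency_s": time.perf_counter() - start,
                "tokens": tokens,
                "cost_usd": tokens / 1000 * usd_per_1k_tokens,
            })
            return text
        return wrapper
    return decorator


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile, e.g. p=95 gives the p95 latency."""
    ranked = sorted(values)
    return ranked[max(math.ceil(p * len(ranked) / 100) - 1, 0)]
```

With traces in hand, `percentile([t["latency_s"] for t in TRACES], 95)` gives p95 latency, and summing `cost_usd` grouped by user or feature gives spend.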
Langfuse vs LangSmith
Langfuse = open-source (self-host = full data ownership), works with any framework/SDK, prompt management is a first-class feature, free tier generous.
LangSmith = better LangChain integration (automatic tracing), stronger dataset tooling, SaaS only.
Choose Langfuse if data privacy matters or you're not on LangChain.
9
App platforms — full-stack deployment
Ship complete AI-powered applications to users.
Once you've built the logic (layers 1–8), you need to expose it to users. App platforms handle UI, deployment, hosting, streaming, and user experience. They range from Python-only quick dashboards (Streamlit) to production-grade React streaming frameworks (Vercel AI SDK) to no-code visual builders (Flowise, Dify).
Frontend & deployment
Vercel AI SDK · sdk.vercel.ai
The best-in-class SDK for AI-powered Next.js apps. Streaming responses, generative UI, multi-step agents in React. Unified API for 20+ providers. If you're building a production web app, start here. npm install ai
Open source
Streamlit
Turn Python scripts into web apps in minutes. No HTML/CSS/JS needed. Best for internal tools, demos, and data dashboards. Streamlit Community Cloud hosts for free. Not suited for public-facing production apps at scale.
Open source · Free hosting
Gradio
Like Streamlit, but optimized for ML demos, with image, audio, and video inputs built in. Hugging Face Spaces hosts Gradio apps for free. One-click shareable links. The standard for ML model demos in the research community.
Open source · Free hosting
FastAPI + custom UI · fastapi.tiangolo.com
For teams that want full control. FastAPI serves the LLM backend, your React/Vue/Svelte app is the frontend. More work but complete flexibility. The right choice for production apps with custom UX requirements.
Open source
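With a custom backend, streaming is the part you build yourself. A minimal sketch of the wire format, assuming Server-Sent Events (one common choice for streaming tokens to browsers), a JSON `delta` payload, and a `[DONE]` sentinel; the payload shape and sentinel are conventions chosen for illustration, and `sse_events` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each streamed token as a Server-Sent Events frame, then signal completion."""
    for token in tokens:
        # JSON-encode so newlines or quotes inside a token can't break the frame
        yield f"data: {json.dumps({'delta': token})}\n\n"
    yield "data: [DONE]\n\n"
```

In FastAPI, a generator like this can be returned as `StreamingResponse(sse_events(token_iter), media_type="text/event-stream")`, where `token_iter` is whatever iterator your LLM SDK's streaming call yields.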
No-code / low-code builders
Flowise
Drag-and-drop builder for LangChain flows. Build RAG pipelines and chatbots visually. Self-hostable. Great for non-engineers or rapid prototyping: connect nodes like LLM → Memory → Retriever.
Open source
Dify
Full platform: visual workflow builder, RAG, agent apps, API publishing, and monitoring, all in one. More complete than Flowise. Self-hostable. Used by teams who want a product without writing code, but still want power-user features.
Open source · Cloud available
n8n
Workflow automation with AI nodes. Like Zapier, but self-hostable and with LLM steps. Connect 400+ apps and AI in visual workflows. Great for automating business processes with AI without coding.
Open source · Cloud available
Visual chatbot builder with LLM integration. Good for customer-facing chatbots that need to follow specific conversation flows. Integrates with WhatsApp, Slack, web chat, Telegram.
Open source · Cloud available
Which to pick
Vercel AI SDK → production web app, React, TypeScript, streaming UI required
Streamlit / Gradio → internal tool, demo, data app, Python team, ship in hours
FastAPI + custom → full product, unique UX, need complete control
Flowise / Dify → non-engineer building an app, or rapid prototype without code
n8n → automating workflows that include AI steps (not primarily a chat UI)
Quick decision guide
Just call one model?
OpenAI SDK / Anthropic SDK (L2)
Switch models / compare costs?
LiteLLM or OpenRouter (L3)
RAG on your own documents?
LlamaIndex or LangChain + Chroma/pgvector (L4+L6)
Build an autonomous agent?
LangGraph or OpenAI Agents SDK (L5)
Multi-agent system?
AutoGen or CrewAI (L5)
Agent needs external tools?
MCP + community servers (L7)
Data privacy / self-host?
Llama 4 via Ollama + Qdrant + Langfuse (L1+L6+L8)
Monitor production quality?
Langfuse (open) or LangSmith (L8)
Ship Python app fast?
Streamlit or Gradio (L9)
Production web app?
FastAPI backend + Vercel AI SDK (L9)
No-code AI app?
Dify or Flowise (L9)
Automate business workflows?
n8n with AI nodes (L9)