API Reference

Llama Stack provides a comprehensive set of APIs for building generative AI applications. The APIs are OpenAI-compatible, so the same client code can be used interchangeably across different providers.

Core APIs

Inference API

Run inference with Large Language Models (LLMs) and embedding models.

Supported Providers:

  • Builtin (Single Node)
  • Ollama (Single Node)
  • Fireworks (Hosted)
  • Together (Hosted)
  • NVIDIA NIM (Hosted and Single Node)
  • vLLM (Hosted and Single Node)
  • AWS Bedrock (Hosted)
  • Cerebras (Hosted)
  • Groq (Hosted)
  • SambaNova (Hosted)
  • PyTorch ExecuTorch (On-device iOS, Android)
  • OpenAI (Hosted)
  • Anthropic (Hosted)
  • Gemini (Hosted)
  • WatsonX (Hosted)
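
Because the Inference API exposes OpenAI-compatible endpoints, a standard OpenAI client can talk to a running Llama Stack server directly. The sketch below is illustrative only: the base URL and path, the placeholder API key, and both model identifiers are assumptions that depend on your distribution and the models registered on your server.

```python
# Minimal sketch: chat completion and embeddings against a Llama Stack
# server's OpenAI-compatible Inference API. URL, key, and model IDs are
# assumptions -- substitute the values from your own deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8321/v1/openai/v1",  # assumed endpoint; path can vary by Llama Stack version
    api_key="none",  # many local distributions accept any placeholder key
)

chat = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct",  # hypothetical model ID; use one registered on your server
    messages=[{"role": "user", "content": "Summarize Llama Stack in one sentence."}],
)
print(chat.choices[0].message.content)

emb = client.embeddings.create(
    model="all-MiniLM-L6-v2",  # hypothetical embedding model ID
    input=["Llama Stack exposes OpenAI-compatible APIs."],
)
print(len(emb.data[0].embedding))
```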

Agents API

Run multi-step agentic workflows with LLMs, including tool usage, memory (RAG), and complex reasoning.

Supported Providers:

  • Builtin (Single Node)
  • Fireworks (Hosted)
  • Together (Hosted)
  • PyTorch ExecuTorch (On-device iOS)
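
As a rough sketch of the Agents API, the llama_stack_client Python SDK provides an Agent helper that wraps session and turn management. The model ID, tool group name, and server URL below are assumptions, and the exact import path depends on your SDK version.

```python
# Hedged sketch: a single agent turn with the llama_stack_client SDK.
# Model ID, tool group, and base URL are placeholders, not defaults.
from llama_stack_client import LlamaStackClient, Agent, AgentEventLogger

client = LlamaStackClient(base_url="http://localhost:8321")  # assumed server URL

agent = Agent(
    client,
    model="meta-llama/Llama-3.2-3B-Instruct",  # hypothetical model ID
    instructions="You are a helpful assistant.",
    tools=["builtin::websearch"],              # assumed tool group name
)

session_id = agent.create_session("demo-session")
turn = agent.create_turn(
    messages=[{"role": "user", "content": "What is Llama Stack?"}],
    session_id=session_id,
    stream=True,
)
for event in AgentEventLogger().log(turn):
    event.print()
```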

Vector IO API

Perform operations on vector stores, including adding documents, searching, and deleting documents.

Supported Providers:

  • FAISS (Single Node)
  • SQLite-Vec (Single Node)
  • Chroma (Hosted and Single Node)
  • Milvus (Hosted and Single Node)
  • Postgres (PGVector) (Hosted and Single Node)
  • Weaviate (Hosted)
  • Qdrant (Hosted and Single Node)
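
Since vector store operations are exposed through OpenAI-compatible endpoints, creating and searching a vector store can look roughly like the sketch below. The base URL and the availability of the vector-store search route depend on your Llama Stack version, your OpenAI SDK version, and the provider you configure, so treat the names and paths as assumptions.

```python
# Rough sketch: create a vector store and run a search through the
# OpenAI-compatible Vector IO endpoints. URL and names are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8321/v1/openai/v1",  # assumed; path may differ by version
    api_key="none",
)

store = client.vector_stores.create(name="docs")

# Search returns matches once documents have been added to the store
# (see the Vector Store Files API below for attaching files).
results = client.vector_stores.search(
    vector_store_id=store.id,
    query="How do I configure providers?",
)
for hit in results.data:
    print(hit)
```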

Files API (OpenAI-compatible)

Manage file uploads, storage, and retrieval with OpenAI-compatible endpoints.

Supported Providers:

  • Local Filesystem (Single Node)
  • S3 (Hosted)
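
A minimal sketch of the Files API using a standard OpenAI client follows; the base URL is an assumption, and the file path and purpose value are just examples.

```python
# Hedged sketch: upload, inspect, and list files via the
# OpenAI-compatible Files endpoints. Base URL and path are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8321/v1/openai/v1",  # assumed; depends on your server
    api_key="none",
)

uploaded = client.files.create(
    file=open("handbook.pdf", "rb"),  # any local document
    purpose="assistants",
)

print(client.files.retrieve(uploaded.id).filename)
for f in client.files.list():
    print(f.id, f.purpose)
```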

Vector Store Files API (OpenAI-compatible)

Integrate file operations with vector stores for automatic document processing and search.

Supported Providers:

  • FAISS (Single Node)
  • SQLite-Vec (Single Node)
  • Milvus (Single Node)
  • ChromaDB (Hosted and Single Node)
  • Qdrant (Hosted and Single Node)
  • Weaviate (Hosted)
  • Postgres (PGVector) (Hosted and Single Node)
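
Building on the Files API sketch above, a file can be attached to a vector store so the configured provider chunks, embeds, and indexes it automatically. Again, the identifiers and endpoint path are assumptions.

```python
# Hedged sketch: attach an uploaded file to a vector store through the
# OpenAI-compatible Vector Store Files endpoints.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")  # assumed URL

store = client.vector_stores.create(name="docs")
uploaded = client.files.create(file=open("handbook.pdf", "rb"), purpose="assistants")

# Associate the file with the store; the provider handles chunking,
# embedding, and indexing behind this call.
vs_file = client.vector_stores.files.create(
    vector_store_id=store.id,
    file_id=uploaded.id,
)
print(vs_file.status)
```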

Safety API

Apply safety policies to outputs at the system level, not just the model level.

Supported Providers:

  • Llama Guard (Depends on Inference Provider)
  • Prompt Guard (Single Node)
  • Code Scanner (Single Node)
  • AWS Bedrock (Hosted)
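
In the llama_stack_client SDK, safety checks are typically run by invoking a registered shield against a set of messages. The shield identifier below is a placeholder; the IDs actually available depend on which safety providers your distribution registers.

```python
# Hedged sketch: run a safety shield over a user message with the
# llama_stack_client SDK. The shield_id is a placeholder.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")  # assumed server URL

result = client.safety.run_shield(
    shield_id="llama-guard",  # assumed shield ID; list the shields on your server to find yours
    messages=[{"role": "user", "content": "How do I make a dangerous chemical?"}],
    params={},
)

if result.violation:
    print("Blocked:", result.violation.user_message)
else:
    print("Message passed the safety check.")
```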

Post Training API

Fine-tune models for specific use cases and domains.

Supported Providers:

  • Builtin (Single Node)
  • HuggingFace (Single Node)
  • TorchTune (Single Node)
  • NVIDIA NeMo (Hosted)

Eval API

Generate outputs and perform scoring to evaluate system performance.

Supported Providers:

  • Builtin (Single Node)
  • NVIDIA NeMo (Hosted)

Telemetry API

Collect telemetry data from the system for monitoring and observability.

Supported Providers:

  • Builtin (Single Node)

Tool Runtime API

Interact with various tools and protocols to extend LLM capabilities.

Supported Providers:

  • Brave Search (Hosted)
  • RAG Runtime (Single Node)
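
As a hedged sketch, the llama_stack_client SDK lets you invoke a registered tool directly, which is handy for testing a tool outside of an agent loop. The method, tool name, and argument schema below are assumptions; the tools actually available depend on the tool groups registered on your server.

```python
# Hedged sketch: invoke a registered tool directly via the Tool Runtime API.
# The tool name and kwargs are placeholders for whatever your server registers.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")  # assumed server URL

result = client.tool_runtime.invoke_tool(
    tool_name="web_search",  # assumed tool name, e.g. from a search tool group
    kwargs={"query": "Llama Stack providers"},
)
print(result.content)
```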

API Compatibility

All Llama Stack APIs are designed to be OpenAI-compatible, allowing you to:

  • Use existing OpenAI API clients and tools
  • Migrate from OpenAI to other providers seamlessly
  • Maintain consistent API contracts across different environments
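
In practice, compatibility means that switching providers is usually just a matter of changing the client's base URL and credentials; the request and response shapes stay the same. The URLs and model ID below are illustrative assumptions.

```python
# Hedged sketch: the same OpenAI client code runs against OpenAI or a
# Llama Stack server -- only the base URL and API key change.
import os
from openai import OpenAI

# Point at OpenAI ...
# client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# ... or at a Llama Stack deployment (URL and path are assumptions).
client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct",  # hypothetical model ID
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```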

Getting Started

To get started with Llama Stack APIs:

  1. Choose a Distribution: Select a pre-configured distribution that matches your environment
  2. Configure Providers: Set up the providers you want to use for each API
  3. Start the Server: Launch the Llama Stack server with your configuration
  4. Use the APIs: Make requests to the API endpoints using your preferred client

For detailed setup instructions, see our Getting Started Guide.

Provider Details

For complete provider compatibility and setup instructions, see our Providers Documentation.

API Stability

Llama Stack APIs are organized by stability level.

OpenAI Integration

For specific OpenAI API compatibility features, see our OpenAI Compatibility Guide.