API Reference
Llama Stack implements the OpenAI API and organizes endpoints by stability level. Use any OpenAI-compatible client to access these APIs.
Stable APIs
/v1/chat/completions, /v1/completions, /v1/embeddingsInference
Chat completions, text completions, and embeddings
/v1/responsesResponses
Agent orchestration with tool use and multi-turn
/v1/modelsModels
Model listing and management
/v1/filesFiles
File upload and management
/v1/vector_storesVector IO
Document storage and semantic search
/v1/batchesBatches
Offline batch processing
/v1/moderationsSafety
Content safety via Llama Guard
/v1/toolsTools
Tool listing and management
/v1/conversationsConversations
Conversation state management
/v1/promptsPrompts
Prompt templates and versioning
Experimental APIs
/v1alpha/adminAdmin
Providers, routes, health, and version
/v1alpha/inference/rerankRerank
Document reranking for search relevance
/v1alpha/file_processorsFile Processors
Document ingestion and chunking
/v1alpha/interactionsInteractions
Google Interactions API compatibility layer
/v1beta/connectorsConnectors
External tool and service connectors
Deprecated APIs
These APIs follow semantic versioning and maintain backward compatibility within major versions.