Welcome to Llama Stack

Open-source AI application server. Not just inference routing, the full stack.

Llama Stack composes inference, vector stores, file storage, safety, tool calling, and agentic orchestration into a single OpenAI-compatible server. Use any client, any language, any model. Swap providers without changing application code.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello"}],
)

What you get

| API | Endpoint | Description |
| --- | --- | --- |
| Chat Completions | /v1/chat/completions | Text and vision inference, streaming, tool calling |
| Responses | /v1/responses | Server-side agentic orchestration with tool calling, MCP integration, and built-in file search (RAG) |
| Embeddings | /v1/embeddings | Text embeddings for search and retrieval |
| Vector Stores | /v1/vector_stores | Managed document storage and search |
| Moderations | /v1/moderations | Content safety via Llama Guard and other shields |
| Files | /v1/files | File upload and management |
| Batches | /v1/batches | Offline batch processing |
| Models | /v1/models | Model listing and management |
| Messages | /v1/messages | Anthropic Messages API adapter |

Beyond the OpenAI specification, Llama Stack provides Prompts for prompt template management, File Processors for document ingestion, and Connectors for external tool registration.

The Responses API implementation conforms to the Open Responses specification. See the API conformance report for detailed coverage.
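Because every API above is plain HTTP plus JSON, a Responses call is just a POST to /v1/responses. Here is a minimal sketch of the request payload, assuming the llama-3.3-70b model alias from the example above; the field names follow the OpenAI Responses API shape, and the vector store ID is a placeholder:

```python
import json

# Request body for POST /v1/responses (OpenAI Responses API shape).
# "llama-3.3-70b" is the model alias used in the example above.
payload = {
    "model": "llama-3.3-70b",
    "input": "Summarize the attached report.",
    # Built-in file search against a vector store (RAG);
    # "vs_123" is a placeholder vector store ID.
    "tools": [{"type": "file_search", "vector_store_ids": ["vs_123"]}],
}

body = json.dumps(payload)
print(body)
```

With the OpenAI Python SDK, the same request corresponds to `client.responses.create(model=..., input=..., tools=...)`.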

A server, not a library

Llama Stack is an HTTP server. Your application talks to a standard API over HTTP. This is a different architectural choice from SDK-level frameworks that abstract at the Python import level.

The consequence: your application is language-agnostic. Write it in Python, Go, TypeScript, or curl. Swap the server without touching application code. Replace the entire inference backend without redeploying your application.
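For example, the same chat completion as the Python snippet above can be issued with nothing but curl (the model alias matches the earlier example; adjust it to whatever your server serves):

```shell
curl http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer fake" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```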

50+ pluggable providers

Llama Stack has a pluggable provider architecture across every API, not just inference.

  • 23 inference providers: Ollama, vLLM, OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, Gemini, Vertex AI, NVIDIA NIM, Fireworks, Together AI, Groq, SambaNova, Cerebras, WatsonX, and more
  • 15 vector store providers: FAISS, SQLite-vec, ChromaDB, Qdrant, Milvus, PGVector, Weaviate, Elasticsearch, and more
  • 7 safety providers: Llama Guard, Prompt Guard, Code Scanner, Bedrock Guardrails, NVIDIA NeMo, and more
  • 6 tool runtimes: File Search, Brave/Bing/Tavily web search, Wolfram Alpha, MCP

Develop locally with Ollama and FAISS. Deploy to production with vLLM and PGVector. Wrap Bedrock or Vertex without lock-in. Same API surface, different backend.
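The swap is a configuration change, not a code change. As a hypothetical sketch of a run configuration's inference section (field names are illustrative, not the exact schema; see the provider documentation for the real format):

```yaml
# Local development: Ollama for inference
providers:
  inference:
  - provider_id: ollama
    provider_type: remote::ollama
    config:
      url: http://localhost:11434

# Production: replace the block with vLLM; application code is untouched
#  - provider_id: vllm
#    provider_type: remote::vllm
#    config:
#      url: http://vllm.internal:8000/v1
```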

See the provider documentation for the full list and the provider compatibility matrix for tested feature coverage.

Get started

# One-line install
curl -LsSf https://github.com/llamastack/llama-stack/raw/main/scripts/install.sh | bash

# Or install via pip
pip install llama-stack

# Start the server
llama stack run
Tip: Llama Stack works with any OpenAI-compatible client. Point your existing code at http://localhost:8321/v1 and you're ready to go.
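Once the server is up, you can sanity-check it by listing the available models over plain HTTP; the response follows the OpenAI model-list format:

```shell
curl http://localhost:8321/v1/models
```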

Quick Start Guide | OpenAI API Compatibility | GitHub

Found an issue with the docs?

If you find an issue with our documentation, please open a bug in our GitHub Issues, referencing the page and the problem you are facing. Thank you for helping us improve our documentation!