
Welcome to OGX

Open-source AI application server. Not just inference routing: the full stack.

OGX composes inference, vector stores, file storage, tool calling, and agentic orchestration into a single OpenAI-compatible server. Use any client, any language, any model. Swap providers without changing application code.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello"}],
)


What you get

  • Chat Completions (/v1/chat/completions): text and vision inference, streaming, tool calling
  • Responses (/v1/responses): server-side agentic orchestration with tool calling, MCP integration, and built-in file search (RAG)
  • Embeddings (/v1/embeddings): text embeddings for search and retrieval
  • Vector Stores (/v1/vector_stores): managed document storage and search
  • Files (/v1/files): file upload and management
  • Batches (/v1/batches): offline batch processing
  • Models (/v1/models): model listing and management
  • Messages (/v1/messages): Anthropic Messages API adapter
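The Embeddings endpoint pairs naturally with retrieval. Below is a minimal sketch of ranking documents by cosine similarity over embedding vectors; the model name is a placeholder, and the embed helper assumes any OpenAI-compatible client object pointed at the server.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank(query_vec, doc_vecs):
    """Indices of doc_vecs sorted from most to least similar to the query."""
    return sorted(range(len(doc_vecs)),
                  key=lambda i: cosine(query_vec, doc_vecs[i]),
                  reverse=True)

def embed(client, texts, model="nomic-embed-text"):
    """Fetch embeddings via /v1/embeddings. The model name is a placeholder."""
    data = client.embeddings.create(model=model, input=texts).data
    return [item.embedding for item in data]
```

With vectors from embed(), rank(query_vec, doc_vecs) gives a simple nearest-neighbor ordering; for larger corpora you would use a managed vector store instead.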

Beyond the OpenAI specification, OGX provides Prompts for prompt template management, File Processors for document ingestion, and Connectors for external tool registration.

The Responses API implementation conforms to the Open Responses specification. See the API conformance report for detailed coverage.
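As a hedged sketch of what a Responses call with built-in file search can look like, the helpers below assume an OpenAI-compatible client pointed at the server; the vector store IDs and model name are placeholders, not real identifiers.

```python
def file_search_tool(vector_store_ids):
    """Build the file_search tool entry for a Responses request."""
    return {"type": "file_search", "vector_store_ids": list(vector_store_ids)}

def ask_with_file_search(client, model, question, vector_store_ids):
    """One-shot RAG: the server searches the vector stores before answering."""
    response = client.responses.create(
        model=model,
        input=question,
        tools=[file_search_tool(vector_store_ids)],
    )
    return response.output_text
```

Because the orchestration runs server-side, the application makes a single call and receives the grounded answer; there is no client-side retrieval loop to maintain.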

A server, not a library

OGX is an HTTP server: your application talks to a standard API over HTTP. This is a different architectural choice from SDK-level frameworks, which abstract at the Python import level.

The consequence: your application is language-agnostic. Write it in Python, Go, TypeScript, or curl. Swap the server without touching application code. Replace the entire inference backend without redeploying your application.
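To make that concrete, here is a dependency-free sketch that talks to the server with nothing but the Python standard library; the port and model name follow the example above.

```python
import json
import urllib.request

def chat_request(base_url, model, content):
    """Build a plain HTTP request to the OpenAI-style chat endpoint (no SDK)."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer fake"},
        method="POST",
    )

def send(req):
    """POST the request and return the assistant's reply text."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Any language that can issue this POST gets the same behavior, which is the whole point of the server-not-library design.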

50+ pluggable providers

OGX has a pluggable provider architecture across every API, not just inference.

  • 23 inference providers: Ollama, vLLM, OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, Gemini, Vertex AI, NVIDIA NIM, Fireworks, Together AI, Groq, SambaNova, Cerebras, WatsonX, and more
  • 15 vector store providers: FAISS, SQLite-vec, ChromaDB, Qdrant, Milvus, PGVector, Weaviate, Elasticsearch, and more
  • Built-in guardrails support: Responses guardrails call an external OpenAI-compatible moderation endpoint configured by moderation_endpoint
  • 6 tool runtimes: File Search, Brave/Bing/Tavily web search, Wolfram Alpha, MCP

Develop locally with Ollama and FAISS. Deploy to production with vLLM and PGVector. Wrap Bedrock or Vertex without lock-in. Same API surface, different backend.
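One way to express that swap is to keep backend details in configuration and leave the application code untouched. The profiles below are purely illustrative (placeholder URLs and model names), not a real OGX config format.

```python
# Illustrative profiles only: base URLs and model names are placeholders.
PROFILES = {
    "dev":  {"base_url": "http://localhost:8321/v1", "model": "llama-3.3-70b"},
    "prod": {"base_url": "https://ogx.example.com/v1", "model": "llama-3.3-70b"},
}

def client_settings(profile):
    """Look up backend settings; application code never changes per backend."""
    return PROFILES[profile]

def make_client(profile):
    """Build a client for the chosen profile (any OpenAI-compatible SDK works)."""
    from openai import OpenAI  # imported here so the sketch stays optional
    cfg = client_settings(profile)
    return OpenAI(base_url=cfg["base_url"], api_key="fake")
```

Promoting from dev to prod then means changing a profile name, not rewriting call sites.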

See the provider documentation for the full list and the provider compatibility matrix for tested feature coverage.

Get started

# One-line install
curl -LsSf https://github.com/ogx-ai/ogx/raw/main/scripts/install.sh | bash

# Or install via pip
pip install ogx

# Start the server
ogx stack run
Tip: OGX works with any OpenAI-compatible client. Point your existing code at http://localhost:8321/v1 and you're ready to go.

Quick Start Guide | OpenAI API Compatibility | GitHub

Found an issue with the docs?

If you find an issue with our documentation, please open a bug in our GitHub Issues, referencing the page and the problem you're facing. Thank you for helping us improve our documentation!