Not a gateway.
The full stack.
Inference, vector stores, file storage, moderation, tool calling, and agentic orchestration — as a server or a Python library. Pluggable providers, any language, deploy anywhere.
Run as a server or import as a Python library (requires uv)
uvx --from 'ogx[starter]' ogx stack run starter

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")
response = client.responses.create(
    model="llama-3.3-70b",
    input="Summarize this repository",
    tools=[{"type": "web_search"}],
)

Your tools. Any model.
Configure OGX with any provider — Ollama, vLLM, Bedrock, Azure, or your own. Then point Claude Code, Codex, or OpenCode at it. Same workflow, any model.
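For instance, any tool that speaks the Anthropic Messages API can be aimed at OGX's /v1/messages endpoint by overriding its base URL. A minimal sketch with the Anthropic Python SDK; the port follows the quick-start above, and the model name is a placeholder that depends on which provider you configured:

from anthropic import Anthropic

# Point the Anthropic SDK at a local OGX process instead of api.anthropic.com.
# The SDK appends /v1/messages to this base URL; the model name below is an
# assumption and should match whatever your OGX config serves.
client = Anthropic(base_url="http://localhost:8321", api_key="fake")

message = client.messages.create(
    model="llama-3.3-70b",
    max_tokens=512,
    messages=[{"role": "user", "content": "Explain this stack trace"}],
)
print(message.content[0].text)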
Everything your AI app needs. One process.
More than inference routing. OGX composes inference, storage, moderation, and orchestration into a single process — whether you run it as a server or import it as a library. Your agent can search a vector store, call a tool, apply moderation checks, and stream the response. No glue code. No sidecar services.
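A sketch of that composition through a recent openai-python client, assuming an OGX server on localhost:8321 as in the quick-start; the file path and model name are placeholders:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")

# Storage: upload a document and index it into a vector store.
doc = client.files.create(file=open("notes.md", "rb"), purpose="assistants")
store = client.vector_stores.create(name="project-notes")
client.vector_stores.files.create(vector_store_id=store.id, file_id=doc.id)

# Moderation: screen the request before the agent runs.
flagged = client.moderations.create(input="Summarize my notes").results[0].flagged

# Inference + tools: the agent searches the store and streams its answer.
if not flagged:
    stream = client.responses.create(
        model="llama-3.3-70b",
        input="Summarize my notes",
        tools=[{"type": "file_search", "vector_store_ids": [store.id]}],
        stream=True,
    )
    for event in stream:
        print(event)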
Inference
/v1/chat/completions (Chat Completions)
/v1/responses (Responses)
/v1/embeddings (Embeddings)
/v1/models (Models)
/v1/messages (Messages, Anthropic)
/v1alpha/interactions (Interactions, Google)

Data
/v1/vector_stores (Vector Stores)
/v1/files (Files)
/v1/batches (Batches)

Moderation & Tools
/v1/moderations (Moderations)
/v1/tools (Tools)
/v1/connectors (Connectors)

Server or library. Your call.
Deploy OGX as an HTTP server for production — any language, any client, standard API. Or import it directly as a Python library for scripts, notebooks, and rapid prototyping with zero network overhead.
Same capabilities either way. Start with the library, graduate to the server when you need multi-language access or independent scaling.
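In library mode the same call runs in-process. This is a hypothetical sketch only: the import path, class name, and constructor argument below are assumptions about OGX's Python API, not documented calls; the point is that the responses.create call stays identical with no HTTP hop:

# Assumed names for illustration; check the OGX docs for the actual library API.
from ogx import LibraryClient  # assumed import path

client = LibraryClient("starter")  # assumed: same "starter" distribution as the server example
response = client.responses.create(
    model="llama-3.3-70b",
    input="Summarize this repository",
)
print(response.output_text)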
Server: POST /v1/responses, any language. Library: client.responses.create(...), zero overhead.

23 inference providers. 13 vector stores. 7 safety backends.
Develop locally with Ollama. Deploy to production with vLLM. Wrap Bedrock or Vertex without lock-in. Same API surface, different backend.
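One way to keep that portable is to push the backend choice into configuration so the application code never changes. A small sketch; the environment variable names and model default are conventions chosen for this example, not OGX settings:

import os
from openai import OpenAI

# Identical client code in every environment; only the environment changes.
# Locally these might point at an OGX stack backed by Ollama, in production at
# one backed by vLLM or Bedrock (the model name here is a placeholder).
client = OpenAI(
    base_url=os.environ.get("OGX_BASE_URL", "http://localhost:8321/v1"),
    api_key=os.environ.get("OGX_API_KEY", "fake"),
)

response = client.responses.create(
    model=os.environ.get("OGX_MODEL", "llama-3.3-70b"),
    input="Draft release notes for the latest changes",
)
print(response.output_text)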
How it works
Your application talks to one process — either an HTTP server or an in-process library client. That process routes to pluggable providers for inference, vector storage, files, moderation, and tools. The composition happens at the OGX level, not in your application code.
Open source
Apache 2.0 licensed. Contributions welcome.