Claude Code Integration
OGX includes built-in support for connecting Claude Code, Anthropic's AI coding assistant, with a single command. All available models on your OGX server are automatically discovered and mapped to Claude Code's model tiers.
Quick Start
1. Start OGX
# With any provider (examples)
export OPENAI_API_KEY="your-key-here"
ogx run starter
# Or with vLLM
export VLLM_URL="http://localhost:8000/v1"
ogx run starter
# Or with Ollama
export OLLAMA_URL="http://localhost:11434/v1"
ogx run starter
2. Connect Claude Code
ogx connect claude
That's it. Claude Code launches with all your OGX models mapped to Claude's haiku/sonnet/opus tiers.
3. Test it out
# Simple query
claude "What is 2+2?"
# Code generation with file creation
claude "Create a Flask hello world app"
# Multi-turn conversation
claude "Write a quicksort in Rust"
claude "Add documentation and tests"
How It Works
The ogx connect claude command:
- Queries the running OGX server for available models
- Filters out non-LLM models (embeddings, rerankers)
- Maps all discovered LLM models to Claude Code's three tiers (haiku/sonnet/opus)
- Sets the required environment variables (
ANTHROPIC_BASE_URL,ANTHROPIC_AUTH_TOKEN, model tier mappings) - Unsets any Vertex/Bedrock variables that would cause Claude Code to bypass OGX
- Launches Claude Code
ogx connect claude
|
v
GET /v1/models (discover available models)
|
v
Map models to Claude tiers (haiku/sonnet/opus)
|
v
Launch claude with ANTHROPIC_BASE_URL + tier env vars
|
v
Claude Code (connected to OGX)
Claude Code sends requests to the Anthropic Messages API (/v1/messages). OGX implements this API, translating between formats as needed:
What gets translated:
- Messages format: Anthropic → OpenAI format (when provider doesn't support Messages API natively)
- Tool calls: Anthropic
tool_useblocks → OpenAItool_calls - Streaming: OpenAI SSE events → Anthropic format (
message_start,content_block_delta, etc.) - Thinking blocks: Extended thinking support for supported models
Native passthrough (no translation needed):
- Ollama with
/v1/messagessupport - vLLM with Anthropic format support
CLI Reference
ogx connect claude [--model MODEL] [--haiku-model MODEL] [--sonnet-model MODEL]
[--opus-model MODEL] [--host HOST] [--port PORT]
[--print-env] [-- CLAUDE_ARGS...]
Options:
| Flag | Default | Description |
|---|---|---|
--model | First available model | Model ID to map to all three Claude tiers |
--haiku-model | --model value | Model ID for the haiku (fast) tier. Overrides --model |
--sonnet-model | --model value | Model ID for the sonnet (balanced) tier. Overrides --model |
--opus-model | --model value | Model ID for the opus (capable) tier. Overrides --model |
--host | localhost | OGX server host |
--port | 8321 | OGX server port (also reads OGX_PORT env var) |
--print-env | off | Print shell export/unset statements instead of launching Claude Code |
Arguments after -- are forwarded to the claude command.
Model Configuration
Default behavior
With no model flags, ogx connect claude maps all three Claude tiers to the first available LLM model on your OGX server.
Setting a specific model for all tiers
ogx connect claude --model openai/gpt-4o
Setting different models per tier
ogx connect claude \
--haiku-model openai/gpt-4o-mini \
--sonnet-model openai/gpt-4o \
--opus-model openai/o1
Forwarding arguments to Claude Code
# Launch Claude Code in print mode
ogx connect claude -- -p "Write a hello world function"
# Specify a Claude model name directly
ogx connect claude -- --model claude-sonnet-4-5
Using --print-env for shell integration
Instead of launching Claude Code, print the environment variables for manual use:
# Print and eval
eval "$(ogx connect claude --print-env --model openai/gpt-4o)"
claude "Hello world"
Some providers may return errors when Claude Code requests a max_tokens value higher than the model supports (e.g., OpenAI models). In that case, use --model to select a model that supports higher token limits, or use the per-tier flags to route different workloads to different models.
Manual Environment Variable Setup
You can also configure Claude Code manually without the ogx connect claude command. Set these environment variables:
export ANTHROPIC_BASE_URL="http://localhost:8321"
export ANTHROPIC_AUTH_TOKEN="ogx" # Bearer token — no interactive approval needed
# Map Claude model tiers to your backend models
export ANTHROPIC_DEFAULT_HAIKU_MODEL="openai/gpt-4o-mini" # Fast/cheap tier
export ANTHROPIC_DEFAULT_SONNET_MODEL="openai/gpt-4o" # Balanced tier
export ANTHROPIC_DEFAULT_OPUS_MODEL="openai/o1" # Most capable tier
claude "Write a hello world function in Python"
When Claude Code sends a request for claude-haiku-4-5-20251001, OGX routes it to the model specified by ANTHROPIC_DEFAULT_HAIKU_MODEL. The starter distribution automatically registers these aliases across all providers:
# Pre-configured in starter config.yaml
registered_resources:
models:
- model_id: claude-haiku-4-5-20251001
provider_id: "all" # Registers alias across ALL providers
provider_model_id: "auto" # Auto-maps to appropriate model
model_type: llm
Supported Features
Core Capabilities
- ✅ All Messages API features: Multi-turn conversations, system messages, streaming
- ✅ Tool use: File operations, shell commands, code execution (via Claude Code's built-in tools)
- ✅ Extended thinking: Thinking blocks for reasoning transparency
- ✅ Token counting:
/v1/messages/count_tokensendpoint - ✅ Prompt caching: When using providers that support it (Anthropic, Bedrock)
- ✅ Any inference provider: OpenAI, vLLM, Ollama, Fireworks, Together, Groq, Bedrock, etc.
Provider-Specific Features
Different providers have different strengths when used with Claude Code:
| Provider | Native Messages API | Thinking Support | Prompt Caching | Notes |
|---|---|---|---|---|
| OpenAI | ❌ (translated) | ⚠️ (via reasoning) | ❌ | Works well, no translation overhead for responses |
| vLLM | ✅ | ❌ | ❌ | Serves Messages API natively with compatible models |
| Ollama | ✅ | ❌ | ❌ | Serves Messages API natively with compatible models |
| Bedrock, Fireworks, Groq, Together | ❌ (translated) | ❌ | ❌ | Works via OpenAI translation |
Configuration Examples
Using OpenAI Models
# Terminal 1: Start OGX
export OPENAI_API_KEY="sk-..."
ogx run starter
# Terminal 2: Connect Claude Code
ogx connect claude --model openai/gpt-4o
Using vLLM with Qwen Models
# Start vLLM server
vllm serve Qwen/Qwen3-8B --api-key fake
# Terminal 1: Start OGX
export VLLM_URL="http://localhost:8000/v1"
ogx run starter
# Terminal 2: Connect Claude Code
ogx connect claude --model vllm/Qwen/Qwen3-8B
Using Ollama with Llama Models
# Start Ollama and pull a model
ollama serve
ollama pull llama3.3:70b
# Terminal 1: Start OGX
export OLLAMA_URL="http://localhost:11434/v1"
ogx run starter
# Terminal 2: Connect Claude Code
ogx connect claude --model ollama/llama3.3:70b
Using Multiple Providers
Route different Claude tiers to different backend models:
# Terminal 1: Start OGX with multiple providers
export VLLM_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="sk-..."
ogx run starter
# Terminal 2: Connect with per-tier model mapping
ogx connect claude \
--haiku-model vllm/Qwen/Qwen3-8B \
--sonnet-model openai/gpt-4o \
--opus-model openai/o1
With a Remote OGX Server
ogx connect claude --host 192.168.1.100 --port 9000
Prerequisites
- Claude Code must be installed and available in your
PATH. Install it from claude.com/download. - OGX server must be running before connecting Claude Code. Start it with
ogx run starterorogx stack run.
Troubleshooting
"Failed to find 'claude' in PATH"
Install Claude Code following the instructions at claude.com/download.
"Failed to connect to OGX server"
The OGX server is not running or not reachable at the specified host and port. Start it first:
ogx run starter
"Failed to find any LLM models"
The OGX server is running but has no LLM models registered. Check your distribution configuration and ensure at least one inference provider is configured.
Model not found with --model
The specified model ID must match exactly what the OGX server reports. List available models:
curl http://localhost:8321/v1/models | python -m json.tool
max_tokens errors with OpenAI models
Symptom: BadRequestError: max_tokens is too large: 32000. This model supports at most 16384 completion tokens
Explanation: Claude Code requests a max_tokens value based on Claude model limits, which may exceed what the backend model supports. This is common with OpenAI models.
Workaround: Use a model that supports a higher token limit, or use per-tier model mapping to route different workloads appropriately.
Claude Code ignores ANTHROPIC_BASE_URL (manual setup)
Symptom: Claude Code connects directly to Anthropic (or Vertex AI / Bedrock) instead of your OGX server.
Explanation: If CLAUDE_CODE_USE_VERTEX=1, CLAUDE_CODE_USE_BEDROCK=1, or related Vertex/Bedrock environment variables are set, Claude Code bypasses ANTHROPIC_BASE_URL entirely. The ogx connect claude command handles this automatically by unsetting these variables.
Solution (if not using ogx connect claude): Unset these variables before starting Claude Code:
unset CLAUDE_CODE_USE_VERTEX
unset ANTHROPIC_VERTEX_PROJECT_ID
unset CLAUDE_CODE_USE_BEDROCK
unset ANTHROPIC_BEDROCK_SESSION_TOKEN
Slow responses with cloud providers
Symptom: Long latency when using OpenAI, Fireworks, etc.
Explanation: There's a double proxy overhead (Claude Code → OGX → provider). Consider using local providers (vLLM, Ollama) for better performance.
Tool use not working
Symptom: Claude Code can't execute shell commands or file operations
Explanation: Tool execution happens in Claude Code's runtime, not OGX. Ensure Claude Code has proper permissions and your model supports tool use.
Performance Considerations
Latency Breakdown
Total latency = Claude Code overhead + OGX processing + provider API call + translation overhead
- Local providers (vLLM, Ollama): Minimal translation overhead, total latency dominated by inference
- Cloud providers (OpenAI, Groq): Network round-trip is the bottleneck
- Format translation: Adds ~5-20ms depending on message complexity
Optimization Tips
- Use local providers when possible (vLLM, Ollama) to minimize network latency
- Enable prompt caching with providers that support it
- Configure native Messages API support in vLLM/Ollama to skip translation overhead
- Use streaming (enabled by default in Claude Code) for faster perceived response times
Differences from Anthropic Claude
While OGX provides full Messages API compatibility, there are some behavioral differences when using alternative models:
| Feature | Anthropic Claude | Open Models (via OGX) |
|---|---|---|
| Thinking blocks | Native support | Varies by model (GPT-4o has reasoning) |
| Prompt caching | Available | Only if provider supports it |
| Extended context | 200K+ tokens | Depends on model (Qwen3: 32K, Llama3: 128K) |
| Tool use format | Optimized for Claude | Translated to OpenAI format |
| Response quality | Claude-specific | Depends on underlying model |
Advanced Configuration
Custom Model Mappings
If you want more control over how Claude model names map to your providers, you can register models explicitly:
# Start OGX
ogx run starter
# Register models via API (after startup)
curl http://localhost:8321/v1/models \
-H "Content-Type: application/json" \
-d '{
"model_id": "claude-haiku-4-5-20251001",
"provider_id": "vllm",
"provider_model_id": "Qwen/Qwen3-8B",
"model_type": "llm"
}'
Or add to your config.yaml:
registered_resources:
models:
- model_id: claude-haiku-4-5-20251001
provider_id: vllm
provider_model_id: Qwen/Qwen3-8B
model_type: llm
Using with Claude Agent SDK
If you're building custom agents with the Claude Agent SDK, OGX works as a drop-in backend:
from claude_agent_sdk import Agent
agent = Agent(
base_url="http://localhost:8321",
api_key="fake", # Not validated
model="vllm/Qwen/Qwen3-8B",
)
response = agent.send("Write a function to parse CSV files")
For more information about OGX's provider architecture, see Providers Overview. For Anthropic Messages API conformance details, see Anthropic Messages API.