Detailed Tutorial
Beyond the Quickstart
This tutorial assumes you've completed the Quickstart and have a running Llama Stack server. We'll build progressively more complex applications using the standard OpenAI SDK.
Chat Completions
The most basic usage is a simple chat conversation:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")
# List available models
models = client.models.list()
for m in models.data:
    print(f" {m.id} ({m.object})")
# Simple chat
response = client.chat.completions.create(
    model="ollama/llama3.2:3b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about open source"},
    ],
)
print(response.choices[0].message.content)
Streaming
Stream responses token by token for a more interactive experience:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")
stream = client.chat.completions.create(
    model="ollama/llama3.2:3b",
    messages=[{"role": "user", "content": "Explain RAG in 3 sentences"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
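When the streamed text feeds into a later step (saving the reply, appending it to message history), it helps to accumulate the deltas into one string. A minimal sketch, assuming chunks have the shape shown above; collect_stream is a hypothetical helper, not part of the SDK:

```python
def collect_stream(stream):
    """Join the content deltas of a chat-completions stream into one string."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks (e.g. the final one) carry no content
            parts.append(delta)
    return "".join(parts)
```

You can still print each delta as it arrives inside the loop if you want both live output and the final string.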
Responses API with Tool Calling
The Responses API provides server-side orchestration. For built-in tools (such as file search or MCP), the model decides which tools to call, and Llama Stack executes them and feeds results back automatically. For custom function tools like the one below, the model instead returns a tool-call request in the output for your code to execute:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")
response = client.responses.create(
    model="ollama/llama3.2:3b",
    input="What is the weather like in San Francisco?",
    tools=[{
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
            },
            "required": ["location"],
        },
    }],
)
# The model will request a tool call - check the output
for item in response.output:
    print(item)
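A function-call output item carries the tool name and a JSON-encoded arguments string, which you can route to a local Python function. A minimal sketch; dispatch_function_call and the handlers dict are illustrative helpers, not SDK APIs, and the item shape is assumed from the OpenAI-compatible Responses format:

```python
import json

def dispatch_function_call(item, handlers):
    """Run the local handler matching a function_call output item."""
    args = json.loads(item.arguments)  # arguments arrive as a JSON string
    return handlers[item.name](**args)

# Example: wire the get_weather tool declared above to a stub implementation.
handlers = {"get_weather": lambda location: f"Sunny and 18C in {location}"}
```

In a full loop you would send the handler's result back to the model (for example in a follow-up responses.create call) so it can produce a final answer.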
RAG with Vector Stores
Upload documents, create a vector store, and ask questions. Llama Stack handles chunking, embedding, and retrieval:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")
# Upload a document
file = client.files.create(
    file=open("my-document.pdf", "rb"),
    purpose="assistants",
)
print(f"Uploaded: {file.id}")
# Create a vector store and index the file
vector_store = client.vector_stores.create(
    name="my-docs",
    file_ids=[file.id],
)
print(f"Vector store: {vector_store.id}")
# Ask questions with file search
response = client.responses.create(
    model="ollama/llama3.2:3b",
    input="What are the key points in this document?",
    tools=[{
        "type": "file_search",
        "vector_store_ids": [vector_store.id],
    }],
)
print(response.output_text)
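File indexing runs asynchronously after the vector store is created, so a query issued immediately can miss a freshly uploaded document. A minimal polling sketch; wait_for_indexing is a hypothetical helper, and you would adapt list_statuses to however your server reports per-file status (for example by listing the vector store's files):

```python
import time

def wait_for_indexing(list_statuses, timeout=60, interval=2):
    """Block until every file status is 'completed', or fail on error/timeout.

    list_statuses: a callable returning the current list of file status strings.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        statuses = list(list_statuses())
        if any(s == "failed" for s in statuses):
            raise RuntimeError("file indexing failed")
        if statuses and all(s == "completed" for s in statuses):
            return
        time.sleep(interval)
    raise TimeoutError("file indexing did not finish in time")
```

Call this between creating the vector store and issuing the first file_search query.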
Multi-turn Conversations
Use previous_response_id to build multi-turn conversations without managing message history:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")
# First turn
r1 = client.responses.create(
    model="ollama/llama3.2:3b",
    input="My name is Alice and I'm building a RAG app",
)
print("Assistant:", r1.output_text)
# Second turn - references the first
r2 = client.responses.create(
    model="ollama/llama3.2:3b",
    input="What did I say my name was?",
    previous_response_id=r1.id,
)
print("Assistant:", r2.output_text)
MCP Tools
Connect to any MCP server and use its tools:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")
response = client.responses.create(
    model="ollama/llama3.2:3b",
    input="List files in the current directory",
    tools=[{
        "type": "mcp",
        "server_label": "filesystem",
        "server_url": "http://localhost:3000/sse",
    }],
)
print(response.output_text)
Switching Providers
The same code works with any backend. Just change the server config:
Ollama:
export OLLAMA_URL=http://localhost:11434/v1
uv run llama stack run starter
OpenAI:
export OPENAI_API_KEY=sk-xxx
uv run llama stack run starter
Your client code stays the same. Just update the model name:
response = client.responses.create(
    model="openai/gpt-4o",  # now using OpenAI
    input="What is Llama Stack?",
)
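To make a provider switch purely a configuration change, one common pattern is to read the model id from an environment variable instead of hard-coding it. A sketch; LLAMA_STACK_MODEL is an arbitrary name chosen here, not a variable the server reads:

```python
import os

# Falls back to the local Ollama model when the variable is unset.
model = os.environ.get("LLAMA_STACK_MODEL", "ollama/llama3.2:3b")
```

Then pass model= into every create call, and deployments select the backend with a single export.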
Running as a Library
You can also use Llama Stack without running a server, directly in your Python process:
from llama_stack.core.library_client import LlamaStackAsLibraryClient
client = LlamaStackAsLibraryClient("starter")
client.initialize()
# Use the same OpenAI-compatible interface
response = client.responses.create(
    model="ollama/llama3.2:3b",
    input="Hello from library mode!",
)
print(response.output_text)
Next Steps
- Building Applications - RAG, agents, safety
- Providers - all supported backends
- API Reference - full endpoint documentation
- Deploying - Kubernetes, production setup