Kubernetes Deployment Guide
Deploy OGX and vLLM servers in a Kubernetes cluster instead of running them locally. This guide uses Kind for a local cluster and the OGX Kubernetes operator to manage the OGX server; the vLLM inference server is deployed manually.
Prerequisites
Local Kubernetes Setup
Create a local Kubernetes cluster via Kind:
kind create cluster --image kindest/node:v1.32.0 --name ogx-test
Set your Hugging Face token, base64-encoded so it can be substituted into the Secret's data field below:
export HF_TOKEN=$(echo -n "your-hf-token" | base64)
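The -n flag matters: a stray trailing newline would be encoded into the Secret and corrupt the stored token. A quick local sanity check of the round-trip, using a hypothetical placeholder value:

```shell
# Encode a placeholder token the same way as above (hypothetical value).
HF_TOKEN=$(echo -n "hf_example_token" | base64)

# Decoding should print the original string exactly, with no extra newline.
echo "$HF_TOKEN" | base64 -d
```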
Quick Deployment
Step 1: Create Storage and Secrets
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vllm-models
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 50Gi
---
apiVersion: v1
kind: Secret
metadata:
  name: hf-token-secret
type: Opaque
data:
  token: $HF_TOKEN
EOF
Step 2: Deploy vLLM Server
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: vllm
  template:
    metadata:
      labels:
        app.kubernetes.io/name: vllm
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest
        command: ["/bin/sh", "-c"]
        args: ["vllm serve meta-llama/Llama-3.2-1B-Instruct"]
        env:
        - name: HUGGING_FACE_HUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-token-secret
              key: token
        ports:
        - containerPort: 8000
        volumeMounts:
        - name: llama-storage
          mountPath: /root/.cache/huggingface
      volumes:
      - name: llama-storage
        persistentVolumeClaim:
          claimName: vllm-models
---
apiVersion: v1
kind: Service
metadata:
  name: vllm-server
spec:
  selector:
    app.kubernetes.io/name: vllm
  ports:
  - protocol: TCP
    port: 8000
    targetPort: 8000
  type: ClusterIP
EOF
Step 3: Install Kubernetes Operator
Install the OGX Kubernetes operator to manage OGX deployments:
# Install from the latest main branch
kubectl apply -f https://raw.githubusercontent.com/ogx-ai/ogx-k8s-operator/main/release/operator.yaml
# Or install a specific version (e.g., v0.4.0)
# kubectl apply -f https://raw.githubusercontent.com/ogx-ai/ogx-k8s-operator/v0.4.0/release/operator.yaml
Verify the operator is running:
kubectl get pods -n ogx-k8s-operator-system
For more information about the operator, see the ogx-k8s-operator repository.
Step 4: Deploy OGX Server using Operator
Create an OGXDistribution custom resource to deploy the OGX server. The operator automatically creates the necessary Deployment, Service, and other resources.
You can optionally override the default config.yaml using spec.server.userConfig with a ConfigMap (see userConfig spec).
cat <<EOF | kubectl apply -f -
apiVersion: ogx.io/v1alpha1
kind: OGXDistribution
metadata:
  name: ogx-vllm
spec:
  replicas: 1
  server:
    distribution:
      name: starter
    containerSpec:
      port: 8321
      env:
      - name: VLLM_URL
        value: "http://vllm-server.default.svc.cluster.local:8000/v1"
      - name: VLLM_MAX_TOKENS
        value: "4096"
      - name: VLLM_API_TOKEN
        value: "fake"
    # Optional: override config.yaml from a ConfigMap using userConfig
    # (the referenced ConfigMap must exist first):
    # userConfig:
    #   configMap:
    #     name: ogx-config
    storage:
      size: "20Gi"
      mountPath: "/home/lls/.lls"
EOF
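The userConfig override references a ConfigMap by name, and that ConfigMap must exist before the operator reconciles the resource. A minimal sketch, assuming the config is stored under a config.yaml key (check the userConfig spec for the exact key name the operator expects):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ogx-config
data:
  config.yaml: |
    # Your custom config.yaml contents go here.
```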
Configuration Options:
- replicas: Number of OGX server instances to run.
- server.distribution.name: The distribution to use (e.g., starter for the starter distribution). See the list of supported distributions in the operator repository.
- server.distribution.image: (Optional) Custom container image for non-supported distributions. Use this field when deploying a distribution that is not in the supported list; if specified, it takes precedence over name.
- server.containerSpec.port: Port on which the OGX server listens (default: 8321).
- server.containerSpec.env: Environment variables used to configure providers.
- server.userConfig: (Optional) Override the default config.yaml using a ConfigMap. See userConfig spec.
- server.storage.size: Size of the persistent volume for model and data storage.
- server.storage.mountPath: Where to mount the storage in the container.
Note: For a complete list of supported distributions, see distributions.json in the operator repository. Pre-built container images are available on Docker Hub (e.g., llamastack/distribution-starter, llamastack/distribution-postgres-demo). To use a custom or non-supported distribution, set the server.distribution.image field with your container image instead of server.distribution.name.
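For a non-supported distribution, the spec from Step 4 would swap name for image. A sketch, with a hypothetical resource name and image reference:

```yaml
apiVersion: ogx.io/v1alpha1
kind: OGXDistribution
metadata:
  name: ogx-custom
spec:
  server:
    distribution:
      # Hypothetical image reference; image takes precedence over name.
      image: quay.io/example/my-ogx-distribution:latest
```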
The operator automatically creates:
- A Deployment for the OGX server
- A Service to access the server
- A PersistentVolumeClaim for storage
- All necessary RBAC resources
Check the status of your deployment:
kubectl get ogxdistribution
kubectl describe ogxdistribution ogx-vllm
Step 5: Test Deployment
Wait for the OGX server pod to be ready:
# Check the status of the OGXDistribution
kubectl get ogxdistribution ogx-vllm
# Check the pods created by the operator
kubectl get pods -l app.kubernetes.io/instance=ogx-vllm
# Wait for the pod to be ready
kubectl wait --for=condition=ready pod -l app.kubernetes.io/instance=ogx-vllm --timeout=300s
Get the service name created by the operator (it typically follows the pattern <ogxdistribution-name>-service):
# List services to find the service name
kubectl get services | grep ogx
# Port forward and test (replace SERVICE_NAME with the actual service name)
kubectl port-forward service/ogx-vllm-service 8321:8321
In another terminal, test the deployment:
ogx-client --endpoint http://localhost:8321 inference chat-completion --message "hello, what model are you?"
Troubleshooting
vLLM Server Issues
Check vLLM pod status:
kubectl get pods -l app.kubernetes.io/name=vllm
kubectl logs -l app.kubernetes.io/name=vllm
Test vLLM service connectivity:
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl http://vllm-server:8000/v1/models
OGX Server Issues
Check OGXDistribution status:
# Get detailed status
kubectl describe ogxdistribution ogx-vllm
# Check for events
kubectl get events --sort-by='.lastTimestamp' | grep ogx-vllm
Check operator-managed pods:
# List all pods managed by the operator
kubectl get pods -l app.kubernetes.io/name=ogx
# Check pod logs (replace POD_NAME with actual pod name)
kubectl logs -l app.kubernetes.io/name=ogx
Check operator status:
# Verify the operator is running
kubectl get pods -n ogx-k8s-operator-system
# Check operator logs if issues persist
kubectl logs -n ogx-k8s-operator-system -l control-plane=controller-manager
Verify service connectivity:
# Get the service endpoint
kubectl get svc ogx-vllm-service
# Test connectivity from within the cluster
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl http://ogx-vllm-service:8321/v1/health
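Immediately after a rollout, the health endpoint may briefly refuse connections while the server starts, so a one-shot check can fail spuriously. A small poll-until-ready helper (a sketch; the health URL is the one from the check above) is more robust:

```shell
# wait_for MAX_TRIES DELAY CMD...: retry a command up to MAX_TRIES times,
# sleeping DELAY seconds between attempts; returns non-zero on timeout.
wait_for() {
  max_tries=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$max_tries" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example: poll the OGX health endpoint once port-forwarding is active.
# wait_for 30 2 curl -sf http://localhost:8321/v1/health
```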
Related Resources
- Deployment Overview - Overview of deployment options
- Distributions - Understanding OGX distributions
- Configuration - Detailed configuration options
- OGX Operator - Overview of ogx kubernetes operator
- OGXDistribution - API Spec of the ogx operator Custom Resource.