# Observability

This page focuses solely on collecting and visualizing metrics for Semantic Router using Prometheus and Grafana; the deployment method itself (Docker Compose vs. Kubernetes) is covered in `docker-quickstart.md`.
## 1. Metrics & Endpoints Summary

| Component | Endpoint | Notes |
|---|---|---|
| Router metrics | `:9190/metrics` | Prometheus format (flag: `--metrics-port`) |
| Router health (future probe) | `:8080/health` | HTTP readiness/liveness candidate |
| Envoy metrics (optional) | `:19000/stats/prometheus` | If you enable Envoy |
Dashboard JSON: `deploy/llm-router-dashboard.json`.
Primary source file exposing metrics: `src/semantic-router/cmd/main.go` (uses `promhttp`).
## 2. Docker Compose Observability

Compose bundles: `prometheus`, `grafana`, `semantic-router`, (optional) `envoy`, and `mock-vllm`.

Key files:

- `config/prometheus.yaml`
- `config/grafana/datasource.yaml`
- `config/grafana/dashboards.yaml`
- `deploy/llm-router-dashboard.json`
Start (with the testing profile, for example):

```bash
CONFIG_FILE=/app/config/config.testing.yaml docker compose --profile testing up --build
```
Access:
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (admin/admin)
Expected Prometheus targets:

- `semantic-router:9190`
- `envoy-proxy:19000` (optional)
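For reference, the scrape configuration in `config/prometheus.yaml` for the Compose stack might look roughly like this (a sketch assuming static targets on the Compose network; check the actual file for the authoritative config):

```yaml
scrape_configs:
  - job_name: semantic-router
    scrape_interval: 15s
    static_configs:
      - targets: ['semantic-router:9190']
  - job_name: envoy
    metrics_path: /stats/prometheus   # Envoy's Prometheus endpoint
    static_configs:
      - targets: ['envoy-proxy:19000']
```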
## 3. Kubernetes Observability
This guide adds a production-ready Prometheus + Grafana stack to the existing Semantic Router Kubernetes deployment. It includes manifests for collectors, dashboards, data sources, RBAC, and ingress so you can monitor routing performance in any cluster.
> **Namespace**: All manifests default to the `vllm-semantic-router-system` namespace to match the core deployment. Override it with Kustomize if you use a different namespace.
### What Gets Installed

| Component | Purpose | Key Files |
|---|---|---|
| Prometheus | Scrapes Semantic Router metrics and stores them with persistent retention | `prometheus/` (`rbac.yaml`, `configmap.yaml`, `deployment.yaml`, `pvc.yaml`, `service.yaml`) |
| Grafana | Visualizes metrics using the bundled LLM Router dashboard and a pre-configured Prometheus datasource | `grafana/` (`secret.yaml`, `configmap-*.yaml`, `deployment.yaml`, `pvc.yaml`, `service.yaml`) |
| Ingress (optional) | Exposes the UIs outside the cluster | `ingress.yaml` |
| Dashboard provisioning | Automatically loads `deploy/llm-router-dashboard.json` into Grafana | `grafana/configmap-dashboard.yaml` |
Prometheus is configured to discover the `semantic-router-metrics` service (port 9190) automatically. Grafana provisions the same LLM Router dashboard that ships with the Docker Compose stack.
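For discovery to work, the core deployment's metrics Service must match what the scrape config keeps: the name `semantic-router-metrics` and a port named `metrics`. A minimal sketch (the selector labels are illustrative and must match your router pods):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: semantic-router-metrics      # name the relabel rule keeps
  namespace: vllm-semantic-router-system
spec:
  selector:
    app: semantic-router             # assumed pod label; adjust to your deployment
  ports:
    - name: metrics                  # port name the relabel rule keeps
      port: 9190
      targetPort: 9190
```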
### 1. Prerequisites

- Deployed Semantic Router workload via `deploy/kubernetes/`
- A Kubernetes cluster (managed, on-prem, or kind)
- `kubectl` v1.23+
- Optional: an ingress controller (NGINX, ALB, etc.) if you want external access
### 2. Directory Layout

```text
deploy/kubernetes/observability/
├── README.md
├── kustomization.yaml          # (created in the next step)
├── ingress.yaml                # optional HTTPS ingress examples
├── prometheus/
│   ├── configmap.yaml          # Scrape config (Kubernetes SD)
│   ├── deployment.yaml
│   ├── pvc.yaml
│   ├── rbac.yaml               # SA + ClusterRole + binding
│   └── service.yaml
└── grafana/
    ├── configmap-dashboard.yaml    # Bundled LLM router dashboard
    ├── configmap-provisioning.yaml # Datasource + provider config
    ├── deployment.yaml
    ├── pvc.yaml
    ├── secret.yaml                 # Admin credentials (override in prod)
    └── service.yaml
```
### 3. Prometheus Configuration Highlights

- Uses `kubernetes_sd_configs` to enumerate endpoints in `vllm-semantic-router-system`
- Keeps 15 days of metrics by default (`--storage.tsdb.retention.time=15d`)
- Stores metrics in a PersistentVolumeClaim named `prometheus-data`
- RBAC rules grant read-only access to Services, Endpoints, Pods, Nodes, and EndpointSlices
#### Scrape configuration snippet

```yaml
scrape_configs:
  - job_name: semantic-router
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - vllm-semantic-router-system
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        regex: semantic-router-metrics
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: metrics
        action: keep
```
Modify the namespace or service name if you changed them in your primary deployment.
### 4. Grafana Configuration Highlights

- Stateful deployment backed by the `grafana-storage` PVC
- Datasource provisioned automatically, pointing to `http://prometheus:9090`
- Dashboard provider watches `/var/lib/grafana-dashboards`
- Bundled `llm-router-dashboard.json` is identical to `deploy/llm-router-dashboard.json`
- Admin credentials pulled from the `grafana-admin` secret (default `admin/admin`; change this!)
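The provisioning ConfigMap might contain files along these lines (a sketch of Grafana's datasource and dashboard-provider formats, not the exact shipped content):

```yaml
# datasources.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    isDefault: true
---
# dashboards.yaml (provider)
apiVersion: 1
providers:
  - name: semantic-router
    folder: Semantic Router
    type: file
    options:
      path: /var/lib/grafana-dashboards   # directory the provider watches
```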
#### Updating credentials

```bash
kubectl create secret generic grafana-admin \
  --namespace vllm-semantic-router-system \
  --from-literal=admin-user=monitor \
  --from-literal=admin-password='pick-a-strong-password' \
  --dry-run=client -o yaml | kubectl apply -f -
```
Remove or overwrite the committed `secret.yaml` when you adopt a different secret management approach.
### 5. Deployment Steps

#### 5.1. Create the Kustomization

Create `deploy/kubernetes/observability/kustomization.yaml` to assemble all manifests. This guide assumes you keep Prometheus and Grafana in the same namespace as the router.
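A minimal `kustomization.yaml` assembling the manifests could look like this (a sketch; resource paths follow the directory layout shown earlier):

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: vllm-semantic-router-system
resources:
  - prometheus/rbac.yaml
  - prometheus/configmap.yaml
  - prometheus/pvc.yaml
  - prometheus/deployment.yaml
  - prometheus/service.yaml
  - grafana/secret.yaml
  - grafana/configmap-provisioning.yaml
  - grafana/configmap-dashboard.yaml
  - grafana/pvc.yaml
  - grafana/deployment.yaml
  - grafana/service.yaml
  - ingress.yaml   # remove this line to keep the stack private
```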
#### 5.2. Apply manifests

```bash
kubectl apply -k deploy/kubernetes/observability/
```

Verify the pods:

```bash
kubectl get pods -n vllm-semantic-router-system
```

You should see `prometheus-...` and `grafana-...` pods in the Running state.
#### 5.3. Integration with the core deployment

1. Deploy or update Semantic Router (`kubectl apply -k deploy/kubernetes/`).
2. Deploy the observability stack (`kubectl apply -k deploy/kubernetes/observability/`).
3. Confirm the metrics service (`semantic-router-metrics`) has endpoints:

   ```bash
   kubectl get endpoints semantic-router-metrics -n vllm-semantic-router-system
   ```

4. The Prometheus target should transition to UP within ~15 seconds.
#### 5.4. Accessing the UIs

> **Optional Ingress**: If you prefer to keep the stack private, delete `ingress.yaml` from `kustomization.yaml` before applying.

- **Port-forward (quick check)**:

  ```bash
  kubectl port-forward svc/prometheus 9090:9090 -n vllm-semantic-router-system
  kubectl port-forward svc/grafana 3000:3000 -n vllm-semantic-router-system
  ```

  Prometheus: http://localhost:9090, Grafana: http://localhost:3000

- **Ingress (production)**: Customize `ingress.yaml` with real domains, TLS secrets, and your ingress class before applying. Replace `*.example.com` and configure HTTPS certificates via cert-manager or your provider.
### 6. Verifying Metrics Collection

- Open Prometheus (port-forward or ingress), navigate to **Status → Targets**, and ensure the `semantic-router` job is green.
- Query `rate(llm_model_completion_tokens_total[5m])`; it should return data once traffic flows.
- Open Grafana, log in with the admin credentials, and confirm the LLM Router Metrics dashboard exists under the Semantic Router folder.
- Generate traffic to Semantic Router (classification or routing requests). Key panels should start populating:
  - Prompt Category counts
  - Token usage rate per model
  - Routing modifications between models
  - Latency histograms (TTFT, completion p95)
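Queries behind those panels might look like the following (metric names come from the Key Metrics table on this page; label names such as `model` and `category` are assumptions, so adjust them to the labels your deployment actually emits):

```promql
# Prompt category counts
sum by (category) (rate(llm_category_classifications_count[5m]))

# Token usage rate per model
sum by (model) (rate(llm_model_completion_tokens_total[5m]))

# Routing modifications between models
rate(llm_model_routing_modifications_total[5m])

# p95 completion latency
histogram_quantile(0.95, sum by (le) (rate(llm_model_completion_latency_seconds_bucket[5m])))
```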
### 7. Dashboard Customization

- Duplicate the provisioned dashboard inside Grafana to make changes while keeping the original as a template.
- Update Grafana provisioning (`grafana/configmap-provisioning.yaml`) to point to alternate folders or add new providers.
- Add additional dashboards by extending `grafana/configmap-dashboard.yaml` or mounting a different ConfigMap.
- Incorporate Kubernetes cluster metrics (CPU/memory) by adding another datasource or deploying kube-state-metrics and node exporters.
### 8. Best Practices

#### Resource Sizing

- Prometheus: increase CPU/memory with higher scrape cardinality or retention beyond 15 days.
- Grafana: start with 500m CPU / 1Gi RAM; scale replicas horizontally when concurrent viewers exceed a few dozen.
#### Storage

- Use SSD-backed storage classes for Prometheus when the retention window is large.
- Increase `prometheus/pvc.yaml` (default 20Gi) and `grafana/pvc.yaml` (default 10Gi) to match retention requirements.
- Enable volume snapshots or backups for dashboards and alert history.
#### Security

- Replace the demo `grafana-admin` secret with credentials stored in your preferred secret manager.
- Restrict ingress access with network policies, OAuth proxies, or SSO integrations.
- Enable Grafana role-based access control and API keys for automation.
- Scope Prometheus RBAC to only the namespaces you need. If metrics run in multiple namespaces, list them in the scrape config.
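As one example of a network-level restriction, a NetworkPolicy could limit Grafana ingress to an ingress controller's namespace (a sketch; the pod label and namespace name are assumptions to adapt):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: grafana-ingress-only
  namespace: vllm-semantic-router-system
spec:
  podSelector:
    matchLabels:
      app: grafana                   # assumed Grafana pod label
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx   # assumed controller namespace
      ports:
        - port: 3000
```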
#### Maintenance

- Monitor Prometheus disk usage; prune retention or scale the PVC before it fills up.
- Back up Grafana dashboards or store them in Git (already done through this ConfigMap).
- Roll upgrades separately: update Prometheus and Grafana images via `kustomization.yaml` patches.
- Consider adopting the Prometheus Operator (`ServiceMonitor` + `PodMonitor`) if you already run kube-prometheus-stack. A sample `ServiceMonitor` is in `website/docs/tutorials/observability/observability.md`.
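Under the Prometheus Operator, a `ServiceMonitor` roughly equivalent to the raw scrape config might look like this (a sketch; the `release` label and Service label are assumptions that must match your Prometheus instance's selectors):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: semantic-router
  namespace: vllm-semantic-router-system
  labels:
    release: kube-prometheus-stack   # assumed: match your Prometheus selector
spec:
  selector:
    matchLabels:
      app: semantic-router           # assumed label on semantic-router-metrics
  endpoints:
    - port: metrics                  # named port on the Service
      interval: 15s
```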
## 4. Key Metrics (Sample)

| Metric | Type | Description |
|---|---|---|
| `llm_category_classifications_count` | counter | Number of category classification operations |
| `llm_model_completion_tokens_total` | counter | Tokens emitted per model |
| `llm_model_routing_modifications_total` | counter | Model switches / routing adjustments |
| `llm_model_completion_latency_seconds` | histogram | Completion latency distribution |
| `process_cpu_seconds_total` / `process_resident_memory_bytes` | standard | Runtime resource usage |
Typical PromQL patterns:

```promql
rate(llm_model_completion_tokens_total[5m])
histogram_quantile(0.95, sum by (le) (rate(llm_model_completion_latency_seconds_bucket[5m])))
```
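To build intuition for what `histogram_quantile` computes: it finds the bucket containing the target rank and interpolates linearly within it. A minimal Python sketch with hypothetical cumulative bucket counts (an approximation for illustration, not Prometheus's exact implementation, which has additional edge-case handling):

```python
def histogram_quantile(q, buckets):
    """Approximate histogram_quantile: linear interpolation inside the
    bucket containing the q-th quantile.
    buckets: sorted (upper_bound, cumulative_count) pairs ending in +Inf."""
    total = buckets[-1][1]            # the +Inf bucket holds the total count
    rank = q * total                  # target cumulative count
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                return prev_bound     # quantile falls in the +Inf bucket
            # interpolate linearly between the bucket's bounds
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count

# Hypothetical cumulative counts for llm_model_completion_latency_seconds_bucket
buckets = [(0.5, 10), (1.0, 60), (2.5, 90), (5.0, 100), (float("inf"), 100)]
print(histogram_quantile(0.95, buckets))  # rank 95 lands in the (2.5, 5.0] bucket -> 3.75
```

This is why p95 estimates depend on bucket layout: the result can only be as precise as the bucket boundaries around the quantile.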
## 5. Troubleshooting

| Symptom | Likely Cause | Check | Fix |
|---|---|---|---|
| Target DOWN (Docker) | Service name mismatch | Prometheus `/targets` | Ensure the `semantic-router` container is running |
| Target DOWN (K8s) | Label/selector mismatch | `kubectl get ep semantic-router-metrics` | Align labels or the ServiceMonitor selector |
| No new token metrics | No traffic | Generate chat/completions via Envoy | Send test requests |
| Dashboard empty | Datasource URL wrong | Grafana datasource settings | Point to `http://prometheus:9090` (Docker) or the cluster Prometheus |
| Large 5xx spikes | Backend model unreachable | Router logs | Verify vLLM endpoint configuration |