Architecture

Overview

vLLM Cluster Manager consists of three host services plus a client agent that runs on each GPU node.

Host services
  • Infra: Postgres + Consul via Docker Compose
  • Backend: FastAPI orchestration API
  • Frontend: React + Vite admin dashboard
Client agent
  • Python service that registers with Consul
  • Creates isolated per-deployment venvs with uv
  • Executes vLLM workloads on the node
  • Reports deployment status, version, and GPU metrics
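
The registration step can be sketched against Consul's agent HTTP API. This is a minimal illustration, not the project's actual code: the service name `vllm-client`, the `/health` check path, and the node values are assumptions.

```python
import json
import urllib.request

CONSUL_URL = "http://localhost:47528"  # host port mapped to Consul's 8500 (see Ports)


def registration_payload(node_name: str, address: str, port: int = 9000) -> dict:
    """Build the body for PUT /v1/agent/service/register.

    Field names follow Consul's agent API; the service name and
    health-check path are assumptions for illustration.
    """
    return {
        "Name": "vllm-client",             # assumed service name
        "ID": f"vllm-client-{node_name}",  # one instance per node
        "Address": address,
        "Port": port,
        "Check": {
            "HTTP": f"http://{address}:{port}/health",  # assumed health endpoint
            "Interval": "10s",
        },
    }


def register(payload: dict) -> None:
    """Send the registration to the local Consul agent (requires Consul running)."""
    req = urllib.request.Request(
        f"{CONSUL_URL}/v1/agent/service/register",
        data=json.dumps(payload).encode(),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

With a passing health check, the service shows up in Consul's catalog and becomes visible to the backend.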

Service discovery

Consul provides service discovery: each client registers itself, and the backend queries Consul to enumerate connected clients on behalf of the UI.
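
On the backend side, discovery amounts to querying Consul's health API for passing instances of the client service. A hedged sketch, assuming the service is registered as `vllm-client` (the parsing follows the documented shape of a `/v1/health/service/<name>` response):

```python
import json
import urllib.request


def healthy_clients(entries: list) -> list:
    """Extract (address, port) pairs from a Consul
    GET /v1/health/service/<name>?passing response body.
    Falls back to the node address when the service address is empty."""
    out = []
    for entry in entries:
        svc = entry["Service"]
        addr = svc.get("Address") or entry["Node"]["Address"]
        out.append((addr, svc["Port"]))
    return out


def discover(consul_url: str = "http://localhost:47528",
             name: str = "vllm-client") -> list:
    """Ask Consul for all healthy client agents (requires Consul running)."""
    with urllib.request.urlopen(f"{consul_url}/v1/health/service/{name}?passing") as r:
        return healthy_clients(json.load(r))
```

The backend can run this on a timer and reconcile the result with the rows it keeps in Postgres.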

Data flow

  1. Client registers with Consul.
  2. Backend discovers clients and stores state in Postgres.
  3. UI calls the backend API and subscribes to WebSocket streams for logs and status.
  4. On deploy, the backend proxies the request to the client, which creates an isolated venv, installs the requested vLLM version and extra packages, then starts the vLLM server.
  5. The sync loop periodically polls clients for deployment status and GPU metrics, updating the database. If the backend restarts, it rediscovers running deployments from clients automatically.
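
Step 4 can be sketched as the sequence of commands a client agent might run. The directory layout and server flags here are illustrative assumptions; only the general shape (uv-created venv, pinned vLLM install, OpenAI-compatible server entrypoint) comes from the description above.

```python
from pathlib import Path


def deploy_commands(deploy_id: str, vllm_version: str, model: str,
                    extra_packages=None,
                    base_dir: str = "/opt/vllm-deployments") -> list:
    """Return the commands for one deployment: create an isolated
    per-deployment venv with uv, install a pinned vLLM plus any extra
    packages into it, then start the vLLM OpenAI-compatible server.
    base_dir and the flag set are assumptions for illustration."""
    venv = Path(base_dir) / deploy_id / ".venv"
    python = venv / "bin" / "python"
    return [
        ["uv", "venv", str(venv)],
        ["uv", "pip", "install", "--python", str(python),
         f"vllm=={vllm_version}", *(extra_packages or [])],
        [str(python), "-m", "vllm.entrypoints.openai.api_server",
         "--model", model],
    ]
```

An agent would run the first two commands to completion (e.g. via `subprocess.run`) and launch the third as a long-lived child process whose status feeds the sync loop in step 5.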

Ports

Service    Default  Purpose
Frontend   5173     Web UI (Vite dev server).
Backend    8000     API + WebSockets.
Consul     47528    Host port mapped to Consul HTTP API (container port 8500).
Postgres   5757     Host port mapped to Postgres (container port 5432).
Client     9000     Client agent HTTP server.