Architecture
Overview
vLLM Cluster Manager consists of three host services, plus a client agent that runs on each GPU node.
Host services
- Infra: Postgres + Consul via Docker Compose
- Backend: FastAPI orchestration API
- Frontend: React + Vite admin dashboard
Client agent
- Python service that registers with Consul
- Creates isolated per-deployment venvs with uv
- Executes vLLM workloads on the node
- Reports deployment status, version, and GPU metrics
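
As a rough illustration of the venv and launch steps, the sketch below builds the commands a client agent might run for one deployment. The function name `plan_deployment`, the directory layout, and the example arguments are hypothetical and not taken from the project. The commands assume the `uv venv` and `uv pip install --python` subcommands and vLLM's `vllm.entrypoints.openai.api_server` module, and a POSIX venv layout.

```python
from pathlib import Path


def plan_deployment(
    root: Path,
    deployment_id: str,
    vllm_version: str,
    model: str,
    port: int,
    extra_packages: tuple[str, ...] = (),
) -> list[list[str]]:
    """Return the commands for one deployment, in order (hypothetical helper)."""
    venv = root / deployment_id          # one venv per deployment keeps installs isolated
    python = venv / "bin" / "python"     # assumes a POSIX venv layout
    return [
        # 1. Create the isolated environment.
        ["uv", "venv", str(venv)],
        # 2. Install the requested vLLM version plus any extra packages into it.
        ["uv", "pip", "install", "--python", str(python),
         f"vllm=={vllm_version}", *extra_packages],
        # 3. Start the vLLM server with the venv's interpreter.
        [str(python), "-m", "vllm.entrypoints.openai.api_server",
         "--model", model, "--port", str(port)],
    ]
```

Each command list could be passed to `subprocess.run` or `subprocess.Popen`. Returning the plan instead of executing it keeps the sketch testable without a GPU node.
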
Service discovery
Consul provides service discovery so the UI and backend can list connected clients.
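
To make the registration and discovery round trip concrete, here is a minimal sketch of both sides. It assumes the standard Consul HTTP API endpoints (`PUT /v1/agent/service/register` and `GET /v1/health/service/<name>?passing=true`). The service name `vllm-client`, the `/health` check path, and both function names are hypothetical and may differ from what the project actually uses.

```python
from typing import Any

SERVICE_NAME = "vllm-client"  # hypothetical service name


def registration_payload(node_id: str, address: str, port: int = 9000) -> dict[str, Any]:
    """Build the JSON body a client would PUT to /v1/agent/service/register."""
    return {
        "ID": f"{SERVICE_NAME}-{node_id}",
        "Name": SERVICE_NAME,
        "Address": address,
        "Port": port,
        "Check": {
            # Hypothetical health endpoint on the client agent.
            "HTTP": f"http://{address}:{port}/health",
            "Interval": "10s",
        },
    }


def client_urls(health_entries: list[dict[str, Any]]) -> list[str]:
    """Extract client base URLs from a /v1/health/service/<name> response."""
    urls = []
    for entry in health_entries:
        service = entry["Service"]
        # Consul leaves Service.Address empty when the service uses the node's address.
        address = service.get("Address") or entry["Node"]["Address"]
        urls.append(f"http://{address}:{service['Port']}")
    return urls
```

The backend would fetch the health endpoint with any HTTP client and pass the decoded JSON list to `client_urls`.
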
Data flow
- Client registers with Consul.
- Backend discovers clients and stores state in Postgres.
- UI calls the backend API and subscribes to WebSocket streams for logs and status.
- On deploy, the backend proxies the request to the client, which creates an isolated venv, installs the requested vLLM version and extra packages, then starts the vLLM server.
- A sync loop in the backend periodically polls clients for deployment status and GPU metrics and updates the database. After a backend restart, the loop automatically rediscovers running deployments from the clients.
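
The reconciliation step of the sync loop can be sketched as a pure function that merges one client's polled reports into the stored state. The `DeploymentReport` type, the `reconcile` name, and the `"unknown"` status for deployments the client no longer reports are all assumptions made for illustration. The real backend may model this differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentReport:
    """One deployment as reported by a client during a poll (hypothetical shape)."""
    deployment_id: str
    status: str


def reconcile(stored: dict[str, str], reports: list[DeploymentReport]) -> dict[str, str]:
    """Merge one client's reports into the stored {deployment_id: status} map.

    Reported deployments are upserted, so deployments the backend has no record
    of (for example after a restart) are rediscovered. Stored deployments that
    the client did not report are marked "unknown" (an assumption).
    """
    merged = {deployment_id: "unknown" for deployment_id in stored}
    for report in reports:
        merged[report.deployment_id] = report.status
    return merged
```

Starting from an empty `stored` map reproduces the restart case: every deployment the client reports reappears in the result.
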
Ports
| Service | Default | Purpose |
|---|---|---|
| Frontend | 5173 | Web UI (Vite dev server). |
| Backend | 8000 | API + WebSockets. |
| Consul | 47528 | Host port mapped to Consul HTTP API (container port 8500). |
| Postgres | 5757 | Host port mapped to Postgres (container port 5432). |
| Client | 9000 | Client agent HTTP server. |
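
For scripts that need to reach the host services, the defaults in the table can be collected in one place. This is only a sketch: the `Ports` dataclass and `host_urls` helper are hypothetical, and the URL schemes are assumptions. Only the port numbers come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ports:
    """Default host ports from the table above."""
    frontend: int = 5173
    backend: int = 8000
    consul: int = 47528    # mapped to container port 8500
    postgres: int = 5757   # mapped to container port 5432
    client: int = 9000


def host_urls(host: str = "localhost", ports: Ports = Ports()) -> dict[str, str]:
    """Build base URLs for the host services (hypothetical helper)."""
    return {
        "frontend": f"http://{host}:{ports.frontend}",
        "backend": f"http://{host}:{ports.backend}",
        "backend_ws": f"ws://{host}:{ports.backend}",
        "consul": f"http://{host}:{ports.consul}",
        "postgres": f"postgresql://{host}:{ports.postgres}",
    }
```
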