Architecture

Overview

vLLM Cluster Manager runs three host services, plus a client agent on each GPU node.

[Diagram: host services and client agent]

Consul provides service discovery so the UI and backend can list connected clients.
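The registration and discovery round trip can be sketched against Consul's HTTP API. This is a minimal sketch, not the project's code: the service name `vllm-client`, the `localhost` host, and both helper functions are assumptions made for illustration. The endpoints (`/v1/agent/service/register` and `/v1/health/service/<name>`) are standard Consul routes, and the port is the default host port for Consul listed in the table further down. The request is only built, never sent.

```python
import json
from urllib.request import Request

# Host port for Consul from the defaults table; "localhost" is an assumption.
CONSUL_URL = "http://localhost:47528"
# Hypothetical service name; the real name used by the client is not shown here.
SERVICE_NAME = "vllm-client"


def build_register_request(client_id: str, address: str, port: int = 9000) -> Request:
    """Build (but do not send) a PUT to Consul's agent service registration endpoint."""
    payload = {"ID": client_id, "Name": SERVICE_NAME, "Address": address, "Port": port}
    return Request(
        f"{CONSUL_URL}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def parse_health_entries(entries: list[dict]) -> list[tuple[str, int]]:
    """Extract (address, port) pairs from a /v1/health/service/<name> response body."""
    clients = []
    for entry in entries:
        service = entry["Service"]
        # Consul leaves Service.Address empty when the service uses the node's address.
        address = service.get("Address") or entry["Node"]["Address"]
        clients.append((address, service["Port"]))
    return clients
```

A backend-side caller would fetch `/v1/health/service/vllm-client`, decode the JSON body, and pass it to `parse_health_entries` to get the list of reachable client agents.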

Each client agent registers with Consul.
The backend discovers clients through Consul and stores state in Postgres.
The UI calls the backend API and subscribes to WebSocket streams for logs and status.
On deploy, the backend proxies the request to the client, which creates an isolated venv, installs the requested vLLM version and extra packages, then starts the vLLM server.
The backend's sync loop periodically polls clients for deployment status and GPU metrics and updates the database. If the backend restarts, it automatically rediscovers running deployments from the clients.
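The deploy step described above (isolated venv, pinned vLLM version plus extra packages, then server start) can be pictured as three commands. The sketch below only builds the argument lists and does not run anything. The function name `plan_deployment`, the venv path, the `model` and `port` parameters, and the example values are all assumptions; the client agent's actual commands are not shown in this document. The server command uses vLLM's OpenAI-compatible entry point, and a POSIX venv layout (`bin/python`) is assumed.

```python
from pathlib import PurePosixPath


def plan_deployment(venv_dir, vllm_version, extra_packages, model, port):
    """Return the argv lists a client agent could run, in order, for one deployment."""
    venv = PurePosixPath(venv_dir)
    python_bin = str(venv / "bin" / "python")  # POSIX venv layout (assumption)
    return [
        # 1. Create an isolated virtual environment for this deployment.
        ["python3", "-m", "venv", str(venv)],
        # 2. Install the requested vLLM version and any extra packages into it.
        [python_bin, "-m", "pip", "install", f"vllm=={vllm_version}", *extra_packages],
        # 3. Start the vLLM OpenAI-compatible server from inside the venv.
        [python_bin, "-m", "vllm.entrypoints.openai.api_server",
         "--model", model, "--port", str(port)],
    ]
```

Each list could be handed to `subprocess.run` (steps 1 and 2) or `subprocess.Popen` (step 3, which is long-running). Keeping one venv per deployment is what lets different deployments pin different vLLM versions on the same node.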

Service	Default port	Purpose
Frontend	5173	Web UI (Vite dev server).
Backend	8000	API + WebSockets.
Consul	47528	Host port mapped to Consul HTTP API (container port 8500).
Postgres	5757	Host port mapped to Postgres (container port 5432).
Client	9000	Client agent HTTP server.
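As a quick reference, the defaults in the table can be turned into base URLs. This is an illustrative sketch only: `DEFAULT_PORTS`, `service_url`, and the `localhost` host are hypothetical names and values, not part of the project.

```python
# Default ports copied from the table above.
DEFAULT_PORTS = {
    "frontend": 5173,
    "backend": 8000,
    "consul": 47528,   # host port; Consul listens on 8500 inside the container
    "postgres": 5757,  # host port; Postgres listens on 5432 inside the container
    "client": 9000,
}


def service_url(service: str, host: str = "localhost", scheme: str = "http") -> str:
    """Build a base URL for one of the services using its default port."""
    return f"{scheme}://{host}:{DEFAULT_PORTS[service]}"
```

For example, `service_url("backend", scheme="ws")` gives the base for the WebSocket streams, and `service_url("consul") + "/v1/status/leader"` is a simple way to check that the Consul HTTP API is reachable on the mapped host port.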