Architecture
Overview
Aquila runs three host services and a client agent on each GPU node.
Host services
- Infra: Postgres + Consul via Docker Compose
- Backend: FastAPI orchestration API
- Frontend: React + Vite admin dashboard
Client agent
- Python service that registers with Consul
- Runs each deployment as an official vllm/vllm-openai Docker container
- Pulls/builds and caches images; reconciles running containers on restart
- Reports deployment status, version, and GPU metrics
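As a rough illustration of the registration step, the sketch below builds a service definition and sends it to Consul's agent API (PUT /v1/agent/service/register). The service name aquila-client, the ID scheme, the /health endpoint, and the check intervals are assumptions made for this example, not values taken from the Aquila source.

```python
import json
import urllib.request

# Host port of the Consul HTTP API; "localhost" is an assumption.
CONSUL_URL = "http://localhost:47528"


def build_registration(node_id: str, address: str, port: int = 9000) -> dict:
    """Build a Consul service definition for one client agent."""
    return {
        "ID": f"aquila-client-{node_id}",  # hypothetical ID scheme
        "Name": "aquila-client",           # hypothetical service name
        "Address": address,
        "Port": port,
        "Check": {
            # Hypothetical health endpoint on the agent's HTTP server.
            "HTTP": f"http://{address}:{port}/health",
            "Interval": "10s",
            "DeregisterCriticalServiceAfter": "1m",
        },
    }


def register(definition: dict, consul_url: str = CONSUL_URL) -> int:
    """PUT the definition to Consul's agent API and return the HTTP status."""
    request = urllib.request.Request(
        f"{consul_url}/v1/agent/service/register",
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Calling register(build_registration("gpu-01", "10.0.0.5")) at agent startup would make the node visible to the backend, provided Consul is reachable at the configured address.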
Service discovery
Consul provides service discovery so the UI and backend can list connected clients.
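A minimal sketch of how connected clients could be listed, using Consul's health endpoint (GET /v1/health/service/<name>?passing=true). The service name aquila-client and the shape of the returned client records are assumptions for illustration.

```python
import json
import urllib.request


def parse_healthy_clients(entries: list[dict]) -> list[dict]:
    """Reduce Consul health entries to the fields a client list needs."""
    clients = []
    for entry in entries:
        service = entry["Service"]
        # Consul leaves Service.Address empty when the node address applies.
        address = service.get("Address") or entry["Node"]["Address"]
        clients.append(
            {"id": service["ID"], "address": address, "port": service["Port"]}
        )
    return clients


def list_clients(
    consul_url: str = "http://localhost:47528",  # host is an assumption
    service_name: str = "aquila-client",         # hypothetical service name
) -> list[dict]:
    """Ask Consul for instances whose health checks are passing."""
    url = f"{consul_url}/v1/health/service/{service_name}?passing=true"
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_healthy_clients(json.load(response))
```

Keeping the parsing separate from the HTTP call makes the mapping easy to test against a canned Consul response.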
Data flow
- Client registers with Consul.
- Backend discovers clients and stores state in Postgres.
- UI calls the backend API and subscribes to WebSocket streams for logs and status.
- On deploy, the backend proxies the request to the client, which pulls the matching vllm/vllm-openai image (building a derived image if extra packages are requested) and starts a vLLM container.
- The sync loop periodically polls clients for deployment status and GPU metrics, updating the database. If the backend restarts, it rediscovers running deployments from clients automatically.
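The sync loop described in the last step might look roughly like the sketch below. It is not Aquila's implementation: the record shapes, the function names, the 5-second interval, and the in-memory store (standing in for Postgres) are all assumptions.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical shapes: a client is {"id": ..., ...}; fetch_status returns
# {"deployments": [...], "gpus": [...]} for that client.
FetchStatus = Callable[[dict], Awaitable[dict]]


async def sync_once(clients: list[dict], fetch_status: FetchStatus, store: dict) -> None:
    """Poll every client once and record the result in `store`."""
    results = await asyncio.gather(
        *(fetch_status(client) for client in clients), return_exceptions=True
    )
    for client, result in zip(clients, results):
        if isinstance(result, BaseException):
            # Unreachable client: mark it offline, keep its last known state.
            store.setdefault(client["id"], {})["online"] = False
        else:
            store[client["id"]] = {"online": True, **result}


async def sync_loop(discover, fetch_status, store, interval_s: float = 5.0) -> None:
    """Run sync_once forever; `discover` returns the current client list."""
    while True:
        await sync_once(await discover(), fetch_status, store)
        await asyncio.sleep(interval_s)
```

Because each pass rebuilds state from what the clients report, a freshly restarted backend with an empty store repopulates running deployments on its first pass.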
Ports
| Service | Default | Purpose |
|---|---|---|
| Frontend | 5173 | Web UI (Vite dev server). |
| Backend | 8000 | API + WebSockets. |
| Consul | 47528 | Host port mapped to Consul HTTP API (container port 8500). |
| Postgres | 5757 | Host port mapped to Postgres (container port 5432). |
| Client | 9000 | Client agent HTTP server. |
Communication flow
graph LR
subgraph Host
FE[Frontend<br>:5173]
BE[Backend<br>:8000]
PG[(Postgres<br>:5757)]
CO[Consul<br>:47528]
end
subgraph "GPU Node"
AG[Client Agent<br>:9000]
VL[vLLM Container<br>:port]
end
FE -->|API + WS| BE
BE -->|SQL| PG
BE -->|discovery| CO
AG -->|register| CO
BE -->|deploy/stop/status| AG
AG -->|start/manage| VL
User -->|dashboard| FE
User -->|/v1 gateway| BE
User -.->|direct| VL
Deployment lifecycle
A deployment moves through these states:
stateDiagram-v2
[*] --> stopped
stopped --> starting : deploy / restart
starting --> loading : container running
loading --> running : health check passes
running --> stopping : stop requested
running --> paused_ram : pause (warm cache)
paused_ram --> running : resume / first request
paused_ram --> stopping : stop requested
stopping --> stopped : clean exit
stopping --> stopped : duration expired
running --> error : crash
loading --> error : timeout / crash
starting --> error : image pull fail
error --> starting : restart
error --> [*] : delete
stopped --> [*] : delete
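The diagram above can be encoded as a transition table that rejects illegal state changes. This is a sketch derived only from the arrows in the diagram; the State enum, TRANSITIONS, and the helper names are illustrative and not taken from the Aquila code.

```python
from enum import Enum


class State(str, Enum):
    STOPPED = "stopped"
    STARTING = "starting"
    LOADING = "loading"
    RUNNING = "running"
    PAUSED_RAM = "paused_ram"
    STOPPING = "stopping"
    ERROR = "error"


# Allowed targets for each state, copied from the diagram's arrows.
TRANSITIONS = {
    State.STOPPED: {State.STARTING},
    State.STARTING: {State.LOADING, State.ERROR},
    State.LOADING: {State.RUNNING, State.ERROR},
    State.RUNNING: {State.STOPPING, State.PAUSED_RAM, State.ERROR},
    State.PAUSED_RAM: {State.RUNNING, State.STOPPING},
    State.STOPPING: {State.STOPPED},
    State.ERROR: {State.STARTING},
}

# States from which the diagram allows a delete.
DELETABLE = {State.STOPPED, State.ERROR}


def transition(current: State, target: State) -> State:
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


def can_delete(current: State) -> bool:
    """A deployment can be deleted only when stopped or in error."""
    return current in DELETABLE
```

For example, transition(State.RUNNING, State.PAUSED_RAM) succeeds, while transition(State.STOPPED, State.RUNNING) raises because a stopped deployment must pass through starting and loading first.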
WebSocket events
The backend pushes live updates to the frontend over a single WebSocket connection (/ws). The frontend reconnects automatically on disconnect. Each message is a JSON object whose type field names the event:
| Event type | Trigger | Payload |
|---|---|---|
| deployments_changed | Any deployment status change, create, delete, or settings update | {"type": "deployments_changed"} |
| nodes_changed | Node status, metrics, or configuration update | {"type": "nodes_changed"} |
| settings_changed | Runtime settings updated via the Settings dialog | {"type": "settings_changed"} |
| api_keys_changed | API key created, updated, or deleted | {"type": "api_keys_changed"} |
The frontend uses these events to invalidate its React Query caches and re-fetch the affected data, keeping the dashboard in sync across multiple browser tabs without polling.
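On the backend side, fanning these events out to every connected tab could be sketched as below. The Broadcaster class and its send-callable interface are hypothetical; a real FastAPI handler would wrap each WebSocket's send method, which this example deliberately leaves out to stay self-contained.

```python
import asyncio
import json
from typing import Awaitable, Callable

# Event types listed in the table above.
EVENT_TYPES = {
    "deployments_changed",
    "nodes_changed",
    "settings_changed",
    "api_keys_changed",
}

Send = Callable[[str], Awaitable[None]]


class Broadcaster:
    """Fan one event out to every subscribed connection (hypothetical helper)."""

    def __init__(self) -> None:
        self._subscribers: set[Send] = set()

    @property
    def subscriber_count(self) -> int:
        return len(self._subscribers)

    def subscribe(self, send: Send) -> None:
        self._subscribers.add(send)

    async def broadcast(self, event_type: str) -> None:
        if event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {event_type}")
        message = json.dumps({"type": event_type})
        for send in list(self._subscribers):
            try:
                await send(message)
            except Exception:
                # Drop connections that fail; the frontend reconnects on its own.
                self._subscribers.discard(send)
```

Since the payload carries only the event type, each tab decides for itself which cached queries to invalidate and re-fetch.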