Getting Started
This guide takes you from a clean host to a working cluster with at least one client.
Prerequisites
Host:
- Docker + Docker Compose plugin (add your user to the docker group so that sudo is not required)
- Node.js ≥ 23 + npm
- Python 3.10–3.14
- uv (Python package manager)
Client:
- NVIDIA GPU with a recent driver (nvidia-smi working)
- A container runtime: Docker Engine (add the client user to the docker group) or Podman ≥ 5.4 (rootless works; enable the API socket with systemctl --user enable --now podman.socket; older Podman cannot pass GPUs through its Docker-compatible API). vLLM runs in the official vllm/vllm-openai containers either way.
- The NVIDIA Container Toolkit for GPU access. On Podman nodes, additionally generate CDI specs: sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml.
- Python 3.10–3.14 (for the lightweight client agent)
- uv (Python package manager)
Install uv if you don't already have it:
curl -LsSf https://astral.sh/uv/install.sh | sh
Verify Docker can see the GPUs before installing the client:
docker run --rm --gpus all ubuntu nvidia-smi
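The host prerequisites above can be checked with a short script before installing anything. This is a minimal sketch, not part of Aquila: the function names are hypothetical, and the tool list assumes the binaries are called docker, node, npm, and uv on your PATH.

```python
import shutil
import sys

# Tools named in the host prerequisites list (binary names are assumed).
HOST_TOOLS = ("docker", "node", "npm", "uv")


def python_supported(version=sys.version_info):
    """Return True if the interpreter is within the documented 3.10-3.14 range."""
    return (3, 10) <= tuple(version[:2]) <= (3, 14)


def node_major(version_text):
    """Parse the major version from `node --version` output such as 'v23.1.0'."""
    return int(version_text.strip().lstrip("v").split(".")[0])


def missing_tools(tools=HOST_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    print("Python version OK:", python_supported())
    print("Missing tools:", missing_tools() or "none")
```

The script only checks that the binaries exist and that Python is in range. To check the Node.js requirement, pass the output of node --version to node_major and compare the result against 23. The Docker Compose plugin and GPU access still need the manual checks above.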
Install (uv)
Create and activate a virtual environment, then install the aquila package into it:
uv venv
source .venv/bin/activate
uv pip install aquila
Start the host
Foreground (no sudo):
aquila host up --host-ip 0.0.0.0 --host-frontend-port 5173 --host-discover-port 11400
Persistent service (systemd):
aquila host up --service --host-ip 0.0.0.0 --host-frontend-port 5173 --host-discover-port 11400
--host-discover-port sets the port that clients use to discover the host. Use --host-backend-port to override the backend API port (default 8000).
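If you start the host from a provisioning script, it can help to assemble the command from named parameters instead of a hand-edited string. The sketch below uses only the flags described above; the helper name host_up_argv is hypothetical and not part of Aquila.

```python
import shlex


def host_up_argv(host_ip, frontend_port, discover_port, backend_port=None, service=False):
    """Assemble an `aquila host up` command from the flags described above."""
    argv = ["aquila", "host", "up"]
    if service:
        # Run as a persistent systemd service instead of in the foreground.
        argv.append("--service")
    argv += [
        "--host-ip", host_ip,
        "--host-frontend-port", str(frontend_port),
        "--host-discover-port", str(discover_port),
    ]
    if backend_port is not None:
        # Only pass the override when the default backend port (8000) is not wanted.
        argv += ["--host-backend-port", str(backend_port)]
    return argv


if __name__ == "__main__":
    # Print the foreground command with the same values as the example above.
    print(shlex.join(host_up_argv("0.0.0.0", 5173, 11400)))
```

The returned list can be passed directly to subprocess.run(argv, check=True), which avoids shell quoting issues.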
Start a client
Foreground (no sudo). Replace 1.2.3.4 with the address at which the host is reachable from the client:
aquila client up --host-ip 1.2.3.4 --host-discover-port 11400
Persistent service (systemd):
aquila client up --service --host-ip 1.2.3.4 --host-discover-port 11400
Note
If the client cannot register, check the firewall rules and confirm that the host is reachable from the client on the discovery port.
Stop services
aquila host down
aquila client down
Host data (deployments, nodes, history) survives host down and comes back on the next host up. Pass --purge to host down to also delete the Postgres volume — see Operations → Data persistence.
Verify the UI
Open the UI at http://<host-ip>:<host-frontend-port>.
Common first-run checks:
- The UI loads without a network error.
- The host shows up as healthy.
- The client appears under Nodes within ~30 seconds.
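The first check can also be run from a terminal, for example on a headless machine. This is a minimal sketch that only confirms the UI URL answers an HTTP GET; the function names are hypothetical, and it does not inspect host health or the Nodes list.

```python
import http.client
import urllib.error
import urllib.request


def ui_url(host_ip, frontend_port):
    """Build the UI address from the host IP and frontend port."""
    return f"http://{host_ip}:{frontend_port}"


def ui_loads(url, timeout=5.0):
    """Return True if an HTTP GET to the UI URL succeeds with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

For example, ui_loads(ui_url("1.2.3.4", 5173)) returns False when the frontend port is blocked by a firewall or the host is not running.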
Note
On first run the backend creates a default admin API key and prints it to the startup log. Store this key — it is required for gateway requests (Authorization: Bearer vcm-...) and will not be shown again. You can manage keys later in Settings → Gateway & Keys.
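The Bearer header can be attached to a request as sketched below. Several details here are assumptions rather than documented behavior: the base URL (the backend port 8000 is used only as a placeholder), the /v1/models path (a common route in OpenAI-compatible APIs), and the AQUILA_API_KEY environment variable name. Check Gateway & Usage for the real endpoint.

```python
import os
import urllib.request


def gateway_request(base_url, api_key, path="/v1/models"):
    """Build a GET request that carries the API key as a Bearer token.

    The default path is an assumption based on OpenAI-compatible APIs.
    """
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {api_key}"},
    )


if __name__ == "__main__":
    # AQUILA_API_KEY is a hypothetical variable name; export the key printed
    # in the startup log under whatever name suits your setup.
    key = os.environ.get("AQUILA_API_KEY", "")
    request = gateway_request("http://127.0.0.1:8000", key)
    print("Would request:", request.full_url)
```

Reading the key from the environment keeps it out of scripts and shell history. Pass the request to urllib.request.urlopen once the URL matches your deployment.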
Next steps
Once a client node appears in the dashboard, you are ready to deploy models. See the Deployments page for details on vLLM version selection, engine options, GPU assignment, local checkpoints, and LoRA adapters. When a model is running, the Endpoint button gives you copy-paste connection snippets — see Gateway & Usage for the cluster-wide OpenAI-compatible API and token accounting.