vLLM Cluster Manager
Operate multi-node vLLM deployments
Spin up a host with a web dashboard, then add GPU nodes with the client agent. The UI lets you launch, monitor, and troubleshoot model deployments without building a full MLOps stack.
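Conceptually, each GPU node runs a small agent that keeps the host informed of its presence, which is what drives service discovery and the node list in the dashboard. The sketch below is a rough illustration of that pattern only: the endpoint path, payload, and interval are hypothetical, not the manager's actual protocol.

```python
import socket
import time

import requests

HOST_URL = "http://cluster-host:8000"  # hypothetical dashboard address


def heartbeat_loop(interval_s: float = 10.0) -> None:
    """Keep re-registering this node with the host until interrupted."""
    while True:
        try:
            requests.post(
                f"{HOST_URL}/api/nodes/heartbeat",  # hypothetical endpoint
                json={"hostname": socket.gethostname()},
                timeout=5,
            )
        except requests.RequestException:
            pass  # host unreachable; retry on the next tick
        time.sleep(interval_s)
```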
Best for
- Research labs
- Small teams
- Multi-model serving
Built-in
- Service discovery
- Web UI
- Systemd support
What you can do
- Register and manage GPU nodes that run vLLM workloads.
- Deploy models with a specific vLLM version, nightly build, or commit hash; each deployment gets its own isolated venv (see the venv sketch below).
- Install extra pip packages and upload vLLM plugins (.py, .whl) per deployment.
- Select GPUs with toggle buttons and configure tensor parallelism (see the launch sketch below).
- Save and reload deployment configurations for one-click redeployment (see the config sketch below).
- Monitor node health, GPU utilization, and deployment status in real time (see the telemetry sketch below).
- Stream logs from running processes for quick troubleshooting (see the log-streaming sketch below).
- Recover deployments automatically after a backend restart (see the recovery sketch below).
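To make the per-deployment isolation concrete, here is a minimal sketch of creating a fresh venv with a pinned vLLM build. The directory layout and helper name are hypothetical; the install specs shown (a pinned release, a git commit URL) are ordinary pip usage.

```python
import subprocess
import venv
from pathlib import Path


def create_deployment_env(deploy_id: str, vllm_spec: str) -> Path:
    """Create an isolated venv and install the requested vLLM build.

    vllm_spec examples:
      "vllm==0.8.5"                                         # pinned release
      "git+https://github.com/vllm-project/vllm.git@<sha>"  # commit hash
    """
    env_dir = Path("deployments") / deploy_id / "venv"
    venv.create(env_dir, with_pip=True)  # fresh interpreter with its own pip
    pip = env_dir / "bin" / "pip"        # POSIX layout, matching the Ubuntu targets
    subprocess.run([str(pip), "install", vllm_spec], check=True)
    return env_dir
```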
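GPU selection and tensor parallelism map directly onto vLLM's own CLI: the chosen GPU indices become CUDA_VISIBLE_DEVICES, and their count becomes --tensor-parallel-size. Those two pieces are real vLLM/CUDA features; the wrapper itself is illustrative.

```python
import os
import subprocess


def launch(model: str, gpu_ids: list[int], port: int) -> subprocess.Popen:
    """Start a vLLM server pinned to the selected GPUs."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return subprocess.Popen(
        ["vllm", "serve", model,
         "--tensor-parallel-size", str(len(gpu_ids)),
         "--port", str(port)],
        env=env,
        stdout=subprocess.PIPE,  # captured so logs can be streamed
        stderr=subprocess.STDOUT,
        text=True,
    )
```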
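A saved configuration only needs to capture enough to relaunch the same deployment later. One plausible record, with an entirely hypothetical schema:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DeploymentConfig:
    model: str
    vllm_spec: str        # e.g. "vllm==0.8.5" or a git+...@<sha> URL
    gpu_ids: list[int]
    port: int
    extra_pip: list[str]  # additional packages installed into the venv


def save(cfg: DeploymentConfig, path: str) -> None:
    with open(path, "w") as f:
        json.dump(asdict(cfg), f, indent=2)


def load(path: str) -> DeploymentConfig:
    with open(path) as f:
        return DeploymentConfig(**json.load(f))
```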
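For GPU utilization, NVML is the natural source of truth. The sketch below uses the real pynvml bindings; the shape of the returned records is illustrative rather than the manager's actual wire format.

```python
import pynvml


def gpu_stats() -> list[dict]:
    """Sample per-GPU utilization and memory via NVML."""
    pynvml.nvmlInit()
    try:
        stats = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            stats.append({
                "index": i,
                "gpu_util_pct": util.gpu,
                "mem_used_mb": mem.used // (1024 * 1024),
                "mem_total_mb": mem.total // (1024 * 1024),
            })
        return stats
    finally:
        pynvml.nvmlShutdown()
```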
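Log streaming can be as simple as reading the managed process's stdout line by line. This reuses the Popen handle from the launch sketch above; a real dashboard would forward the lines over a websocket or server-sent events rather than printing them.

```python
import subprocess


def stream_logs(proc: subprocess.Popen) -> None:
    """Relay a managed process's combined stdout/stderr, line by line."""
    for line in proc.stdout:  # blocks until the process exits
        print(line, end="")
```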
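Restart recovery then falls out of the pieces above: on startup, reload every persisted record and relaunch anything that is no longer running. This sketch reuses the hypothetical load() and launch() helpers from the earlier sketches.

```python
from pathlib import Path


def recover(config_dir: str = "deployments") -> None:
    """Relaunch saved deployments after the backend comes back up."""
    for path in Path(config_dir).glob("*/config.json"):
        cfg = load(str(path))
        # A production manager would first check a PID file or health
        # endpoint; here we simply relaunch from the saved configuration.
        launch(cfg.model, cfg.gpu_ids, cfg.port)
```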
Supported platforms
- Python 3.10–3.14
- Ubuntu 22.04 and 24.04
- NVIDIA GPUs and systems (H100, A100, L40, RTX 4090, DGX Spark)
Tip
New here? Start with the Getting Started guide, then review the Deployments page for version selection and configuration options.