vLLM Cluster Manager

Operate multi-node vLLM deployments

Spin up a host with a web dashboard, then add GPU nodes with the client agent. The UI lets you launch, monitor, and troubleshoot model deployments without building a full MLOps stack.

Best for
  • Research labs
  • Small teams
  • Multi-model serving
Built-in
  • Service discovery
  • Web UI
  • Systemd support
(Screenshot: vLLM Cluster Manager dashboard)

What you can do

  • Register and manage GPU nodes that run vLLM workloads.
  • Deploy models with a specific vLLM version, nightly build, or commit hash; each deployment gets its own isolated venv (see the first sketch after this list).
  • Install extra pip packages and upload vLLM plugins (.py, .whl) per deployment.
  • Select GPUs with toggle buttons and configure tensor parallelism (see the second sketch after this list).
  • Save and reload deployment configurations for one-click redeployment.
  • Monitor node health, GPU utilization, and deployment status in real time.
  • Stream logs from running processes for quick troubleshooting.
  • Recover deployments automatically after backend restarts.
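
Per-deployment isolation maps naturally onto Python's standard venv module. The sketch below is illustrative only; the manager's internal function names are not documented here, so `create_deployment_env` and the `my_vllm_plugin` package are hypothetical, but the mechanism (a fresh venv per deployment, then a pinned vLLM install plus extra packages) matches what the list above describes.

```python
import subprocess
import venv
from pathlib import Path

def create_deployment_env(name: str, vllm_spec: str, extra_packages: list[str]) -> Path:
    """Create an isolated venv and install a specific vLLM build into it.

    vllm_spec can be a release pin ("vllm==0.8.4"), a nightly wheel
    requirement, or a commit install such as
    "git+https://github.com/vllm-project/vllm@<sha>".
    """
    env_dir = Path("deployments") / name / "venv"
    venv.create(env_dir, with_pip=True)  # fresh interpreter with its own pip
    pip = env_dir / "bin" / "pip"        # Linux layout (Ubuntu targets above)
    subprocess.run([str(pip), "install", vllm_spec, *extra_packages], check=True)
    return env_dir

# Hypothetical usage: pin a release and add one extra package.
create_deployment_env("llama3-prod", "vllm==0.8.4", ["my_vllm_plugin"])
```

Because each deployment owns its own venv, two deployments can run different vLLM versions (or incompatible plugin sets) on the same node without conflicting.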
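
GPU selection and tensor parallelism ultimately come down to how the vLLM server process is launched. A minimal sketch of that translation, assuming the standard `vllm serve` CLI and the `CUDA_VISIBLE_DEVICES` environment variable; the `launch_vllm` helper is hypothetical, not part of this project's API.

```python
import os
import subprocess

def launch_vllm(model: str, gpu_ids: list[int]) -> subprocess.Popen:
    """Launch a vLLM server restricted to the selected GPUs.

    CUDA_VISIBLE_DEVICES limits the process to the chosen devices, and
    --tensor-parallel-size shards the model across all of them.
    """
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return subprocess.Popen(
        ["vllm", "serve", model, "--tensor-parallel-size", str(len(gpu_ids))],
        env=env,
    )

# Hypothetical usage: shard a model across GPUs 0 and 2.
proc = launch_vllm("meta-llama/Llama-3.1-8B-Instruct", [0, 2])
```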

Supported platforms

  • Python 3.10–3.14
  • Ubuntu 22.04 and 24.04
  • NVIDIA GPUs (H100, A100, L40, RTX 4090) and DGX Spark systems

Tip

New here? Start with the Getting Started guide, then review the Deployments page for version selection and configuration options.