Installation

Prerequisites

Access to a SLURM cluster with GPU nodes
Python 3.10+
Container runtime (enroot/pyxis) configured on the cluster
Model weights accessible from compute nodes
SGLang container image (.sqsh format)
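
A quick way to check the tooling side of this list is to look for the relevant commands on your PATH. The following is a minimal sketch, not part of srt-slurm: the command list is an assumption based on the commands used later in this guide, and on some clusters enroot is only installed on compute nodes, so a "MISSING" result on a login node is not necessarily a problem.

```python
import shutil
import sys

# Commands this guide calls directly; adjust the tuple for your cluster.
REQUIRED_COMMANDS = ("sbatch", "sinfo", "sacctmgr", "enroot", "git", "make")


def check_prerequisites(commands=REQUIRED_COMMANDS, min_python=(3, 10)):
    """Return a dict mapping each check name to True (ok) or False (missing)."""
    results = {"python": sys.version_info[:2] >= min_python}
    for command in commands:
        # shutil.which returns None when the command is not on PATH.
        results[command] = shutil.which(command) is not None
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:10s} {'ok' if ok else 'MISSING'}")
```

This does not verify that model weights or the container image are reachable from compute nodes; those still need to be checked on the cluster itself.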

Clone and Install

git clone https://github.com/ishandhanani/srt-slurm.git
cd srt-slurm
pip install -e .

Gather your SLURM account and target partition

These commands might not work on all clusters. If they fail, an AI assistant or your cluster's documentation can help you find the equivalent commands.

# SLURM account(s) associated with your user
sacctmgr -nP show assoc where user=$(whoami) format=account
# available partitions
sinfo
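
If you want to script this step, the sacctmgr output is easy to post-process. The sketch below assumes the `-nP` form shown above, which prints one `|`-delimited record per line with no header; `parse_accounts` and `list_my_accounts` are hypothetical helper names, not part of srtctl.

```python
import getpass
import subprocess


def parse_accounts(sacctmgr_output: str) -> list[str]:
    """Extract unique account names, in order, from `sacctmgr -nP` output."""
    accounts = []
    for line in sacctmgr_output.splitlines():
        # With format=account there is a single field per line.
        account = line.split("|")[0].strip()
        if account and account not in accounts:
            accounts.append(account)
    return accounts


def list_my_accounts() -> list[str]:
    """Run sacctmgr for the current user (requires SLURM on this machine)."""
    output = subprocess.run(
        ["sacctmgr", "-nP", "show", "assoc", "where",
         f"user={getpass.getuser()}", "format=account"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_accounts(output)
```

Only `parse_accounts` is safe to run off-cluster; `list_my_accounts` fails wherever sacctmgr is unavailable or restricted.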

Run Setup

If you are deploying onto Grace-based systems (GH200, GB200, etc.), use ARCH=aarch64. Otherwise, use ARCH=x86_64.

make setup ARCH=aarch64  # or ARCH=x86_64
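
If you are unsure which value applies, the machine type reported by the operating system can be mapped to the two ARCH values. This is a small sketch with a hypothetical helper name (`setup_arch`); note that login nodes and compute nodes can have different architectures, so it is only meaningful when run on a node of the type you will deploy to.

```python
import platform

# Machine names as commonly reported by platform.machine(), mapped to ARCH values.
_ARCH_ALIASES = {
    "aarch64": "aarch64",
    "arm64": "aarch64",
    "x86_64": "x86_64",
    "amd64": "x86_64",
}


def setup_arch(machine=None):
    """Return the ARCH value for `make setup` for the given (or current) machine."""
    machine = (machine or platform.machine()).lower()
    if machine not in _ARCH_ALIASES:
        raise ValueError(f"unsupported machine type: {machine!r}")
    return _ARCH_ALIASES[machine]
```

For example, `setup_arch("aarch64")` returns `"aarch64"`, which corresponds to `make setup ARCH=aarch64`.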

The setup will:

- Download NATS/ETCD binaries for your architecture
- Prompt you for cluster settings:
  - SLURM account (default: restricted)
  - SLURM partition (default: batch)
  - GPUs per node (default: 4)
  - Time limit (default: 4:00:00)
- Create srtslurm.yaml with your settings
- Auto-detect and set srtctl_root path

Configure srtslurm.yaml

After setup, edit srtslurm.yaml to add model paths, containers, and cluster-specific settings.

Adding Model Paths

The model_paths section maps short aliases to full filesystem paths:

model_paths:
  deepseek-r1: "/mnt/lustre/models/DeepSeek-R1"
  deepseek-r1-fp4: "/mnt/lustre/models/deepseek-r1-0528-fp4-v2"

Models must be accessible from all compute nodes (typically on a shared filesystem like Lustre or GPFS).
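
A mistyped path in an alias map is easy to catch before submitting a job. The sketch below is an illustration only: `find_missing_paths` is a hypothetical helper, and it takes the alias map as a plain dict (for example, the `model_paths` or `containers` section after you have loaded the YAML yourself). Run it on a compute node to confirm the shared filesystem is actually mounted there.

```python
from pathlib import Path


def find_missing_paths(aliases: dict[str, str]) -> dict[str, str]:
    """Return the subset of alias -> path entries whose path does not exist."""
    return {
        alias: path
        for alias, path in aliases.items()
        if not Path(path).exists()
    }


# Example alias map mirroring the model_paths section above.
model_paths = {
    "deepseek-r1": "/mnt/lustre/models/DeepSeek-R1",
    "deepseek-r1-fp4": "/mnt/lustre/models/deepseek-r1-0528-fp4-v2",
}

for alias, path in find_missing_paths(model_paths).items():
    print(f"alias {alias!r} points to a missing path: {path}")
```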

Adding Containers

The containers section maps version aliases to .sqsh container images:

containers:
  container1: "/mnt/containers/lmsysorg+sglang+v0.5.5.sqsh"
  container2: "/mnt/containers/lmsysorg+sglang+v0.5.4.sqsh"

To create a container image from Docker:

enroot import docker://lmsysorg/sglang:v0.5.5
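
By default, enroot names the resulting .sqsh file after the image reference, replacing `/` and `:` with `+`, which is why the container paths above look like `lmsysorg+sglang+v0.5.5.sqsh`. The following sketch reproduces that naming for the simple `docker://IMAGE:TAG` form only; `default_sqsh_name` is a hypothetical helper, and references that include a registry or user prefix are deliberately rejected rather than guessed at.

```python
def default_sqsh_name(image_uri: str) -> str:
    """Predict the default output filename of `enroot import` for docker://IMAGE:TAG."""
    prefix = "docker://"
    if not image_uri.startswith(prefix):
        raise ValueError("expected a docker:// URI")
    reference = image_uri[len(prefix):]
    if "#" in reference or "@" in reference or ":" not in reference:
        # Registry/user prefixes and untagged images are out of scope for this sketch.
        raise ValueError("only the docker://IMAGE:TAG form is handled")
    return reference.replace("/", "+").replace(":", "+") + ".sqsh"


print(default_sqsh_name("docker://lmsysorg/sglang:v0.5.5"))
```

After importing, move the file to a location visible to compute nodes and reference that path in the containers section.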

Complete srtslurm.yaml Reference

Here's a complete example of all available options:

# Default SLURM settings
default_account: "your-account"
default_partition: "batch"
default_time_limit: "4:00:00"

# Resource defaults
gpus_per_node: 4

# SLURM directive compatibility
use_gpus_per_node_directive: true # Set false if cluster doesn't support --gpus-per-node
use_segment_sbatch_directive: true # Set false if cluster doesn't support --segment

# Path to srtctl repo root (auto-set by make setup)
srtctl_root: "/path/to/srtctl"

# Model path aliases
model_paths:
  deepseek-r1: "/models/DeepSeek-R1"
  llama-70b: "/models/Llama-3-70B"

# Container aliases
containers:
  latest: "/containers/sglang-latest.sqsh"
  stable: "/containers/sglang-stable.sqsh"
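
Time limits such as `default_time_limit` are written as SLURM time strings. To compare or sanity-check them in a script, they can be converted to seconds. This sketch handles only the `H:MM:SS` and `D-HH:MM:SS` forms; SLURM accepts additional forms that are not covered here, and `time_limit_seconds` is a hypothetical helper name.

```python
def time_limit_seconds(value: str) -> int:
    """Convert 'H:MM:SS' or 'D-HH:MM:SS' to a number of seconds."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [int(part) for part in value.split(":")]
    if len(parts) != 3:
        raise ValueError("expected H:MM:SS or D-HH:MM:SS")
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# The default from srtslurm.yaml above, expressed in hours.
print(time_limit_seconds("4:00:00") / 3600)
```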

Create a Job Config

Create configs/my-job.yaml:

name: "my-benchmark"

model:
  path: "deepseek-r1" # Uses alias from srtslurm.yaml
  container: "latest" # Uses alias from srtslurm.yaml
  precision: "fp8"

extra_mount: # Optional: mount extra host directories into the container
  - "/local-dir1:/container-dir1"
  - "/local-dir2:/container-dir2"

resources:
  gpu_type: "gb200"
  prefill_nodes: 1
  decode_nodes: 2
  prefill_workers: 1
  decode_workers: 1
  gpus_per_node: 4

slurm:
  time_limit: "02:00:00"

backend:
  prefill_environment:
    TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800"
  decode_environment:
    TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800"

  sglang_config:
    prefill:
      kv-cache-dtype: "fp8_e4m3"
      mem-fraction-static: 0.84
      tensor-parallel-size: 4
    decode:
      kv-cache-dtype: "fp8_e4m3"
      mem-fraction-static: 0.83
      tensor-parallel-size: 8
      expert-parallel-size: 8
      data-parallel-size: 8
      enable-dp-attention: true

benchmark:
  type: "sa-bench"
  isl: 1024
  osl: 1024
  concurrencies: [256, 512]
  req_rate: "inf"

See Configuration Reference for all available options.
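
The resource counts and parallelism settings in the example config are related: with 4 GPUs per node, one prefill worker on one node has 4 GPUs (matching `tensor-parallel-size: 4`), and one decode worker across two nodes has 8 (matching `tensor-parallel-size: 8`). The sketch below spells out that arithmetic. It assumes the workers of a role split that role's GPUs evenly, which is an illustration of the example's numbers rather than a description of how srtctl allocates or validates resources.

```python
def gpus_per_worker(nodes: int, workers: int, gpus_per_node: int) -> int:
    """GPUs available to each worker if a role's GPUs are split evenly."""
    total_gpus = nodes * gpus_per_node
    if workers <= 0 or total_gpus % workers != 0:
        raise ValueError("workers must evenly divide the role's GPUs")
    return total_gpus // workers


# Values copied from the example configs/my-job.yaml.
resources = {
    "prefill_nodes": 1, "prefill_workers": 1,
    "decode_nodes": 2, "decode_workers": 1,
    "gpus_per_node": 4,
}
tensor_parallel = {"prefill": 4, "decode": 8}

for role in ("prefill", "decode"):
    available = gpus_per_worker(
        resources[f"{role}_nodes"],
        resources[f"{role}_workers"],
        resources["gpus_per_node"],
    )
    status = "matches" if available == tensor_parallel[role] else "differs from"
    print(f"{role}: {available} GPUs per worker, {status} "
          f"tensor-parallel-size {tensor_parallel[role]}")
```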

Submit the Job

srtctl apply -f configs/my-job.yaml

Output:

Submitted batch job 12345
Logs: logs/12345_1P_4D_20251122_143052/
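
When wrapping submission in a script, the job ID can be pulled out of the `Submitted batch job <id>` line, which is the standard sbatch confirmation message. This is a minimal sketch with a hypothetical helper name; it assumes the output has the shape shown above.

```python
import re

_SUBMITTED = re.compile(r"Submitted batch job (\d+)")


def parse_job_id(output: str):
    """Return the SLURM job ID as an int, or None if no confirmation line is found."""
    match = _SUBMITTED.search(output)
    return int(match.group(1)) if match else None


sample = "Submitted batch job 12345\nLogs: logs/12345_1P_4D_20251122_143052/"
print(parse_job_id(sample))
```

The ID can then be passed to standard SLURM commands such as `squeue -j <id>` or `scancel <id>`.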

Submit with Tags

You can tag runs for easier filtering in the dashboard:

srtctl apply -f configs/my-job.yaml --tags experiment,baseline,v2

Tags are saved in the job metadata and can be used to filter runs in analysis.

See Monitoring for how to monitor your job and understand the detailed log structure.

Custom Setup Scripts

You can run custom initialization scripts on worker nodes before starting SGLang workers. This is useful for:

Setting up custom environment variables
Installing additional dependencies
Checking out custom code

Creating a Setup Script

Create your setup script in the configs/ directory:

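#!/bin/bash
# configs/custom-setup.sh
# Example: check out a specific SGLang branch and install it in editable mode.

# Replace the SGLang checkout shipped in the container with a fresh clone.
cd /sgl-workspace/
rm -rf sglang
git clone https://github.com/sgl-project/sglang.git
cd sglang
git checkout origin/cheng/refactor/sbo
# Avoid git "dubious ownership" errors inside the container.
git config --global --add safe.directory "*"
pip install -e "python"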

Make it executable:
```
chmod +x configs/custom-setup.sh
```

Submit with the --setup-script flag:

srtctl apply -f configs/my-job.yaml --setup-script custom-setup.sh

The script runs on every worker node (prefill, decode, or aggregated) before Dynamo is installed from PyPI and the SGLang workers start. It must be located in the configs/ directory, which is mounted into the containers at /configs/.

Note: Setup scripts only run when you explicitly specify --setup-script. No default setup script will run if this flag is omitted.
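
Common mistakes with setup scripts are a missing executable bit, a shebang that is not on the first line, or a script placed outside configs/. The sketch below checks for these before submission. `check_setup_script` is a hypothetical helper, not an srtctl feature, and whether srtctl itself requires a shebang is an assumption made here for safety.

```python
import os
from pathlib import Path


def check_setup_script(configs_dir, script_name):
    """Return a list of problems found with configs_dir/script_name (empty if none)."""
    script = Path(configs_dir) / script_name
    if not script.is_file():
        return [f"{script} does not exist"]
    problems = []
    if not os.access(script, os.X_OK):
        problems.append(f"{script} is not executable (run chmod +x)")
    with script.open("rb") as handle:
        first_line = handle.readline()
    # The shebang only takes effect when it is the very first line of the file.
    if not first_line.startswith(b"#!"):
        problems.append(f"{script} has no shebang on its first line")
    return problems
```

For example, `check_setup_script("configs", "custom-setup.sh")` returns an empty list when the script is ready to be passed to `--setup-script`.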
