Configuration Reference
Complete reference for job configuration YAML files.
Overview
Cluster Config Discovery
srtctl looks for srtslurm.yaml (cluster-wide settings) in this order:
1. `SRTSLURM_CONFIG` environment variable (if set) - explicit path to config file
2. Current working directory
3. Parent directory (1 level up)
4. Grandparent directory (2 levels up)
For users working in deep directory structures (e.g., study directories), set SRTSLURM_CONFIG in your shell profile:
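```bash
# Illustrative; point this at your cluster's actual srtslurm.yaml
export SRTSLURM_CONFIG=/path/to/srtslurm.yaml
```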
This allows you to run srtctl apply -f config.yaml from anywhere without needing srtslurm.yaml nearby.
Cluster Config Fields
The srtslurm.yaml file can contain the following fields:
| Field | Type | Description |
|---|---|---|
| `default_account` | string | Default SLURM account |
| `default_partition` | string | Default SLURM partition |
| `default_time_limit` | string | Default job time limit |
| `gpus_per_node` | int | Default GPUs per node |
| `network_interface` | string | Network interface for NCCL |
| `srtctl_root` | string | Root directory for srtctl |
| `output_dir` | string | Custom output directory (overrides `srtctl_root/outputs`) |
| `model_paths` | dict | Model path aliases |
| `containers` | dict | Container image aliases |
| `default_mounts` | dict | Cluster-wide container mounts |
output_dir: When set, job logs are written to output_dir/{job_id}/logs instead of srtctl_root/outputs/{job_id}/logs. Useful for CI/CD and ephemeral environments.
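A sketch of a `srtslurm.yaml` (all values are illustrative, not defaults):

```yaml
default_account: my-account
default_partition: batch
default_time_limit: "04:00:00"
gpus_per_node: 4
network_interface: eth0
srtctl_root: /home/user/srtctl
# output_dir: /scratch/ci-outputs    # optional; overrides srtctl_root/outputs
model_paths:
  deepseek-r1: /models/deepseek-r1
containers:
  sglang: /containers/sglang.sqsh
default_mounts:
  $SCRATCH/datasets: /datasets
```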
name
| Field | Type | Required | Description |
|---|---|---|---|
| `name` | string | Yes | Job name, used for identification and log prefixes |
model
Model and container configuration.
| Field | Type | Required | Description |
|---|---|---|---|
| `path` | string | Yes | Model path alias (from srtslurm.yaml) or absolute path |
| `container` | string | Yes | Container alias (from srtslurm.yaml) or .sqsh path |
| `precision` | string | Yes | Model precision (informational: fp4, fp8, fp16, bf16) |
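For example (aliases are illustrative):

```yaml
model:
  path: deepseek-r1      # alias from srtslurm.yaml, or an absolute path
  container: sglang      # alias from srtslurm.yaml, or a path to a .sqsh image
  precision: fp8
```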
resources
GPU allocation and worker topology.
Two layouts are supported: disaggregated mode (separate prefill and decode workers) and aggregated mode (a single worker type); see the sketches below.
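A minimal sketch of each layout (node and worker counts are illustrative):

```yaml
# Disaggregated: (2 nodes * 4 GPUs) / 2 prefill workers = 4 GPUs each;
# (2 nodes * 4 GPUs) / 1 decode worker = 8 GPUs
resources:
  gpu_type: gb200
  gpus_per_node: 4
  prefill_nodes: 2
  prefill_workers: 2
  decode_nodes: 2
  decode_workers: 1
```

```yaml
# Aggregated: (2 nodes * 4 GPUs) / 2 workers = 4 GPUs per worker
resources:
  gpus_per_node: 4
  agg_nodes: 2
  agg_workers: 2
```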
| Field | Type | Default | Description |
|---|---|---|---|
| `gpu_type` | string | - | GPU type: "gb200", "gb300", or "h100" |
| `gpus_per_node` | int | 4 | GPUs per node |
| `prefill_nodes` | int | null | Nodes dedicated to prefill |
| `decode_nodes` | int | null | Nodes dedicated to decode |
| `prefill_workers` | int | null | Number of prefill workers |
| `decode_workers` | int | null | Number of decode workers |
| `agg_nodes` | int | null | Nodes for aggregated mode |
| `agg_workers` | int | null | Number of aggregated workers |
| `gpus_per_prefill` | int | computed | Explicit GPUs per prefill worker |
| `gpus_per_decode` | int | computed | Explicit GPUs per decode worker |
| `gpus_per_agg` | int | computed | Explicit GPUs per aggregated worker |
Notes:
- Set `decode_nodes: 0` to have decode workers share nodes with prefill workers.
- Either use disaggregated mode (`prefill_nodes`/`decode_nodes`) OR aggregated mode (`agg_nodes`), not both.
- GPUs per worker are computed automatically: `(nodes * gpus_per_node) / workers`
- Use `gpus_per_prefill`, `gpus_per_decode`, `gpus_per_agg` to explicitly override the computed values.
Computed Properties
The ResourceConfig provides several computed properties:
- `is_disaggregated`: True if using prefill/decode mode
- `total_nodes`: Total nodes allocated (prefill + decode, or agg)
- `num_prefill`, `num_decode`, `num_agg`: Worker counts for each role
- `gpus_per_prefill`, `gpus_per_decode`, `gpus_per_agg`: GPUs allocated per worker
- `prefill_gpus`, `decode_gpus`: Total GPUs for each role
slurm
SLURM job settings.
| Field | Type | Default | Description |
|---|---|---|---|
| `time_limit` | string | from srtslurm.yaml | Job time limit (HH:MM:SS) |
| `account` | string | from srtslurm.yaml | SLURM account |
| `partition` | string | from srtslurm.yaml | SLURM partition |
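For example (values illustrative; unset fields fall back to srtslurm.yaml defaults):

```yaml
slurm:
  account: my-account
  partition: batch
  time_limit: "02:00:00"
```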
frontend
Frontend/router configuration.
| Field | Type | Default | Description |
|---|---|---|---|
| `type` | str | dynamo | Frontend type: "dynamo" or "sglang" |
| `enable_multiple_frontends` | bool | true | Scale with nginx + multiple routers |
| `num_additional_frontends` | int | 9 | Additional routers beyond master |
| `nginx_container` | str | nginx:1.27.4 | Custom nginx container image |
| `args` | dict | null | CLI args for the frontend |
| `env` | dict | null | Env vars for frontend processes |
See SGLang Router for detailed architecture.
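A sketch with the defaults spelled out:

```yaml
frontend:
  type: dynamo
  enable_multiple_frontends: true
  num_additional_frontends: 9
```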
backend
Worker configuration and SGLang settings.
| Field | Type | Default | Description |
|---|---|---|---|
| `type` | string | sglang | Backend type: "sglang" or "trtllm" |
| `gpu_type` | string | null | GPU type override |
| `prefill_environment` | dict | {} | Environment variables for prefill |
| `decode_environment` | dict | {} | Environment variables for decode |
| `aggregated_environment` | dict | {} | Environment variables for aggregated |
| `sglang_config` | object | null | SGLang CLI configuration per mode |
| `kv_events_config` | bool/dict | null | KV events configuration |
sglang_config
Per-mode SGLang server configuration. Any SGLang CLI flag can be specified (use kebab-case or snake_case):
| Flag | Type | Description |
|---|---|---|
| `tensor-parallel-size` | int | Tensor parallelism degree |
| `data-parallel-size` | int | Data parallelism degree |
| `expert-parallel-size` | int | Expert parallelism (MoE models) |
| `mem-fraction-static` | float | GPU memory fraction (0.0-1.0) |
| `kv-cache-dtype` | string | KV cache precision (fp8_e4m3, etc.) |
| `context-length` | int | Max context length |
| `chunked-prefill-size` | int | Chunked prefill batch size |
| `enable-dp-attention` | bool | Enable DP attention |
| `disaggregation-mode` | string | "prefill" or "decode" |
| `disaggregation-transfer-backend` | string | Transfer backend ("nixl" or other) |
| `served-model-name` | string | Model name for API |
| `grpc-mode` | bool | Enable gRPC mode |
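For example, a per-mode sketch (the `prefill`/`decode` keys follow the override example in Deep Merge Semantics; flag values are illustrative, and srtctl may set some flags such as `disaggregation-mode` for you):

```yaml
backend:
  type: sglang
  sglang_config:
    prefill:
      tensor-parallel-size: 4
      disaggregation-mode: prefill
    decode:
      tensor-parallel-size: 8
      mem-fraction-static: 0.85
      disaggregation-mode: decode
```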
kv_events_config
Note: KV events is a Dynamo frontend feature for kv-aware routing. It allows workers to publish cache/scheduling information over ZMQ for the Dynamo router to make intelligent routing decisions.
Enables --kv-events-config for workers with auto-allocated ZMQ ports.
Each worker leader gets a globally unique port starting at 5550:
| Worker | Port |
|---|---|
| `prefill_0` | 5550 |
| `prefill_1` | 5551 |
| `decode_0` | 5552 |
| `decode_1` | 5553 |
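A minimal sketch enabling it:

```yaml
backend:
  kv_events_config: true   # bool form; a dict can be supplied instead for fine-grained settings
```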
TRTLLM Backend
When using type: trtllm, the backend uses TRTLLM with MPI-style launching:
| Field | Type | Default | Description |
|---|---|---|---|
| `type` | string | - | Must be "trtllm" |
| `prefill_environment` | dict | {} | Environment variables for prefill |
| `decode_environment` | dict | {} | Environment variables for decode |
| `trtllm_config` | object | null | TRTLLM CLI configuration per mode |
Key differences from SGLang backend:
- No aggregated mode support (prefill/decode only)
- Uses MPI-style launching (one srun per endpoint with all nodes)
- Uses `trtllm-llmapi-launch` for distributed launching
- Automatically sets `TRTLLM_EPLB_SHM_NAME` with a unique UUID per endpoint
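A minimal sketch (the per-mode option keys inside `trtllm_config` depend on your TRTLLM version; the environment variable is only an illustrative example):

```yaml
backend:
  type: trtllm
  trtllm_config:
    prefill: {}   # per-mode TRTLLM CLI options go here
    decode: {}
  decode_environment:
    TLLM_LOG_LEVEL: INFO   # illustrative environment variable
```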
benchmark
Benchmark configuration. The type field determines which benchmark runner is used and what additional fields are available.
Available Benchmark Types
| Type | Description |
|---|---|
| `manual` | No benchmark (default), manual testing mode |
| `sa-bench` | Throughput/latency serving benchmark |
| `sglang-bench` | SGLang bench_serving benchmark |
| `mmlu` | MMLU accuracy evaluation |
| `gpqa` | GPQA (graduate-level science QA) evaluation |
| `longbenchv2` | Long-context evaluation benchmark |
| `router` | Router performance with prefix caching |
| `mooncake-router` | KV-aware routing with Mooncake trace |
manual
No benchmark is run. Use for manual testing and debugging.
sa-bench (Serving Accuracy)
Throughput and latency benchmark at various concurrency levels.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `isl` | int | Yes | - | Input sequence length |
| `osl` | int | Yes | - | Output sequence length |
| `concurrencies` | list/string | Yes | - | Concurrency levels (list or "NxM" format) |
| `req_rate` | string/int | No | "inf" | Request rate |
Concurrencies format: Can be a list [128, 256, 512] or x-separated string "128x256x512".
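For example (values illustrative):

```yaml
benchmark:
  type: sa-bench
  isl: 1024
  osl: 128
  concurrencies: [128, 256, 512]   # equivalently "128x256x512"
  req_rate: "inf"                  # default
```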
sglang-bench
SGLang bench_serving benchmark at various concurrency levels.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `isl` | int | Yes | - | Input sequence length |
| `osl` | int | Yes | - | Output sequence length |
| `concurrencies` | list/string | Yes | - | Concurrency levels (list or "NxM" format) |
| `req_rate` | string/int | No | "inf" | Request rate |
Concurrencies format: Can be a list [128, 256, 512] or x-separated string "128x256x512".
mmlu
MMLU accuracy evaluation using sglang.test.run_eval.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `num_examples` | int | No | 200 | Number of examples to run |
| `max_tokens` | int | No | 2048 | Max tokens per response |
| `repeat` | int | No | 8 | Number of repeats |
| `num_threads` | int | No | 512 | Concurrent threads |
gpqa
Graduate-level science QA evaluation using sglang.test.run_eval.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `num_examples` | int | No | 198 | Number of examples to run |
| `max_tokens` | int | No | 32768 | Max tokens per response |
| `repeat` | int | No | 8 | Number of repeats |
| `num_threads` | int | No | 128 | Concurrent threads |
longbenchv2
Long-context evaluation benchmark.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `max_context_length` | int | No | 128000 | Max context length |
| `num_threads` | int | No | 16 | Concurrent threads |
| `max_tokens` | int | No | 16384 | Max tokens |
| `num_examples` | int | No | all | Number of examples |
| `categories` | list[str] | No | all | Task categories to run |
router
Router performance benchmark with prefix caching. Requires frontend.type: sglang.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `isl` | int | No | 14000 | Input sequence length |
| `osl` | int | No | 200 | Output sequence length |
| `num_requests` | int | No | 200 | Number of requests |
| `concurrency` | int | No | 20 | Concurrency level |
| `prefix_ratios` | list/string | No | "0.1 0.3 0.5 0.7 0.9" | Prefix ratios to test |
mooncake-router
KV-aware routing benchmark using Mooncake conversation trace.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `mooncake_workload` | string | No | "conversation" | Trace type (see options below) |
| `ttft_threshold_ms` | int | No | 2000 | Goodput TTFT threshold in ms |
| `itl_threshold_ms` | int | No | 25 | Goodput ITL threshold in ms |
Workload options: "mooncake", "conversation", "synthetic", "toolagent"
Dataset characteristics (conversation trace):
- 12,031 requests over ~59 minutes (3.4 req/s)
- Avg input: 12,035 tokens; avg output: 343 tokens
- 36.64% cache efficiency potential
dynamo
Dynamo installation configuration.
| Field | Type | Default | Description |
|---|---|---|---|
| `install` | bool | true | Whether to install dynamo (set false if pre-installed) |
| `version` | string | "0.8.0" | PyPI version |
| `hash` | string | null | Git commit hash (source install) |
| `top_of_tree` | bool | false | Install from main branch |
Notes:
- Set `install: false` if your container already has dynamo pre-installed.
- Only one of `version`, `hash`, or `top_of_tree` should be specified; `hash` and `top_of_tree` are mutually exclusive.
- When `hash` or `top_of_tree` is set, `version` is automatically cleared.
- Source installs (`hash` or `top_of_tree`) clone the repo and build with maturin.
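For example:

```yaml
dynamo:
  install: true
  version: "0.8.0"    # PyPI install

# Or, a source install from a specific commit:
# dynamo:
#   install: true
#   hash: <commit-sha>   # placeholder; use a real commit hash
```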
profiling
Profiling configuration for nsys or torch profiler.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `type` | string | No | "none" | Profiling type: "none", "nsys", "torch" |
| `prefill` | object | Disaggregated mode | null | Prefill phase config |
| `decode` | object | Disaggregated mode | null | Decode phase config |
| `aggregated` | object | Aggregated mode | null | Aggregated phase config |
ProfilingPhaseConfig
Each phase config has:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `start_step` | int | No | null | Step to start profiling |
| `stop_step` | int | No | null | Step to stop profiling |
Profiling Modes
- `nsys`: NVIDIA Nsight Systems profiling. Wraps the worker command with `nsys profile`.
- `torch`: PyTorch profiler. Sets the `SGLANG_TORCH_PROFILER_DIR` environment variable.
Validation Rules
- Disaggregated mode requires both `prefill` and `decode` phase configs when profiling is enabled.
- Aggregated mode requires an `aggregated` phase config when profiling is enabled.
Example: Torch Profiling (Disaggregated)
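A sketch (step numbers are illustrative):

```yaml
profiling:
  type: torch
  prefill:
    start_step: 10
    stop_step: 20
  decode:
    start_step: 10
    stop_step: 20
```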
Example: Nsys Profiling (Aggregated)
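A sketch (step numbers are illustrative):

```yaml
profiling:
  type: nsys
  aggregated:
    start_step: 5
    stop_step: 15
```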
output
Output configuration with formattable paths.
| Field | Type | Default | Description |
|---|---|---|---|
| `log_dir` | FormattablePath | "./outputs/{job_id}/logs" | Directory for log files |
The log_dir supports FormattablePath templating. See FormattablePath Template System.
health_check
Health check configuration for worker readiness.
| Field | Type | Default | Description |
|---|---|---|---|
| `max_attempts` | int | 180 | Maximum health check attempts (180 = 30 minutes) |
| `interval_seconds` | int | 10 | Seconds between health check attempts |
Notes:
- Default of 180 attempts at 10-second intervals = 30 minutes total wait time.
- Large models (e.g., 70B+ parameters) may require the full 30 minutes to load.
- Reduce `max_attempts` for smaller models or faster testing.
infra
Infrastructure configuration for etcd/nats placement.
| Field | Type | Default | Description |
|---|---|---|---|
| `etcd_nats_dedicated_node` | bool | false | Reserve first node for infrastructure services |
Notes:
- When `etcd_nats_dedicated_node: true`, the first allocated node is reserved exclusively for etcd and nats services.
- This can improve stability for large-scale deployments by isolating infrastructure services.
- The reserved node is not used for worker processes.
sweep
Parameter sweep configuration for running multiple benchmark variations.
| Field | Type | Default | Description |
|---|---|---|---|
| `mode` | string | "zip" | Sweep mode: "zip" or "grid" |
| `parameters` | dict | {} | Parameter name to list of values mapping |
Sweep Modes
zip: Pairs up parameters at matching indices. Parameters must have equal lengths.
Example: `isl=[512, 1024], osl=[128, 256]` produces 2 combinations:
- `{isl: 512, osl: 128}`
- `{isl: 1024, osl: 256}`
grid: Cartesian product of all parameter values.
Example: `isl=[512, 1024], osl=[128, 256]` produces 4 combinations:
- `{isl: 512, osl: 128}`
- `{isl: 512, osl: 256}`
- `{isl: 1024, osl: 128}`
- `{isl: 1024, osl: 256}`
Using Sweep Parameters
Reference sweep parameters in your config using {placeholder} syntax:
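A sketch (parameter names and values are illustrative):

```yaml
sweep:
  mode: grid
  parameters:
    isl: [512, 1024]
    osl: [128, 256]

benchmark:
  type: sa-bench
  isl: "{isl}"
  osl: "{osl}"
  concurrencies: [256]
```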
Config Overrides
Config overrides let you define a base config plus multiple variants in a single YAML file. Each variant deep-merges a small set of changes onto the base, and is submitted as an independent SLURM job. This eliminates the need to duplicate entire config files when testing different parameter combinations.
YAML Structure
| Key | Description |
|---|---|
| `base` | Required. A complete, valid config (same structure as a normal recipe). |
| `override_<suffix>` | Optional. Partial config merged onto `base`. `<suffix>` is appended to the job name. |
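A sketch of the structure; the job names match the Naming section below, and the override bodies are illustrative:

```yaml
base:
  name: my-benchmark
  # ... complete config: model, resources, backend, benchmark, ...

override_tp64:
  backend:
    sglang_config:
      decode:
        tp-size: 64

override_small:
  resources:
    decode_nodes: 1
```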
Naming
Override job names are auto-generated: {base.name}_{suffix}.
The example above produces three jobs: my-benchmark, my-benchmark_tp64, and my-benchmark_small.
Deep Merge Semantics
| Value type | Behavior | Example |
|---|---|---|
| Scalar (str/int/bool) | Override replaces base | `tp-size: 32` → `tp-size: 64` |
| Dict | Recursive merge; only specified keys change | Override `sglang_config.decode.tp-size: 64` leaves other decode keys untouched |
| List | Full replacement (no append) | `concurrencies: [4096]` replaces `[8192, 10240]` |
| New key | Added to base | Override adds fields base doesn't have |
| `null` value | Deletes the key from base | `extra_mount: null` removes it |
Combining with Sweeps
Overrides and sweeps can coexist in the same file. Override expansion happens first, then each variant with a sweep: section is expanded via Cartesian product.
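For example (a sketch; the override inherits the base's sweep via deep merge):

```yaml
base:
  name: my-benchmark
  # ... complete config ...
  sweep:
    mode: zip
    parameters:
      isl: [512, 1024]   # 2 combinations

override_big:
  resources:
    decode_nodes: 4      # illustrative change; sweep section inherited from base
```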
This produces 4 jobs: base × 2 sweep + override_big × 2 sweep.
Backward Compatibility
Files without a base top-level key are treated as normal configs — no behavior change.
FormattablePath Template System
FormattablePath is a powerful templating system for paths that supports runtime placeholders and environment variable expansion.
How It Works
FormattablePath ensures that configuration values with placeholders are always explicitly formatted before use, preventing accidental use of unformatted templates.
Available Placeholders
| Placeholder | Type | Description | Example |
|---|---|---|---|
| `{job_id}` | string | SLURM job ID | "12345" |
| `{run_name}` | string | Job name + job ID | "my-benchmark_12345" |
| `{head_node_ip}` | string | IP address of head node | "10.0.0.1" |
| `{log_dir}` | string | Resolved log directory path | "/home/user/outputs/12345/logs" |
| `{model_path}` | string | Resolved model path | "/models/deepseek-r1" |
| `{container_image}` | string | Resolved container image path | "/containers/sglang.sqsh" |
| `{gpus_per_node}` | int | GPUs per node | 8 |
Environment Variable Expansion
FormattablePath also expands environment variables using $VAR or ${VAR} syntax:
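```yaml
output:
  log_dir: "$HOME/outputs/{job_id}/logs"      # $HOME expanded at runtime
  # equivalently: "${HOME}/outputs/{job_id}/logs"
```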
Common environment variables:
- `$HOME` - User home directory
- `$USER` - Username
- `$SLURM_JOB_ID` - SLURM job ID (also available as `{job_id}`)
Extra Placeholders
Some contexts support additional placeholders:
| Placeholder | Context | Description |
|---|---|---|
| `{nginx_url}` | Frontend config | Nginx URL for load balancing |
| `{frontend_url}` | Frontend config | Frontend/router URL |
| `{index}` | Worker config | Worker index |
| `{host}` | Worker config | Worker host |
| `{port}` | Worker config | Worker port |
Examples
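A couple of hedged sketches (paths are illustrative; the frontend/worker placeholders above are only valid in their respective contexts):

```yaml
output:
  log_dir: "$HOME/outputs/{run_name}"   # e.g. /home/user/outputs/my-benchmark_12345

container_mounts:
  "{log_dir}/dumps": /dumps             # keys and values are both formattable
```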
container_mounts
Custom container mount mappings with FormattablePath support.
| Key | Value | Description |
|---|---|---|
| FormattablePath | FormattablePath | Host path -> Container mount path |
Both keys and values support FormattablePath templating with placeholders and environment variables.
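For example (host paths illustrative):

```yaml
container_mounts:
  "$HOME/scripts": /scripts       # env var in the key
  "/data/{job_id}": /jobdata      # placeholder in the key
```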
Default Mounts
The following mounts are always added automatically:
| Mount | Container path | Description |
|---|---|---|
| Model path | `/model` | Resolved model directory |
| Log directory | `/logs` | Log output directory |
| `configs/` directory | `/configs` | NATS, etcd binaries |
| Benchmark scripts | `/srtctl-benchmarks` | Bundled benchmark scripts |
Cluster-Level Mounts
You can also define cluster-wide mounts in srtslurm.yaml using the default_mounts field. These are applied to all jobs on the cluster, after the built-in defaults but before job-level mounts.
Environment variables (e.g., $SCRATCH, $HOME) are expanded. This is useful for mounting cluster-specific paths that are required by certain images without adding them to every job config.
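For example, in `srtslurm.yaml` (paths illustrative):

```yaml
default_mounts:
  $SCRATCH/datasets: /datasets
  $HOME/.cache/huggingface: /hf-cache
```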
Mount Priority
Mounts have the following priority (highest to lowest):
1. Job-level `container_mounts` - FormattablePath dict (highest priority)
2. Job-level `extra_mount` - simple `host:container` strings
3. Cluster-level - `default_mounts` from `srtslurm.yaml`
4. Built-in defaults - model, logs, configs, benchmark scripts (lowest priority)
Job-level mounts always take precedence over cluster-level and built-in defaults.
environment
Global environment variables for all worker processes.
| Key | Value | Description |
|---|---|---|
| string | string | Environment variable name and value |
Per-Worker Template Variables
Environment variable values support per-worker templating with these placeholders:
| Placeholder | Description | Example |
|---|---|---|
| `{node}` | Hostname of the node where the worker runs | "gpu-01" |
| `{node_id}` | Numeric index of the node in the worker list (0-based) | 0, 1, 2 |
Note: For per-worker-mode environment variables, use backend.prefill_environment, backend.decode_environment, or backend.aggregated_environment.
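A small sketch (variable names other than `NCCL_DEBUG` are hypothetical):

```yaml
environment:
  NCCL_DEBUG: INFO
  MY_NODE_TAG: "worker-{node_id}-{node}"   # hypothetical; expands per worker
```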
extra_mount
Additional container mounts as a list of mount specifications.
| Format | Description |
|---|---|
| `host_path:container_path` | Read-write mount |
| `host_path:container_path:ro` | Read-only mount |
Note: Unlike container_mounts, extra_mount uses simple string format, not FormattablePath. Environment variables are still expanded.
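For example (paths illustrative):

```yaml
extra_mount:
  - /data/shared:/data
  - $HOME/tokenizers:/tokenizers:ro
```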
sbatch_directives
Additional SLURM sbatch directives.
| Directive | Example value | Description |
|---|---|---|
| `mail-user` | "user@example.com" | Email for notifications |
| `mail-type` | "END,FAIL" | When to send email (BEGIN,END,FAIL) |
| `comment` | "My job description" | Job comment for tracking |
| `reservation` | "my-reservation" | Use a specific reservation |
| `constraint` | "volta" | Node feature constraint |
| `exclusive` | "" | Exclusive node access (flag) |
| `gres` | "gpu:8" | Generic resource specification |
| `dependency` | "afterok:12345" | Job dependency |
| `qos` | "high" | Quality of service |
Format: Each directive becomes #SBATCH --{key}={value} or #SBATCH --{key} if value is empty.
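For example (values taken from the table above):

```yaml
sbatch_directives:
  mail-user: "user@example.com"   # -> #SBATCH --mail-user=user@example.com
  mail-type: "END,FAIL"           # -> #SBATCH --mail-type=END,FAIL
  exclusive: ""                   # -> #SBATCH --exclusive
```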
srun_options
Additional srun options for worker processes.
| Option | Example value | Description |
|---|---|---|
| `cpu-bind` | "none" | CPU binding mode (none, cores, sockets) |
| `mpi` | "pmix" | MPI implementation |
| `overlap` | "" | Allow step overlap (flag) |
| `ntasks-per-node` | "1" | Tasks per node |
| `gpus-per-task` | "1" | GPUs per task |
| `mem` | "0" | Memory per node |
Format: Each option becomes --{key}={value} or --{key} if value is empty.
setup_script
Run a custom script before dynamo install and worker startup.
| Field | Type | Default | Description |
|---|---|---|---|
| `setup_script` | string | null | Script filename (must be in `configs/`) |
Notes:
- Script must be located in the `configs/` directory.
- Script runs inside the container before dynamo installation.
- Useful for installing custom SGLang versions, additional dependencies, or patches.
Example setup script (configs/install-sglang-main.sh):
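A sketch only; the real install command depends on your container and how SGLang is packaged:

```bash
#!/bin/bash
# Illustrative: install SGLang from the main branch before workers start.
set -euo pipefail
pip install --upgrade "git+https://github.com/sgl-project/sglang.git@main#subdirectory=python"
```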
enable_config_dump
Enable dumping worker configuration to JSON for debugging.
| Field | Type | Default | Description |
|---|---|---|---|
| `enable_config_dump` | bool | true | Dump config JSON for debugging |
When enabled, worker startup commands include --dump-config-to which writes the resolved configuration to a JSON file.
Complete Examples
Disaggregated Mode with Dynamo
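A sketch assembled from the fields documented above; aliases, sizes, and benchmark settings are illustrative:

```yaml
name: my-benchmark
model:
  path: deepseek-r1          # alias from srtslurm.yaml
  container: sglang          # alias from srtslurm.yaml
  precision: fp8
resources:
  gpu_type: gb200
  gpus_per_node: 4
  prefill_nodes: 2
  prefill_workers: 2
  decode_nodes: 2
  decode_workers: 1
slurm:
  time_limit: "02:00:00"
frontend:
  type: dynamo
backend:
  type: sglang
  sglang_config:
    prefill:
      disaggregation-mode: prefill
    decode:
      disaggregation-mode: decode
benchmark:
  type: sa-bench
  isl: 1024
  osl: 128
  concurrencies: [128, 256, 512]
```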
Aggregated Mode with SGLang Router
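Another sketch; the `aggregated` key under `sglang_config` is assumed by analogy with the `prefill`/`decode` keys:

```yaml
name: agg-router-test
model:
  path: deepseek-r1
  container: sglang
  precision: fp8
resources:
  gpus_per_node: 4
  agg_nodes: 2
  agg_workers: 2
frontend:
  type: sglang               # the router benchmark requires the sglang frontend
backend:
  type: sglang
  sglang_config:
    aggregated:              # assumed per-mode key
      tensor-parallel-size: 4
benchmark:
  type: router
  isl: 14000
  osl: 200
```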
Profiling, parameter sweep, config override, and custom mount/setup examples appear in their respective sections above.