Configuration Reference

Complete reference for job configuration YAML files.

Table of Contents


Overview


Cluster Config Discovery

srtctl looks for srtslurm.yaml (cluster-wide settings) in this order:

  1. SRTSLURM_CONFIG environment variable (if set) - explicit path to config file

  2. Current working directory

  3. Parent directory (1 level up)

  4. Grandparent directory (2 levels up)

For users working in deep directory structures (e.g., study directories), set SRTSLURM_CONFIG in your shell profile:

This allows you to run srtctl apply -f config.yaml from anywhere without needing srtslurm.yaml nearby.

Cluster Config Fields

The srtslurm.yaml file can contain the following fields:

Field
Type
Description

default_account

string

Default SLURM account

default_partition

string

Default SLURM partition

default_time_limit

string

Default job time limit

gpus_per_node

int

Default GPUs per node

network_interface

string

Network interface for NCCL

srtctl_root

string

Root directory for srtctl

output_dir

string

Custom output directory (overrides srtctl_root/outputs)

model_paths

dict

Model path aliases

containers

dict

Container image aliases

default_mounts

dict

Cluster-wide container mounts

output_dir: When set, job logs are written to output_dir/{job_id}/logs instead of srtctl_root/outputs/{job_id}/logs. Useful for CI/CD and ephemeral environments.


name

Field
Type
Required
Description

name

string

Yes

Job name, used for identification and log prefixes


model

Model and container configuration.

Field
Type
Required
Description

path

string

Yes

Model path alias (from srtslurm.yaml) or absolute path

container

string

Yes

Container alias (from srtslurm.yaml) or .sqsh path

precision

string

Yes

Model precision (informational: fp4, fp8, fp16, bf16)


resources

GPU allocation and worker topology.

Disaggregated Mode (prefill + decode)

Aggregated Mode (single worker type)

Field
Type
Default
Description

gpu_type

string

-

GPU type: "gb200", "gb300", or "h100"

gpus_per_node

int

4

GPUs per node

prefill_nodes

int

null

Nodes dedicated to prefill

decode_nodes

int

null

Nodes dedicated to decode

prefill_workers

int

null

Number of prefill workers

decode_workers

int

null

Number of decode workers

agg_nodes

int

null

Nodes for aggregated mode

agg_workers

int

null

Number of aggregated workers

gpus_per_prefill

int

computed

Explicit GPUs per prefill worker

gpus_per_decode

int

computed

Explicit GPUs per decode worker

gpus_per_agg

int

computed

Explicit GPUs per aggregated worker

Notes:

  • Set decode_nodes: 0 to have decode workers share nodes with prefill workers.

  • Either use disaggregated mode (prefill_nodes/decode_nodes) OR aggregated mode (agg_nodes), not both.

  • GPUs per worker are computed automatically: (nodes * gpus_per_node) / workers

  • Use gpus_per_prefill, gpus_per_decode, gpus_per_agg to explicitly override the computed values

Computed Properties

The ResourceConfig provides several computed properties:

  • is_disaggregated: True if using prefill/decode mode

  • total_nodes: Total nodes allocated (prefill + decode or agg)

  • num_prefill, num_decode, num_agg: Worker counts for each role

  • gpus_per_prefill, gpus_per_decode, gpus_per_agg: GPUs allocated per worker

  • prefill_gpus, decode_gpus: Total GPUs for each role


slurm

SLURM job settings.

Field
Type
Default
Description

time_limit

string

from srtslurm.yaml

Job time limit (HH:MM:SS)

account

string

from srtslurm.yaml

SLURM account

partition

string

from srtslurm.yaml

SLURM partition


frontend

Frontend/router configuration.

Field
Type
Default
Description

type

str

dynamo

Frontend type: "dynamo" or "sglang"

enable_multiple_frontends

bool

true

Scale with nginx + multiple routers

num_additional_frontends

int

9

Additional routers beyond master

nginx_container

str

nginx:1.27.4

Custom nginx container image

args

dict

null

CLI args for the frontend

env

dict

null

Env vars for frontend processes

See SGLang Router for detailed architecture.


backend

Worker configuration and SGLang settings.

Field
Type
Default
Description

type

string

sglang

Backend type: "sglang" or "trtllm"

gpu_type

string

null

GPU type override

prefill_environment

dict

{}

Environment variables for prefill

decode_environment

dict

{}

Environment variables for decode

aggregated_environment

dict

{}

Environment variables for aggregated

sglang_config

object

null

SGLang CLI configuration per mode

kv_events_config

bool/dict

null

KV events configuration

sglang_config

Per-mode SGLang server configuration. Any SGLang CLI flag can be specified (use kebab-case or snake_case):

Common Flags
Type
Description

tensor-parallel-size

int

Tensor parallelism degree

data-parallel-size

int

Data parallelism degree

expert-parallel-size

int

Expert parallelism (MoE models)

mem-fraction-static

float

GPU memory fraction (0.0-1.0)

kv-cache-dtype

string

KV cache precision (fp8_e4m3, etc.)

context-length

int

Max context length

chunked-prefill-size

int

Chunked prefill batch size

enable-dp-attention

bool

Enable DP attention

disaggregation-mode

string

"prefill" or "decode"

disaggregation-transfer-backend

string

Transfer backend ("nixl" or other)

served-model-name

string

Model name for API

grpc-mode

bool

Enable gRPC mode

kv_events_config

Note: KV events is a Dynamo frontend feature for kv-aware routing. It allows workers to publish cache/scheduling information over ZMQ for the Dynamo router to make intelligent routing decisions.

Enables --kv-events-config for workers with auto-allocated ZMQ ports.

Each worker leader gets a globally unique port starting at 5550:

Worker
Port

prefill_0

5550

prefill_1

5551

decode_0

5552

decode_1

5553

TRTLLM Backend

When using type: trtllm, the backend uses TRTLLM with MPI-style launching:

Field
Type
Default
Description

type

string

-

Must be "trtllm"

prefill_environment

dict

{}

Environment variables for prefill

decode_environment

dict

{}

Environment variables for decode

trtllm_config

object

null

TRTLLM CLI configuration per mode

Key differences from SGLang backend:

  • No aggregated mode support (prefill/decode only)

  • Uses MPI-style launching (one srun per endpoint with all nodes)

  • Uses trtllm-llmapi-launch for distributed launching

  • Automatically sets TRTLLM_EPLB_SHM_NAME with unique UUID per endpoint


benchmark

Benchmark configuration. The type field determines which benchmark runner is used and what additional fields are available.

Available Benchmark Types

Type
Description

manual

No benchmark (default), manual testing mode

sa-bench

Throughput/latency serving benchmark

sglang-bench

SGLang bench_serving benchmark

mmlu

MMLU accuracy evaluation

gpqa

GPQA (Graduate-level science QA) evaluation

longbenchv2

Long-context evaluation benchmark

router

Router performance with prefix caching

mooncake-router

KV-aware routing with Mooncake trace

manual

No benchmark is run. Use for manual testing and debugging.

sa-bench (Serving Accuracy)

Throughput and latency benchmark at various concurrency levels.

Field
Type
Required
Default
Description

isl

int

Yes

-

Input sequence length

osl

int

Yes

-

Output sequence length

concurrencies

list/string

Yes

-

Concurrency levels (list or "NxM" format)

req_rate

string/int

No

"inf"

Request rate

Concurrencies format: Can be a list [128, 256, 512] or x-separated string "128x256x512".

sglang-bench

SGLang bench_serving benchmark at various concurrency levels.

Field
Type
Required
Default
Description

isl

int

Yes

-

Input sequence length

osl

int

Yes

-

Output sequence length

concurrencies

list/string

Yes

-

Concurrency levels (list or "NxM" format)

req_rate

string/int

No

"inf"

Request rate

Concurrencies format: Can be a list [128, 256, 512] or x-separated string "128x256x512".

mmlu

MMLU accuracy evaluation using sglang.test.run_eval.

Field
Type
Required
Default
Description

num_examples

int

No

200

Number of examples to run

max_tokens

int

No

2048

Max tokens per response

repeat

int

No

8

Number of repeats

num_threads

int

No

512

Concurrent threads

gpqa

Graduate-level science QA evaluation using sglang.test.run_eval.

Field
Type
Required
Default
Description

num_examples

int

No

198

Number of examples to run

max_tokens

int

No

32768

Max tokens per response

repeat

int

No

8

Number of repeats

num_threads

int

No

128

Concurrent threads

longbenchv2

Long-context evaluation benchmark.

Field
Type
Required
Default
Description

max_context_length

int

No

128000

Max context length

num_threads

int

No

16

Concurrent threads

max_tokens

int

No

16384

Max tokens

num_examples

int

No

all

Number of examples

categories

list[str]

No

all

Task categories to run

router

Router performance benchmark with prefix caching. Requires frontend.type: sglang.

Field
Type
Required
Default
Description

isl

int

No

14000

Input sequence length

osl

int

No

200

Output sequence length

num_requests

int

No

200

Number of requests

concurrency

int

No

20

Concurrency level

prefix_ratios

list/string

No

"0.1 0.3 0.5 0.7 0.9"

Prefix ratios to test

mooncake-router

KV-aware routing benchmark using Mooncake conversation trace.

Field
Type
Required
Default
Description

mooncake_workload

string

No

"conversation"

Trace type (see options below)

ttft_threshold_ms

int

No

2000

Goodput TTFT threshold in ms

itl_threshold_ms

int

No

25

Goodput ITL threshold in ms

Workload options: "mooncake", "conversation", "synthetic", "toolagent"

Dataset characteristics (conversation trace):

  • 12,031 requests over ~59 minutes (3.4 req/s)

  • Avg input: 12,035 tokens, Avg output: 343 tokens

  • 36.64% cache efficiency potential


dynamo

Dynamo installation configuration.

Field
Type
Default
Description

install

bool

true

Whether to install dynamo (set false if pre-installed)

version

string

"0.8.0"

PyPI version

hash

string

null

Git commit hash (source install)

top_of_tree

bool

false

Install from main branch

Notes:

  • Set install: false if your container already has dynamo pre-installed.

  • Only one of version, hash, or top_of_tree should be specified.

  • hash and top_of_tree are mutually exclusive.

  • When hash or top_of_tree is set, version is automatically cleared.

  • Source installs (hash or top_of_tree) clone the repo and build with maturin.


profiling

Profiling configuration for nsys or torch profiler.

Field
Type
Required
Default
Description

type

string

No

"none"

Profiling type: "none", "nsys", "torch"

prefill

object

Disaggregated

null

Prefill phase config

decode

object

Disaggregated

null

Decode phase config

aggregated

object

Aggregated

null

Aggregated phase config

ProfilingPhaseConfig

Each phase config has:

Field
Type
Required
Default
Description

start_step

int

No

null

Step to start profiling

stop_step

int

No

null

Step to stop profiling

Profiling Modes

  • nsys: NVIDIA Nsight Systems profiling. Wraps worker command with nsys profile.

  • torch: PyTorch profiler. Sets SGLANG_TORCH_PROFILER_DIR environment variable.

Validation Rules

  1. Disaggregated mode requires both prefill and decode phase configs when profiling is enabled.

  2. Aggregated mode requires aggregated phase config when profiling is enabled.

Example: Torch Profiling (Disaggregated)

Example: Nsys Profiling (Aggregated)


output

Output configuration with formattable paths.

Field
Type
Default
Description

log_dir

FormattablePath

"./outputs/{job_id}/logs"

Directory for log files

The log_dir supports FormattablePath templating. See FormattablePath Template System.


health_check

Health check configuration for worker readiness.

Field
Type
Default
Description

max_attempts

int

180

Maximum health check attempts (180 = 30 minutes)

interval_seconds

int

10

Seconds between health check attempts

Notes:

  • Default of 180 attempts at 10 second intervals = 30 minutes total wait time.

  • Large models (e.g., 70B+ parameters) may require the full 30 minutes to load.

  • Reduce max_attempts for smaller models or faster testing.


infra

Infrastructure configuration for etcd/nats placement.

Field
Type
Default
Description

etcd_nats_dedicated_node

bool

false

Reserve first node for infrastructure services

Notes:

  • When etcd_nats_dedicated_node: true, the first allocated node is reserved exclusively for etcd and nats services.

  • This can improve stability for large-scale deployments by isolating infrastructure services.

  • The reserved node is not used for worker processes.


sweep

Parameter sweep configuration for running multiple benchmark variations.

Field
Type
Default
Description

mode

string

"zip"

Sweep mode: "zip" or "grid"

parameters

dict

{}

Parameter name to list of values mapping

Sweep Modes

  • zip: Pairs up parameters at matching indices. Parameters must have equal lengths.

    • Example: isl=[512, 1024], osl=[128, 256] produces 2 combinations:

      • {isl: 512, osl: 128}

      • {isl: 1024, osl: 256}

  • grid: Cartesian product of all parameter values.

    • Example: isl=[512, 1024], osl=[128, 256] produces 4 combinations:

      • {isl: 512, osl: 128}

      • {isl: 512, osl: 256}

      • {isl: 1024, osl: 128}

      • {isl: 1024, osl: 256}

Using Sweep Parameters

Reference sweep parameters in your config using {placeholder} syntax:


Config Overrides

Config overrides let you define a base config plus multiple variants in a single YAML file. Each variant deep-merges a small set of changes onto the base, and is submitted as an independent SLURM job. This eliminates the need to duplicate entire config files when testing different parameter combinations.

YAML Structure

Key
Description

base

Required. A complete, valid config (same structure as a normal recipe).

override_<suffix>

Optional. Partial config merged onto base. <suffix> is appended to the job name.

Naming

Override job names are auto-generated: {base.name}_{suffix}.

The example above produces three jobs: my-benchmark, my-benchmark_tp64, and my-benchmark_small.

Deep Merge Semantics

Type
Behavior
Example

Scalar (str/int/bool)

Override replaces base

tp-size: 32tp-size: 64

Dict

Recursive merge — only specified keys change

Override sglang_config.decode.tp-size: 64 leaves other decode keys untouched

List

Full replacement (no append)

concurrencies: [4096] replaces [8192, 10240]

New key

Added to base

Override adds fields base doesn't have

null value

Deletes the key from base

extra_mount: null removes it

Combining with Sweeps

Overrides and sweeps can coexist in the same file. Override expansion happens first, then each variant with a sweep: section is expanded via Cartesian product.

This produces 4 jobs: base × 2 sweep + override_big × 2 sweep.

Backward Compatibility

Files without a base top-level key are treated as normal configs — no behavior change.


FormattablePath Template System

FormattablePath is a powerful templating system for paths that supports runtime placeholders and environment variable expansion.

How It Works

FormattablePath ensures that configuration values with placeholders are always explicitly formatted before use, preventing accidental use of unformatted templates.

Available Placeholders

Placeholder
Type
Description
Example

{job_id}

string

SLURM job ID

"12345"

{run_name}

string

Job name + job ID

"my-benchmark_12345"

{head_node_ip}

string

IP address of head node

"10.0.0.1"

{log_dir}

string

Resolved log directory path

"/home/user/outputs/12345/logs"

{model_path}

string

Resolved model path

"/models/deepseek-r1"

{container_image}

string

Resolved container image path

"/containers/sglang.sqsh"

{gpus_per_node}

int

GPUs per node

8

Environment Variable Expansion

FormattablePath also expands environment variables using $VAR or ${VAR} syntax:

Common environment variables:

  • $HOME - User home directory

  • $USER - Username

  • $SLURM_JOB_ID - SLURM job ID (also available as {job_id})

Extra Placeholders

Some contexts support additional placeholders:

Placeholder
Context
Description

{nginx_url}

Frontend config

Nginx URL for load balancing

{frontend_url}

Frontend config

Frontend/router URL

{index}

Worker config

Worker index

{host}

Worker config

Worker host

{port}

Worker config

Worker port

Examples


container_mounts

Custom container mount mappings with FormattablePath support.

Key (Host Path)
Value (Container Path)
Description

FormattablePath

FormattablePath

Host path -> Container mount path

Both keys and values support FormattablePath templating with placeholders and environment variables.

Default Mounts

The following mounts are always added automatically:

Host Path
Container Path
Description

Model path

/model

Resolved model directory

Log directory

/logs

Log output directory

configs/ directory

/configs

NATS, etcd binaries

Benchmark scripts

/srtctl-benchmarks

Bundled benchmark scripts

Cluster-Level Mounts

You can also define cluster-wide mounts in srtslurm.yaml using the default_mounts field. These are applied to all jobs on the cluster, after the built-in defaults but before job-level mounts.

Environment variables (e.g., $SCRATCH, $HOME) are expanded. This is useful for mounting cluster-specific paths that are required by certain images without adding them to every job config.

Mount Priority

Mounts have the following priority (highest to lowest):

  1. Job-level container_mounts - FormattablePath dict (highest priority)

  2. Job-level extra_mount - simple host:container strings

  3. Cluster-level - default_mounts from srtslurm.yaml

  4. Built-in defaults - model, logs, configs, benchmark scripts (lowest priority)

Job-level mounts always take precedence over cluster-level and built-in defaults.


environment

Global environment variables for all worker processes.

Key
Value
Description

string

string

Environment variable name=value

Per-Worker Template Variables

Environment variable values support per-worker templating with these placeholders:

Placeholder
Description
Example

{node}

Hostname of the node where the worker runs

"gpu-01"

{node_id}

Numeric index of the node in worker list (0-based)

0, 1, 2

Note: For per-worker-mode environment variables, use backend.prefill_environment, backend.decode_environment, or backend.aggregated_environment.


extra_mount

Additional container mounts as a list of mount specifications.

Format
Description

host_path:container_path

Read-write mount

host_path:container_path:ro

Read-only mount

Note: Unlike container_mounts, extra_mount uses simple string format, not FormattablePath. Environment variables are still expanded.


sbatch_directives

Additional SLURM sbatch directives.

Directive
Example Value
Description

mail-user

"user@example.com"

Email for notifications

mail-type

"END,FAIL"

When to send email (BEGIN,END,FAIL)

comment

"My job description"

Job comment for tracking

reservation

"my-reservation"

Use a specific reservation

constraint

"volta"

Node feature constraint

exclusive

""

Exclusive node access (flag)

gres

"gpu:8"

Generic resource specification

dependency

"afterok:12345"

Job dependency

qos

"high"

Quality of service

Format: Each directive becomes #SBATCH --{key}={value} or #SBATCH --{key} if value is empty.


srun_options

Additional srun options for worker processes.

Option
Example Value
Description

cpu-bind

"none"

CPU binding mode (none, cores, sockets)

mpi

"pmix"

MPI implementation

overlap

""

Allow step overlap (flag)

ntasks-per-node

"1"

Tasks per node

gpus-per-task

"1"

GPUs per task

mem

"0"

Memory per node

Format: Each option becomes --{key}={value} or --{key} if value is empty.


setup_script

Run a custom script before dynamo install and worker startup.

Field
Type
Default
Description

setup_script

string

null

Script filename (must be in configs/)

Notes:

  • Script must be located in the configs/ directory.

  • Script runs inside the container before dynamo installation.

  • Useful for installing custom SGLang versions, additional dependencies, or patches.

Example setup script (configs/install-sglang-main.sh):


enable_config_dump

Enable dumping worker configuration to JSON for debugging.

Field
Type
Default
Description

enable_config_dump

bool

true

Dump config JSON for debugging

When enabled, worker startup commands include --dump-config-to which writes the resolved configuration to a JSON file.


Complete Examples

Disaggregated Mode with Dynamo

Aggregated Mode with SGLang Router

Profiling Example

Parameter Sweep Example

Config Override Example

Custom Mounts and Setup

Last updated