# Configuration Reference

Complete reference for job configuration YAML files.


## name

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `name` | string | Yes | Job name, used for identification and log prefixes |


## model

Model and container configuration.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | Yes | Model path alias (from srtslurm.yaml) or absolute path |
| `container` | string | Yes | Container alias (from srtslurm.yaml) or .sqsh path |
| `precision` | string | Yes | Model precision (informational: fp4, fp8, fp16, bf16) |
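
A minimal sketch of the `name` and `model` sections; the alias values are hypothetical and would be defined in your srtslurm.yaml:

```yaml
name: my-benchmark

model:
  path: deepseek-r1          # model alias (hypothetical) or an absolute path
  container: sglang-latest   # container alias (hypothetical) or a .sqsh path
  precision: fp8             # informational only
```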


## resources

GPU allocation and worker topology. Resources are configured either in disaggregated mode (prefill + decode) or in aggregated mode (a single worker type); the fields below cover both.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `gpu_type` | string | - | GPU type: "gb200", "gb300", or "h100" |
| `gpus_per_node` | int | 4 | GPUs per node |
| `prefill_nodes` | int | null | Nodes dedicated to prefill |
| `decode_nodes` | int | null | Nodes dedicated to decode |
| `prefill_workers` | int | null | Number of prefill workers |
| `decode_workers` | int | null | Number of decode workers |
| `agg_nodes` | int | null | Nodes for aggregated mode |
| `agg_workers` | int | null | Number of aggregated workers |

Notes:

- Set `decode_nodes: 0` to have decode workers share nodes with prefill workers.
- Use either disaggregated mode (`prefill_nodes`/`decode_nodes`) or aggregated mode (`agg_nodes`), not both.
- GPUs per worker are computed automatically as `(nodes * gpus_per_node) / workers`, as in the sketch below.
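
A disaggregated sketch with illustrative node counts, showing how the per-worker GPU counts fall out of the formula above:

```yaml
resources:
  gpu_type: gb200
  gpus_per_node: 4
  prefill_nodes: 2       # 2 nodes * 4 GPUs = 8 prefill GPUs
  prefill_workers: 2     # gpus_per_prefill = 8 / 2 = 4
  decode_nodes: 2        # 2 nodes * 4 GPUs = 8 decode GPUs
  decode_workers: 1      # gpus_per_decode = 8 / 1 = 8
```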

### Computed Properties

The ResourceConfig provides several computed properties:

- `is_disaggregated`: True if using prefill/decode mode
- `total_nodes`: Total nodes allocated (prefill + decode, or agg)
- `num_prefill`, `num_decode`, `num_agg`: Worker counts for each role
- `gpus_per_prefill`, `gpus_per_decode`, `gpus_per_agg`: GPUs allocated per worker
- `prefill_gpus`, `decode_gpus`: Total GPUs for each role


## slurm

SLURM job settings.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `time_limit` | string | from srtslurm.yaml | Job time limit (HH:MM:SS) |
| `account` | string | from srtslurm.yaml | SLURM account |
| `partition` | string | from srtslurm.yaml | SLURM partition |
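
A sketch with illustrative values; the account and partition names are hypothetical:

```yaml
slurm:
  time_limit: "04:00:00"
  account: my-account    # hypothetical
  partition: batch       # hypothetical
```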


## frontend

Frontend/router configuration.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `type` | string | dynamo | Frontend type: "dynamo" or "sglang" |
| `enable_multiple_frontends` | bool | true | Scale with nginx + multiple routers |
| `num_additional_frontends` | int | 9 | Additional routers beyond the master |
| `args` | dict | null | CLI args for the frontend |
| `env` | dict | null | Env vars for frontend processes |
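
A sketch spelling out the defaults listed above:

```yaml
frontend:
  type: dynamo
  enable_multiple_frontends: true
  num_additional_frontends: 9
```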

See SGLang Router for detailed architecture.


## backend

Worker configuration and SGLang settings.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `type` | string | sglang | Backend type (currently only "sglang") |
| `gpu_type` | string | null | GPU type override |
| `prefill_environment` | dict | {} | Environment variables for prefill |
| `decode_environment` | dict | {} | Environment variables for decode |
| `aggregated_environment` | dict | {} | Environment variables for aggregated |
| `sglang_config` | object | null | SGLang CLI configuration per mode |
| `kv_events_config` | bool/dict | null | KV events configuration |

### sglang_config

Per-mode SGLang server configuration. Any SGLang CLI flag can be specified (use kebab-case or snake_case):

| Common Flags | Type | Description |
|--------------|------|-------------|
| `tensor-parallel-size` | int | Tensor parallelism degree |
| `data-parallel-size` | int | Data parallelism degree |
| `expert-parallel-size` | int | Expert parallelism (MoE models) |
| `mem-fraction-static` | float | GPU memory fraction (0.0-1.0) |
| `kv-cache-dtype` | string | KV cache precision (fp8_e4m3, etc.) |
| `context-length` | int | Max context length |
| `chunked-prefill-size` | int | Chunked prefill batch size |
| `enable-dp-attention` | bool | Enable DP attention |
| `disaggregation-mode` | string | "prefill" or "decode" |
| `disaggregation-transfer-backend` | string | Transfer backend ("nixl" or other) |
| `served-model-name` | string | Model name for API |
| `grpc-mode` | bool | Enable gRPC mode |
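
A per-mode sketch with illustrative parallelism settings; the `prefill`/`decode` nesting is assumed from the per-mode description above:

```yaml
backend:
  type: sglang
  sglang_config:
    prefill:
      tensor-parallel-size: 4
      chunked-prefill-size: 8192
      disaggregation-mode: prefill
    decode:
      tensor-parallel-size: 8
      mem-fraction-static: 0.85
      disaggregation-mode: decode
```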

### kv_events_config

Note: KV events are a Dynamo frontend feature for KV-aware routing: workers publish cache/scheduling information over ZMQ so the Dynamo router can make intelligent routing decisions.

Setting this enables `--kv-events-config` for workers, with auto-allocated ZMQ ports.

Each worker leader gets a globally unique port, starting at 5550:

| Worker | Port |
|--------|------|
| prefill_0 | 5550 |
| prefill_1 | 5551 |
| decode_0 | 5552 |
| decode_1 | 5553 |
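
The simplest form is the boolean, which enables the feature with the auto-allocated ports above:

```yaml
backend:
  kv_events_config: true   # ports assigned per worker leader, starting at 5550
```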


## benchmark

Benchmark configuration. The `type` field determines which benchmark runner is used and what additional fields are available.

### Available Benchmark Types

| Type | Description |
|------|-------------|
| `manual` | No benchmark (default); manual testing mode |
| `sa-bench` | Throughput/latency serving benchmark |
| `mmlu` | MMLU accuracy evaluation |
| `gpqa` | GPQA (graduate-level science QA) evaluation |
| `longbenchv2` | Long-context evaluation benchmark |
| `router` | Router performance with prefix caching |
| `mooncake-router` | KV-aware routing with Mooncake trace |
| `profiling` | Profiling benchmark (auto-selected) |

### manual

No benchmark is run. Use for manual testing and debugging.

### sa-bench

Throughput and latency benchmark at various concurrency levels.

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `isl` | int | Yes | - | Input sequence length |
| `osl` | int | Yes | - | Output sequence length |
| `concurrencies` | list/string | Yes | - | Concurrency levels (list or "NxM" format) |
| `req_rate` | string/int | No | "inf" | Request rate |

**Concurrencies format:** a list (`[128, 256, 512]`) or an x-separated string (`"128x256x512"`).
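
A sketch with illustrative sequence lengths:

```yaml
benchmark:
  type: sa-bench
  isl: 1024
  osl: 256
  concurrencies: [128, 256, 512]   # equivalently "128x256x512"
  req_rate: "inf"
```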

### mmlu

MMLU accuracy evaluation using `sglang.test.run_eval`.

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `num_examples` | int | No | 200 | Number of examples to run |
| `max_tokens` | int | No | 2048 | Max tokens per response |
| `repeat` | int | No | 8 | Number of repeats |
| `num_threads` | int | No | 512 | Concurrent threads |
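
A sketch using the defaults; gpqa and longbenchv2 follow the same shape with their own fields:

```yaml
benchmark:
  type: mmlu
  num_examples: 200
  max_tokens: 2048
  repeat: 8
  num_threads: 512
```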

### gpqa

Graduate-level science QA evaluation using `sglang.test.run_eval`.

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `num_examples` | int | No | 198 | Number of examples to run |
| `max_tokens` | int | No | 32768 | Max tokens per response |
| `repeat` | int | No | 8 | Number of repeats |
| `num_threads` | int | No | 128 | Concurrent threads |

### longbenchv2

Long-context evaluation benchmark.

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `max_context_length` | int | No | 128000 | Max context length |
| `num_threads` | int | No | 16 | Concurrent threads |
| `max_tokens` | int | No | 16384 | Max tokens |
| `num_examples` | int | No | all | Number of examples |
| `categories` | list[str] | No | all | Task categories to run |

### router

Router performance benchmark with prefix caching. Requires `frontend.type: sglang`.

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `isl` | int | No | 14000 | Input sequence length |
| `osl` | int | No | 200 | Output sequence length |
| `num_requests` | int | No | 200 | Number of requests |
| `concurrency` | int | No | 20 | Concurrency level |
| `prefix_ratios` | list/string | No | "0.1 0.3 0.5 0.7 0.9" | Prefix ratios to test |
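
A sketch with illustrative prefix ratios; note the required sglang frontend:

```yaml
frontend:
  type: sglang             # required by the router benchmark

benchmark:
  type: router
  isl: 14000
  osl: 200
  num_requests: 200
  concurrency: 20
  prefix_ratios: [0.1, 0.5, 0.9]
```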

### mooncake-router

KV-aware routing benchmark using the Mooncake conversation trace.

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `mooncake_workload` | string | No | "conversation" | Trace type (see options below) |
| `ttft_threshold_ms` | int | No | 2000 | Goodput TTFT threshold in ms |
| `itl_threshold_ms` | int | No | 25 | Goodput ITL threshold in ms |

**Workload options:** "mooncake", "conversation", "synthetic", "toolagent"

Dataset characteristics (conversation trace):

- 12,031 requests over ~59 minutes (3.4 req/s)
- Average input: 12,035 tokens; average output: 343 tokens
- 36.64% cache efficiency potential
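
A sketch using the defaults listed above:

```yaml
benchmark:
  type: mooncake-router
  mooncake_workload: conversation
  ttft_threshold_ms: 2000
  itl_threshold_ms: 25
```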

### profiling

Auto-selected when `profiling.type` is "torch" or "nsys". Configuration lives in the top-level profiling section, not here.


## dynamo

Dynamo installation configuration.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `version` | string | "0.7.0" | PyPI version |
| `hash` | string | null | Git commit hash (source install) |
| `top_of_tree` | bool | false | Install from main branch |

Notes:

- Only one of `version`, `hash`, or `top_of_tree` should be specified.
- `hash` and `top_of_tree` are mutually exclusive.
- When `hash` or `top_of_tree` is set, `version` is automatically cleared.
- Source installs (`hash` or `top_of_tree`) clone the repo and build with maturin.
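
Two alternative sketches; the commit hash below is a placeholder, not a real revision:

```yaml
# Pinned PyPI release
dynamo:
  version: "0.7.0"

# Or a source install from a specific commit (placeholder hash):
# dynamo:
#   hash: "0123abcd"
```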


## profiling

Profiling configuration for nsys or torch profiler.

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `type` | string | No | "none" | Profiling type: "none", "nsys", "torch" |
| `isl` | int | When enabled | null | Input sequence length for profiling |
| `osl` | int | When enabled | null | Output sequence length for profiling |
| `concurrency` | int | When enabled | null | Batch size / concurrency |
| `prefill` | object | Disaggregated | null | Prefill phase config |
| `decode` | object | Disaggregated | null | Decode phase config |
| `aggregated` | object | Aggregated | null | Aggregated phase config |

### ProfilingPhaseConfig

Each phase config has:

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `start_step` | int | No | null | Step to start profiling |
| `stop_step` | int | No | null | Step to stop profiling |

### Profiling Modes

- `nsys`: NVIDIA Nsight Systems profiling. Wraps the worker command with `nsys profile`.
- `torch`: PyTorch profiler. Sets the `SGLANG_TORCH_PROFILER_DIR` environment variable.

### Validation Rules

1. When profiling is enabled (`type != "none"`), `isl`, `osl`, and `concurrency` are required.
2. Disaggregated mode requires both `prefill` and `decode` phase configs.
3. Aggregated mode requires an `aggregated` phase config.
4. Profiling mode requires exactly 1 worker per role (1 prefill + 1 decode, or 1 aggregated).

### Example: Torch Profiling (Disaggregated)
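
A minimal sketch with illustrative sequence lengths and step ranges; remember that profiling expects exactly one worker per role:

```yaml
profiling:
  type: torch              # sets SGLANG_TORCH_PROFILER_DIR for workers
  isl: 1024
  osl: 256
  concurrency: 32
  prefill:
    start_step: 5
    stop_step: 10
  decode:
    start_step: 5
    stop_step: 10
```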

### Example: Nsys Profiling (Aggregated)
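
The aggregated counterpart, again with illustrative values; the worker command is wrapped with `nsys profile`:

```yaml
profiling:
  type: nsys
  isl: 4096
  osl: 512
  concurrency: 16
  aggregated:
    start_step: 10
    stop_step: 20
```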


## output

Output configuration with formattable paths.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `log_dir` | FormattablePath | "./outputs/{job_id}/logs" | Directory for log files |

The `log_dir` supports FormattablePath templating. See FormattablePath Template System.
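
A sketch combining an environment variable with a runtime placeholder:

```yaml
output:
  log_dir: "$HOME/runs/{run_name}/logs"
```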


## health_check

Health check configuration for worker readiness.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `max_attempts` | int | 180 | Maximum health check attempts (180 = 30 minutes) |
| `interval_seconds` | int | 10 | Seconds between health check attempts |

Notes:

- Default of 180 attempts at 10-second intervals = 30 minutes total wait time.
- Large models (e.g., 70B+ parameters) may require the full 30 minutes to load.
- Reduce `max_attempts` for smaller models or faster testing.
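
A sketch that shortens the wait for a small model:

```yaml
health_check:
  max_attempts: 60        # 60 attempts * 10 s = 10 minutes
  interval_seconds: 10
```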


## sweep

Parameter sweep configuration for running multiple benchmark variations.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `mode` | string | "zip" | Sweep mode: "zip" or "grid" |
| `parameters` | dict | {} | Mapping of parameter name to list of values |

### Sweep Modes

- `zip`: Pairs up parameters at matching indices. Parameters must have equal lengths.
  - Example: `isl=[512, 1024]`, `osl=[128, 256]` produces 2 combinations:
    - `{isl: 512, osl: 128}`
    - `{isl: 1024, osl: 256}`
- `grid`: Cartesian product of all parameter values.
  - Example: `isl=[512, 1024]`, `osl=[128, 256]` produces 4 combinations:
    - `{isl: 512, osl: 128}`
    - `{isl: 512, osl: 256}`
    - `{isl: 1024, osl: 128}`
    - `{isl: 1024, osl: 256}`

### Using Sweep Parameters

Reference sweep parameters in your config using `{placeholder}` syntax:
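
A zip-mode sketch; the exact wiring of `{isl}`/`{osl}` into the benchmark fields is illustrative:

```yaml
sweep:
  mode: zip
  parameters:
    isl: [512, 1024]
    osl: [128, 256]

benchmark:
  type: sa-bench
  isl: "{isl}"            # substituted per sweep combination
  osl: "{osl}"
  concurrencies: [256]
```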


## FormattablePath Template System

FormattablePath is a templating system for paths that supports runtime placeholders and environment variable expansion.

### How It Works

FormattablePath ensures that configuration values containing placeholders are always explicitly formatted before use, preventing accidental use of unformatted templates.

### Available Placeholders

| Placeholder | Type | Description | Example |
|-------------|------|-------------|---------|
| `{job_id}` | string | SLURM job ID | "12345" |
| `{run_name}` | string | Job name + job ID | "my-benchmark_12345" |
| `{head_node_ip}` | string | IP address of head node | "10.0.0.1" |
| `{log_dir}` | string | Resolved log directory path | "/home/user/outputs/12345/logs" |
| `{model_path}` | string | Resolved model path | "/models/deepseek-r1" |
| `{container_image}` | string | Resolved container image path | "/containers/sglang.sqsh" |
| `{gpus_per_node}` | int | GPUs per node | 8 |

### Environment Variable Expansion

FormattablePath also expands environment variables using `$VAR` or `${VAR}` syntax.

Common environment variables:

- `$HOME` - User home directory
- `$USER` - Username
- `$SLURM_JOB_ID` - SLURM job ID (also available as `{job_id}`)

### Extra Placeholders

Some contexts support additional placeholders:

| Placeholder | Context | Description |
|-------------|---------|-------------|
| `{nginx_url}` | Frontend config | Nginx URL for load balancing |
| `{frontend_url}` | Frontend config | Frontend/router URL |
| `{index}` | Worker config | Worker index |
| `{host}` | Worker config | Worker host |
| `{port}` | Worker config | Worker port |

### Examples
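
A couple of sketches combining placeholders and environment variables; the paths are illustrative:

```yaml
output:
  log_dir: "$HOME/outputs/{run_name}/logs"   # env var + runtime placeholder

container_mounts:
  "${HOME}/checkpoints": "/checkpoints"      # ${VAR} form also expands
```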


## container_mounts

Custom container mount mappings with FormattablePath support.

| Key (Host Path) | Value (Container Path) | Description |
|-----------------|------------------------|-------------|
| FormattablePath | FormattablePath | Host path -> container mount path |

Both keys and values support FormattablePath templating with placeholders and environment variables.
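
A sketch; the host paths are illustrative:

```yaml
container_mounts:
  "$HOME/datasets": "/datasets"
  "./patches/{job_id}": "/patches"   # placeholders work in keys and values
```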

### Default Mounts

The following mounts are always added automatically:

| Host Path | Container Path | Description |
|-----------|----------------|-------------|
| Model path | /model | Resolved model directory |
| Log directory | /logs | Log output directory |
| configs/ directory | /configs | NATS, etcd binaries |
| Benchmark scripts | /srtctl-benchmarks | Bundled benchmark scripts |


## environment

Global environment variables for all worker processes.

| Key | Value | Description |
|-----|-------|-------------|
| string | string | Environment variable name=value |

Note: For per-worker-mode environment variables, use `backend.prefill_environment`, `backend.decode_environment`, or `backend.aggregated_environment`.
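
A sketch; NCCL_DEBUG is a standard NCCL variable, and the second entry is purely hypothetical:

```yaml
environment:
  NCCL_DEBUG: "WARN"
  CUSTOM_FLAG: "1"        # hypothetical variable, for illustration
```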


## extra_mount

Additional container mounts as a list of mount specifications.

| Format | Description |
|--------|-------------|
| `host_path:container_path` | Read-write mount |
| `host_path:container_path:ro` | Read-only mount |

Note: Unlike `container_mounts`, `extra_mount` uses a simple string format, not FormattablePath. Environment variables are still expanded.
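
A sketch with illustrative paths:

```yaml
extra_mount:
  - /data/shared:/shared
  - $HOME/traces:/traces:ro   # env vars are expanded; ro = read-only
```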


## sbatch_directives

Additional SLURM sbatch directives.

| Directive | Example Value | Description |
|-----------|---------------|-------------|
| `mail-user` | "user@example.com" | Email for notifications |
| `mail-type` | "END,FAIL" | When to send email (BEGIN, END, FAIL) |
| `comment` | "My job description" | Job comment for tracking |
| `reservation` | "my-reservation" | Use a specific reservation |
| `constraint` | "volta" | Node feature constraint |
| `exclusive` | "" | Exclusive node access (flag) |
| `gres` | "gpu:8" | Generic resource specification |
| `dependency` | "afterok:12345" | Job dependency |
| `qos` | "high" | Quality of service |

**Format:** each directive becomes `#SBATCH --{key}={value}`, or `#SBATCH --{key}` if the value is empty.
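
A sketch drawn from the table above:

```yaml
sbatch_directives:
  mail-user: "user@example.com"
  mail-type: "END,FAIL"
  exclusive: ""             # empty value emits '#SBATCH --exclusive'
```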


## srun_options

Additional srun options for worker processes.

| Option | Example Value | Description |
|--------|---------------|-------------|
| `cpu-bind` | "none" | CPU binding mode (none, cores, sockets) |
| `mpi` | "pmix" | MPI implementation |
| `overlap` | "" | Allow step overlap (flag) |
| `ntasks-per-node` | "1" | Tasks per node |
| `gpus-per-task` | "1" | GPUs per task |
| `mem` | "0" | Memory per node |

**Format:** each option becomes `--{key}={value}`, or `--{key}` if the value is empty.
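
A sketch drawn from the table above:

```yaml
srun_options:
  cpu-bind: "none"
  overlap: ""               # empty value emits '--overlap'
```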


## setup_script

Run a custom script before dynamo install and worker startup.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `setup_script` | string | null | Script filename (must be in configs/) |

Notes:

- Script must be located in the configs/ directory.
- Script runs inside the container before dynamo installation.
- Useful for installing custom SGLang versions, additional dependencies, or patches.

Example setup script (`configs/install-sglang-main.sh`):
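
A hypothetical sketch of such a script; the install command assumes SGLang's Python package lives in the repo's python/ subdirectory:

```bash
#!/usr/bin/env bash
# Hypothetical configs/install-sglang-main.sh:
# install SGLang from the main branch before dynamo install and worker startup.
set -euo pipefail
pip install --no-cache-dir "git+https://github.com/sgl-project/sglang.git@main#subdirectory=python"
```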


## enable_config_dump

Enable dumping worker configuration to JSON for debugging.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enable_config_dump` | bool | true | Dump config JSON for debugging |

When enabled, worker startup commands include `--dump-config-to`, which writes the resolved configuration to a JSON file.


## Complete Examples

### Disaggregated Mode with Dynamo
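
An end-to-end sketch assembled from the sections above; aliases and sizes are illustrative:

```yaml
name: dsr1-disagg

model:
  path: deepseek-r1          # alias (hypothetical)
  container: sglang-latest   # alias (hypothetical)
  precision: fp8

resources:
  gpu_type: gb200
  gpus_per_node: 4
  prefill_nodes: 2
  prefill_workers: 2
  decode_nodes: 2
  decode_workers: 1

frontend:
  type: dynamo

backend:
  type: sglang
  kv_events_config: true     # enable KV-aware routing

benchmark:
  type: sa-bench
  isl: 1024
  osl: 256
  concurrencies: [128, 256, 512]
```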

### Aggregated Mode with SGLang Router
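
An aggregated counterpart pairing the sglang frontend with the router benchmark; values are illustrative:

```yaml
name: llama-agg

model:
  path: llama-70b            # alias (hypothetical)
  container: sglang-latest   # alias (hypothetical)
  precision: bf16

resources:
  gpu_type: h100
  gpus_per_node: 8
  agg_nodes: 2
  agg_workers: 2

frontend:
  type: sglang               # required by the router benchmark

benchmark:
  type: router
  isl: 14000
  osl: 200
  prefix_ratios: [0.1, 0.5, 0.9]
```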

### Profiling Example
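
See the torch and nsys sketches in the profiling section above; pair one of them with a single-worker resource layout (1 prefill + 1 decode, or 1 aggregated).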

### Parameter Sweep Example
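
See the zip-mode sketch under Using Sweep Parameters: declare value lists under `sweep.parameters` and reference them with `{placeholder}` syntax in the benchmark fields.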

### Custom Mounts and Setup
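
A sketch combining custom mounts with a setup script; the paths and script name are illustrative:

```yaml
name: custom-env

model:
  path: deepseek-r1          # alias (hypothetical)
  container: sglang-latest   # alias (hypothetical)
  precision: fp8

container_mounts:
  "$HOME/patches": "/patches"

extra_mount:
  - /data/shared:/shared:ro

setup_script: install-sglang-main.sh   # must live in configs/
```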
