Architecture

Version: 1.0 | Last Updated: 2025-12-30


Table of Contents

  • High-Level Overview

  • Architecture Overview

  • System Components

  • Architecture Layers

  • Data Flow Diagrams

  • Process Architecture on SLURM Cluster

  • Key Abstractions

  • Extension Points

  • Module Dependencies

  • Directory Structure

  • Summary

High-Level Overview

What is srtctl?

srtctl (SLURM Runtime Control) is a Python-first orchestration framework for LLM inference benchmarks on SLURM clusters. It provides:

  • Configuration-driven deployment: YAML configs define model, resources, backends, and benchmarks

  • Multi-backend support: Currently SGLang with prefill/decode disaggregation

  • Automated orchestration: Handles infrastructure setup, worker spawning, health checks, and benchmarking

  • Container-based execution: Workers run inside containers with proper mounts and environment

Problem Statement

Running distributed LLM inference workloads on SLURM clusters involves significant complexity:

  1. Resource Allocation: Mapping GPU workers to nodes with proper tensor parallelism

  2. Process Coordination: Starting services in the correct order with health checks

  3. Configuration Management: Handling model paths, container images, and environment variables

  4. Monitoring & Cleanup: Tracking process health and graceful shutdown

srtctl abstracts this complexity into a simple YAML interface while providing extensibility for different backends, frontends, and benchmarks.

Architecture Overview


Design Philosophy

1. Single Source of Truth

The RuntimeContext computes all paths and values once at startup. This eliminates:

  • Scattered bash variable expansion

  • Inconsistent path computation

  • Configuration drift during execution

2. Frozen Dataclasses

All configuration objects are immutable after creation using @dataclass(frozen=True):
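As a minimal illustration of the pattern (field names follow the ModelConfig row in the table below; the default value is an assumption):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    """Immutable after construction: assignment raises dataclasses.FrozenInstanceError."""
    path: str
    container: str
    precision: str = "bf16"  # assumed default, for illustration only

cfg = ModelConfig(path="/models/llama", container="sglang.sqsh")
# cfg.precision = "fp8"   # would raise FrozenInstanceError
```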

Benefits:

  • Prevents accidental mutation

  • Easier to reason about state

  • Safe to pass around without defensive copying

  • Thread-safe by default

3. Protocol Pattern (Duck Typing)

Using typing.Protocol instead of ABC for interfaces enables duck typing without inheritance:
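A minimal sketch of the idea; the method names below are illustrative, not the real BackendProtocol surface:

```python
from typing import Protocol

class BackendProtocol(Protocol):
    """Any object with these methods is accepted as a backend -- no inheritance needed."""
    def build_worker_command(self, mode: str) -> list[str]: ...
    def health_check_url(self, host: str, port: int) -> str: ...

class MyBackend:
    # Satisfies BackendProtocol structurally, without subclassing it.
    def build_worker_command(self, mode: str) -> list[str]:
        return ["python", "-m", "mybackend.server", "--mode", mode]

    def health_check_url(self, host: str, port: int) -> str:
        return f"http://{host}:{port}/health"

backend: BackendProtocol = MyBackend()  # accepted by static type checkers via structural typing
```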

4. Registry Pattern

Extensible component registration via decorators:
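For instance, a benchmark registry might look roughly like this (names are assumptions, not the actual srtctl API):

```python
BENCHMARK_REGISTRY: dict[str, type] = {}

def register_benchmark(name: str):
    """Class decorator that records the benchmark under the `type` key used in YAML configs."""
    def wrap(cls: type) -> type:
        BENCHMARK_REGISTRY[name] = cls
        return cls
    return wrap

@register_benchmark("mybench")
class MyBench:
    ...

bench_cls = BENCHMARK_REGISTRY["mybench"]  # looked up from the config's `type` field
```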

5. Factory Classmethods

Use @classmethod named from_* for construction:
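For example, a from_yaml factory might look like this (the method name and field list are illustrative):

```python
from dataclasses import dataclass

import yaml  # PyYAML

@dataclass(frozen=True)
class SrtConfig:
    name: str
    model: dict
    # ... remaining sections (resources, backend, frontend, benchmark) omitted

    @classmethod
    def from_yaml(cls, path: str) -> "SrtConfig":
        """Parse and construct the config in one place."""
        with open(path) as f:
            raw = yaml.safe_load(f)
        return cls(**raw)
```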


System Components

CLI Layer

submit.py - Job Submission

Entry point for srtctl apply|dry-run -f config.yaml:
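A rough sketch of the flow, not the actual implementation; SrtConfig.from_yaml and render_sbatch_script stand in for whatever loading and templating submit.py really uses:

```python
import argparse
import subprocess

def main(argv: list[str] | None = None) -> None:
    parser = argparse.ArgumentParser(prog="srtctl")
    parser.add_argument("command", choices=["apply", "dry-run"])
    parser.add_argument("-f", "--file", required=True, help="path to the YAML job config")
    args = parser.parse_args(argv)

    config = SrtConfig.from_yaml(args.file)   # hypothetical loader (see the from_* sketch above)
    script = render_sbatch_script(config)     # hypothetical: renders the sbatch batch script
    if args.command == "dry-run":
        print(script)                         # show what would be submitted
    else:
        subprocess.run(["sbatch"], input=script, text=True, check=True)
```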

do_sweep.py - SweepOrchestrator

The main orchestration class that runs inside the SLURM job:
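An illustrative outline only; the method names are assumptions, but the steps mirror the responsibilities listed earlier (infrastructure setup, worker spawning, health checks, benchmarking, cleanup):

```python
class SweepOrchestrator:
    """Sketch of the orchestration loop that runs inside the allocated SLURM job."""

    def run(self) -> None:
        try:
            self.start_infrastructure()   # router / frontend processes
            self.start_workers()          # srun-launched prefill and decode workers
            self.wait_until_healthy()     # poll health endpoints until ready or timeout
            self.run_benchmarks()         # iterate the configured concurrency sweep
        finally:
            self.cleanup()                # terminate every registered process

    # Stubs standing in for the real implementations:
    def start_infrastructure(self) -> None: ...
    def start_workers(self) -> None: ...
    def wait_until_healthy(self) -> None: ...
    def run_benchmarks(self) -> None: ...
    def cleanup(self) -> None: ...
```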

Configuration Layer

schema.py - Configuration Dataclasses

All configs are frozen dataclasses with marshmallow validation:

| Class | Purpose | Key Fields |
| --- | --- | --- |
| SrtConfig | Main job config | name, model, resources, backend, frontend, benchmark |
| ModelConfig | Model settings | path, container, precision |
| ResourceConfig | GPU/node allocation | gpu_type, gpus_per_node, prefill/decode nodes/workers |
| BackendConfig | Polymorphic backend | type, sglang_config, environment per mode |
| FrontendConfig | Router settings | type, enable_multiple_frontends, args, env |
| BenchmarkConfig | Benchmark params | type, isl, osl, concurrencies, sweep |
| ProfilingConfig | Profiling settings | type (nsys/torch), phase configs |
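As a hedged illustration of how a schema can deserialize straight into a frozen dataclass (field names come from the BenchmarkConfig row above; the actual schema definitions may differ):

```python
from dataclasses import dataclass

from marshmallow import Schema, fields, post_load

@dataclass(frozen=True)
class BenchmarkConfig:
    isl: int                 # input sequence length
    osl: int                 # output sequence length
    concurrencies: list[int]

class BenchmarkConfigSchema(Schema):
    isl = fields.Int(required=True)
    osl = fields.Int(required=True)
    concurrencies = fields.List(fields.Int(), required=True)

    @post_load
    def to_dataclass(self, data, **kwargs):
        return BenchmarkConfig(**data)

cfg = BenchmarkConfigSchema().load({"isl": 1024, "osl": 128, "concurrencies": [8, 16]})
# Invalid input raises marshmallow.ValidationError; valid input yields an immutable dataclass.
```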

runtime.py - RuntimeContext

The single source of truth for all runtime values:
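A minimal sketch of the idea; the fields, factory name, and results/ layout shown here are assumptions:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class RuntimeContext:
    """Every derived path is computed exactly once at startup, then read everywhere."""
    job_id: str
    model_path: Path
    output_dir: Path
    log_dir: Path

    @classmethod
    def from_job(cls, job_name: str, job_id: str, model_path: str) -> "RuntimeContext":
        output_dir = Path("results") / f"{job_name}-{job_id}"
        return cls(
            job_id=job_id,
            model_path=Path(model_path),
            output_dir=output_dir,
            log_dir=output_dir / "logs",
        )
```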

Backend Layer

BackendProtocol

SGLangProtocol

Implements BackendProtocol for SGLang with P/D disaggregation:
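A hypothetical sketch of the shape of that implementation; the launch flags below are indicative of SGLang's disaggregated serving options and should be checked against the actual SGLangProtocol code:

```python
class SGLangBackend:
    """Hypothetical: builds separate launch commands for prefill and decode workers."""

    def __init__(self, model_path: str) -> None:
        self.model_path = model_path

    def build_worker_command(self, mode: str) -> list[str]:
        # Disaggregation: prefill and decode run as separate server processes.
        assert mode in ("prefill", "decode")
        return [
            "python", "-m", "sglang.launch_server",
            "--model-path", self.model_path,
            "--disaggregation-mode", mode,
        ]

    def health_check_url(self, host: str, port: int) -> str:
        return f"http://{host}:{port}/health"
```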

Frontend Layer

FrontendProtocol

Infrastructure Layer


Architecture Layers

Layer Diagram


Data Flow Diagrams

Config Loading Flow

Job Submission Flow

Worker Startup Flow

Health Check Flow


Process Architecture on SLURM Cluster

Physical Layout

Port Allocation Strategy

Process Relationships


Key Abstractions

RuntimeContext

The single source of truth for all runtime values. It is created once at job start and handed to every component, so paths are never recomputed mid-run (see the sketch under runtime.py above).

Endpoint vs Process

NodePortAllocator

Manages per-node port assignments to avoid conflicts:
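A minimal sketch of the idea (the base port and method name are assumptions):

```python
class NodePortAllocator:
    """Hands out unique ports per node from a fixed starting point."""

    def __init__(self, base_port: int = 30000) -> None:
        self._base = base_port
        self._next: dict[str, int] = {}   # node hostname -> next free port

    def allocate(self, node: str) -> int:
        port = self._next.get(node, self._base)
        self._next[node] = port + 1
        return port

alloc = NodePortAllocator()
alloc.allocate("node-001")   # 30000
alloc.allocate("node-001")   # 30001
alloc.allocate("node-002")   # 30000 -- counters are tracked per node
```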

ProcessRegistry

Lifecycle management for all spawned processes:
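A sketch of the shape of such a registry (method names and the SIGTERM-then-kill escalation are assumptions):

```python
import signal
import subprocess

class ProcessRegistry:
    """Tracks every spawned process so shutdown can stop them all, in reverse start order."""

    def __init__(self) -> None:
        self._procs: list[tuple[str, subprocess.Popen]] = []

    def register(self, name: str, popen: subprocess.Popen) -> None:
        self._procs.append((name, popen))

    def shutdown(self, timeout: float = 30.0) -> None:
        for name, popen in reversed(self._procs):
            if popen.poll() is None:                 # still running
                popen.send_signal(signal.SIGTERM)    # ask nicely first
                try:
                    popen.wait(timeout=timeout)
                except subprocess.TimeoutExpired:
                    popen.kill()                     # escalate if SIGTERM is ignored
```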

ManagedProcess
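The per-process wrapper might track something like the following (field names are assumptions):

```python
import subprocess
from dataclasses import dataclass

@dataclass
class ManagedProcess:
    """Pairs a spawned process with its identity and log location."""
    name: str
    popen: subprocess.Popen
    log_path: str

    def is_alive(self) -> bool:
        return self.popen.poll() is None
```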


Extension Points

How to Add a New Backend

  1. Create a backend module at backends/mybackend.py (see the sketch after this list).

  2. Register it in backends/__init__.py (also sketched below).

  3. Update BackendConfigField in schema.py to handle polymorphic deserialization.
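A combined sketch of steps 1 and 2; the protocol methods and registry shape are assumptions carried over from the earlier sketches:

```python
# backends/mybackend.py  (hypothetical module)
class MyBackend:
    """Structurally satisfies the backend protocol sketched earlier."""

    def build_worker_command(self, mode: str) -> list[str]:
        return ["python", "-m", "mybackend.server", "--mode", mode]

    def health_check_url(self, host: str, port: int) -> str:
        return f"http://{host}:{port}/health"


# backends/__init__.py  (hypothetical registration, keyed by the `type` field of BackendConfig)
# from .mybackend import MyBackend
BACKENDS: dict[str, type] = {"mybackend": MyBackend}
```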

How to Add a New Frontend

  1. Create a frontend module at frontends/myfrontend.py (see the sketch after this list).

  2. Register it in frontends/base.py (also sketched below).
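A combined sketch of the two steps; the method name and registry shape are assumptions:

```python
# frontends/myfrontend.py  (hypothetical module)
class MyFrontend:
    """Builds the router launch command that fronts the workers."""

    def build_command(self, worker_urls: list[str], port: int) -> list[str]:
        return ["python", "-m", "myfrontend.router", "--port", str(port), *worker_urls]


# frontends/base.py  (hypothetical registration, keyed by the `type` field of FrontendConfig)
# from .myfrontend import MyFrontend
FRONTENDS: dict[str, type] = {"myfrontend": MyFrontend}
```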

How to Add a New Benchmark

  1. Create a benchmark module at benchmarks/mybench.py (see the sketch after this list).

  2. Add the benchmark script at benchmarks/scripts/mybench/run.sh.

  3. Import the module in benchmarks/__init__.py to trigger registration (also sketched below).
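A combined sketch of the three steps, reusing the hypothetical register_benchmark decorator from the Registry Pattern sketch above:

```python
# benchmarks/mybench.py  (hypothetical module)
@register_benchmark("mybench")
class MyBench:
    script = "benchmarks/scripts/mybench/run.sh"

    def build_args(self, isl: int, osl: int, concurrency: int) -> list[str]:
        return [self.script, "--isl", str(isl), "--osl", str(osl), "--concurrency", str(concurrency)]


# benchmarks/__init__.py  (importing the module is what triggers registration)
# from . import mybench  # noqa: F401
```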


Module Dependencies

Import Hierarchy

Circular Import Prevention

  1. TYPE_CHECKING guard - Import type-only dependencies (see the combined sketch after this list).

  2. Lazy imports - Import at function call time.

  3. Forward references - Use string annotations.
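A combined sketch of the three techniques (the module paths are hypothetical):

```python
from __future__ import annotations

from typing import TYPE_CHECKING

# 1. TYPE_CHECKING guard: this import exists only for type checkers, never at runtime.
if TYPE_CHECKING:
    from srtctl.config.schema import SrtConfig  # hypothetical module path

class SweepOrchestrator:
    # 3. Forward reference: with postponed evaluation (or a quoted annotation),
    #    SrtConfig never needs to be imported at runtime.
    def __init__(self, config: "SrtConfig") -> None:
        self.config = config

    def submit(self) -> None:
        # 2. Lazy import: resolved at call time, which breaks an import-time cycle.
        from srtctl.cli.submit import build_sbatch_script  # hypothetical
        build_sbatch_script(self.config)
```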


Directory Structure


Summary

srtctl is a well-architected orchestration framework with:

  • Clean separation of concerns: Config, runtime, backend, frontend, benchmark layers

  • Strong typing: Frozen dataclasses with marshmallow validation

  • Extensibility: Protocol-based backends/frontends, decorator-based benchmark registration

  • Robust process management: Registry, monitoring, graceful cleanup

  • SLURM integration: Proper container mounts, srun launching, nodelist parsing

  • Modern Python: 3.10+ syntax, comprehensive type hints, clear module structure

The codebase follows Python best practices and provides a solid foundation for orchestrating complex LLM inference workloads on SLURM clusters.
