Architecture
Version: 1.0
Last Updated: 2025-12-30
High-Level Overview
What is srtctl?
srtctl (SLURM Runtime Control) is a Python-first orchestration framework for LLM inference benchmarks on SLURM clusters. It provides:
Configuration-driven deployment: YAML configs define model, resources, backends, and benchmarks
Multi-backend support: A pluggable backend interface; the currently supported backend is SGLang with prefill/decode disaggregation
Automated orchestration: Handles infrastructure setup, worker spawning, health checks, and benchmarking
Container-based execution: Workers run inside containers with proper mounts and environment
Problem Statement
Running distributed LLM inference workloads on SLURM clusters involves significant complexity:
Resource Allocation: Mapping GPU workers to nodes with proper tensor parallelism
Process Coordination: Starting services in the correct order with health checks
Configuration Management: Handling model paths, container images, and environment variables
Monitoring & Cleanup: Tracking process health and graceful shutdown
srtctl abstracts this complexity into a simple YAML interface while providing extensibility for different backends, frontends, and benchmarks.
Architecture Overview
Design Philosophy
1. Single Source of Truth
The RuntimeContext computes all paths and values once at startup. This eliminates:
Scattered bash variable expansion
Inconsistent path computation
Configuration drift during execution
2. Frozen Dataclasses
All configuration objects are immutable after creation using @dataclass(frozen=True):
Benefits:
Prevents accidental mutation
Easier to reason about state
Safe to pass around without defensive copying
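Safe to share across threads, since no thread can reassign a field (mutable values nested inside a field, such as lists or dicts, still need care)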
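As an illustration of the frozen-dataclass approach, here is a minimal sketch. The field names come from the ModelConfig entry under schema.py; the default precision value and the example paths are assumptions made for the example, not values taken from srtctl.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ModelConfig:
    """Illustrative config object; the default precision is an assumption."""

    path: str
    container: str
    precision: str = "fp8"


cfg = ModelConfig(path="/models/example", container="example.sqsh")

try:
    cfg.precision = "bf16"  # assigning to a field of a frozen instance raises
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False

# To "change" a value, build a new instance; the original stays untouched.
bf16_cfg = replace(cfg, precision="bf16")
```

dataclasses.replace is the usual way to derive a variant of a frozen object, for example when a sweep needs the same config with one field changed.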
3. Protocol Pattern (Duck Typing)
Using typing.Protocol instead of ABC for interfaces enables duck typing without inheritance:
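A minimal sketch of the idea follows. The method name build_command and the EchoBackend class are hypothetical and chosen only for illustration; the actual protocol methods in srtctl may differ.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class BackendProtocol(Protocol):
    """Structural interface; the method name here is hypothetical."""

    def build_command(self, mode: str) -> list[str]: ...


class EchoBackend:  # note: does not inherit from BackendProtocol
    def build_command(self, mode: str) -> list[str]:
        return ["echo", f"starting {mode} worker"]


def launch(backend: BackendProtocol, mode: str) -> list[str]:
    # A static type checker accepts any object with a matching build_command.
    return backend.build_command(mode)


cmd = launch(EchoBackend(), "prefill")
```

The @runtime_checkable decorator is optional; it only enables isinstance checks, which verify that the method exists but not its signature.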
4. Registry Pattern
Extensible component registration via decorators:
5. Factory Classmethods
Use @classmethod named from_* for construction:
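For example, a from_dict constructor might look like the sketch below. The field names come from the ResourceConfig entry under schema.py; the method name from_dict and its validation rule are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class ResourceConfig:
    gpu_type: str
    gpus_per_node: int

    @classmethod
    def from_dict(cls, data: Mapping[str, Any]) -> "ResourceConfig":
        """Build from a parsed mapping, coercing and validating values."""
        gpus = int(data["gpus_per_node"])
        if gpus <= 0:
            raise ValueError("gpus_per_node must be positive")
        return cls(gpu_type=str(data["gpu_type"]), gpus_per_node=gpus)


# The string "8" is coerced to an int by the factory.
res = ResourceConfig.from_dict({"gpu_type": "h100", "gpus_per_node": "8"})
```

Keeping parsing in a from_* classmethod leaves the generated __init__ as a plain field assignment, which suits frozen dataclasses.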
System Components
CLI Layer
submit.py - Job Submission
Entry point for srtctl apply|dry-run -f config.yaml:
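The command shape above can be sketched with argparse. This is not the actual submit.py; the long option --file and the help strings (including what dry-run does) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="srtctl")
    sub = parser.add_subparsers(dest="command", required=True)
    # Help texts describe assumed behavior, for illustration only.
    for name, help_text in (
        ("apply", "submit the job described by the config"),
        ("dry-run", "validate the config without submitting"),
    ):
        cmd = sub.add_parser(name, help=help_text)
        cmd.add_argument("-f", "--file", required=True, help="path to YAML config")
    return parser


args = build_parser().parse_args(["dry-run", "-f", "config.yaml"])
```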
do_sweep.py - SweepOrchestrator
The main orchestration class that runs inside the SLURM job:
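The control flow of such an orchestrator might look roughly like the sketch below: run the stages in order, stop at the first failure, and always clean up. The class name SweepSketch and the stage names are hypothetical; the real SweepOrchestrator is not reproduced here.

```python
from typing import Callable


class SweepSketch:
    """Runs stages in order and always runs cleanup, even on failure."""

    def __init__(
        self,
        stages: list[tuple[str, Callable[[], None]]],
        cleanup: Callable[[], None],
    ) -> None:
        self.stages = stages
        self.cleanup = cleanup
        self.completed: list[str] = []

    def run(self) -> int:
        try:
            for name, stage in self.stages:
                stage()  # a raised exception aborts the remaining stages
                self.completed.append(name)
            return 0
        except Exception as exc:
            print(f"stage failed after {self.completed}: {exc}")
            return 1
        finally:
            self.cleanup()  # runs on both the success and the failure path


log: list[str] = []


def failing_health_check() -> None:
    raise RuntimeError("worker not healthy")


orch = SweepSketch(
    stages=[
        ("infra", lambda: log.append("infra")),
        ("workers", lambda: log.append("workers")),
        ("health", failing_health_check),
        ("benchmark", lambda: log.append("benchmark")),
    ],
    cleanup=lambda: log.append("cleanup"),
)
exit_code = orch.run()
```

In this run the health stage fails, so the benchmark stage is skipped while cleanup still executes.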
Configuration Layer
schema.py - Configuration Dataclasses
All configs are frozen dataclasses with marshmallow validation:
| Class | Purpose | Key fields |
| --- | --- | --- |
| SrtConfig | Main job config | name, model, resources, backend, frontend, benchmark |
| ModelConfig | Model settings | path, container, precision |
| ResourceConfig | GPU/node allocation | gpu_type, gpus_per_node, prefill/decode nodes/workers |
| BackendConfig | Polymorphic backend | type, sglang_config, environment per mode |
| FrontendConfig | Router settings | type, enable_multiple_frontends, args, env |
| BenchmarkConfig | Benchmark params | type, isl, osl, concurrencies, sweep |
| ProfilingConfig | Profiling settings | type (nsys/torch), phase configs |
runtime.py - RuntimeContext
The single source of truth for all runtime values:
Backend Layer
BackendProtocol
SGLangProtocol
Implements BackendProtocol for SGLang with P/D disaggregation:
Frontend Layer
FrontendProtocol
Infrastructure Layer
Architecture Layers
Layer Diagram
Data Flow Diagrams
Config Loading Flow
Job Submission Flow
Worker Startup Flow
Health Check Flow
Process Architecture on SLURM Cluster
Physical Layout
Port Allocation Strategy
Process Relationships
Key Abstractions
RuntimeContext
The single source of truth for all runtime values. Created once at job start:
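A minimal sketch of the idea, assuming a handful of fields: everything derived (such as the log directory) is computed in one factory and then read from the frozen object. The field names, the from_env factory, and the choice of the first node as head node are assumptions; SLURM_JOB_ID is the standard variable SLURM sets inside a job.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class RuntimeContext:
    job_id: str
    log_dir: Path
    model_path: Path
    nodes: tuple[str, ...]

    @classmethod
    def from_env(
        cls,
        env: dict[str, str],
        output_root: Path,
        model_path: Path,
        nodes: list[str],
    ) -> "RuntimeContext":
        job_id = env["SLURM_JOB_ID"]  # set by SLURM inside a job allocation
        return cls(
            job_id=job_id,
            log_dir=output_root / job_id / "logs",  # computed once, here only
            model_path=model_path,
            nodes=tuple(nodes),
        )

    @property
    def head_node(self) -> str:
        # Assumption for this sketch: the first allocated node is the head.
        return self.nodes[0]


ctx = RuntimeContext.from_env(
    {"SLURM_JOB_ID": "12345"},
    output_root=Path("/scratch/out"),
    model_path=Path("/models/example"),
    nodes=["node01", "node02"],
)
```

Passing the environment in as a dict, rather than reading os.environ inside the factory, keeps the sketch testable outside a SLURM job.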
Endpoint vs Process
NodePortAllocator
Manages per-node port assignments to avoid conflicts:
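One way to picture this is a counter per node, so two workers on the same node never receive the same port while different nodes can reuse port numbers. The sketch below makes that assumption; the base port, the allocate signature, and the sequential strategy are hypothetical, and it does not probe whether a port is actually free on the host.

```python
from collections import defaultdict


class NodePortAllocator:
    """Hands out ports per node; different nodes may reuse the same numbers."""

    def __init__(self, base_port: int = 30000, max_port: int = 65535) -> None:
        self._max = max_port
        # Each node starts counting from base_port the first time it is seen.
        self._next: dict[str, int] = defaultdict(lambda: base_port)

    def allocate(self, node: str, count: int = 1) -> list[int]:
        start = self._next[node]
        if start + count - 1 > self._max:
            raise RuntimeError(f"no ports left on {node}")
        self._next[node] = start + count
        return list(range(start, start + count))


alloc = NodePortAllocator(base_port=30000)
first = alloc.allocate("node01", count=2)
second = alloc.allocate("node01")
other_node = alloc.allocate("node02")
```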
ProcessRegistry
Lifecycle management for all spawned processes:
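A registry of this kind can be sketched with subprocess.Popen. The method names spawn, failed, and shutdown and the grace period are assumptions; the real ProcessRegistry launches workers through srun and containers, which this local example does not attempt.

```python
import subprocess
import sys


class ProcessRegistry:
    def __init__(self) -> None:
        self._procs: dict[str, subprocess.Popen] = {}

    def spawn(self, name: str, cmd: list[str]) -> subprocess.Popen:
        proc = subprocess.Popen(cmd)
        self._procs[name] = proc
        return proc

    def failed(self) -> list[str]:
        """Names of processes that have exited with a non-zero code."""
        return [n for n, p in self._procs.items() if p.poll() not in (None, 0)]

    def shutdown(self, grace_seconds: float = 5.0) -> None:
        """Terminate in reverse start order; kill anything that lingers."""
        procs = list(reversed(list(self._procs.values())))
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in procs:
            try:
                proc.wait(timeout=grace_seconds)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()


registry = ProcessRegistry()
sleeper = registry.spawn("sleeper", [sys.executable, "-c", "import time; time.sleep(60)"])
crasher = registry.spawn("crasher", [sys.executable, "-c", "raise SystemExit(3)"])
crasher.wait()
failed_names = registry.failed()
registry.shutdown()
```

Calling shutdown from a finally block or a signal handler is what makes cleanup reliable when a stage fails midway.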
ManagedProcess
Extension Points
How to Add a New Backend
1. Create the backend module at backends/mybackend.py.
2. Register it in backends/__init__.py.
3. Update BackendConfigField in schema.py to handle polymorphic deserialization.
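The three steps can be sketched in a single file as below. Everything here is hypothetical: the MyBackendConfig fields, the build_worker_command method, the BACKENDS mapping, and the load_backend function stand in for whatever the real backend protocol and BackendConfigField require, and "mybackend-server" is a placeholder executable rather than a real CLI.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class MyBackendConfig:
    """Step 1: would live in backends/mybackend.py. Fields are hypothetical."""

    type: str = "mybackend"
    extra_args: tuple[str, ...] = ()

    def build_worker_command(self, mode: str) -> list[str]:
        # Placeholder executable and flag, not a real server CLI.
        return ["mybackend-server", "--mode", mode, *self.extra_args]


# Step 2: would live in backends/__init__.py.
BACKENDS: dict[str, type] = {"mybackend": MyBackendConfig}


def load_backend(data: Mapping[str, Any]):
    """Step 3: choose the config class from the `type` key (polymorphic load)."""
    kind = data.get("type")
    if kind not in BACKENDS:
        raise ValueError(f"unknown backend type {kind!r}; known: {sorted(BACKENDS)}")
    cls = BACKENDS[kind]
    return cls(type=kind, extra_args=tuple(data.get("extra_args", ())))


backend = load_backend({"type": "mybackend", "extra_args": ["--verbose"]})
command = backend.build_worker_command("decode")
```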
How to Add a New Frontend
1. Create the frontend module at frontends/myfrontend.py.
2. Register it in frontends/base.py.
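A rough sketch of both steps, under stated assumptions: the launch_command method, the get_frontend lookup function, and the "myfrontend" executable are invented for this example, and the real FrontendProtocol may define different methods.

```python
from typing import Protocol


class FrontendProtocol(Protocol):
    """Hypothetical interface; the real protocol's methods may differ."""

    def launch_command(self, port: int) -> list[str]: ...


class MyFrontend:
    """Would live in frontends/myfrontend.py."""

    def launch_command(self, port: int) -> list[str]:
        return ["myfrontend", "--port", str(port)]  # placeholder CLI


def get_frontend(frontend_type: str) -> FrontendProtocol:
    """Would live in frontends/base.py: map the config `type` to an instance."""
    if frontend_type == "myfrontend":
        return MyFrontend()
    raise ValueError(f"unknown frontend type: {frontend_type!r}")


frontend_cmd = get_frontend("myfrontend").launch_command(8000)
```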
How to Add a New Benchmark
1. Create the benchmark module at benchmarks/mybench.py.
2. Add the benchmark script at benchmarks/scripts/mybench/run.sh.
3. Import the module in benchmarks/__init__.py to trigger registration.
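The reason the import matters is that a registration decorator only runs when the module defining the class is executed. The single-file sketch below imitates that: the registry is empty until the decorated class is defined. The names BENCHMARKS, register_benchmark, and build_command, and the arguments passed to run.sh, are assumptions.

```python
from pathlib import PurePosixPath

BENCHMARKS: dict[str, type] = {}


def register_benchmark(name: str):
    def decorator(cls: type) -> type:
        BENCHMARKS[name] = cls
        return cls

    return decorator


names_before = sorted(BENCHMARKS)  # nothing registered yet


# In the real layout this class would sit in benchmarks/mybench.py, and the
# decorator would only run once benchmarks/__init__.py imports that module.
@register_benchmark("mybench")
class MyBench:
    script = PurePosixPath("benchmarks/scripts/mybench/run.sh")

    def build_command(self, endpoint: str, concurrency: int) -> list[str]:
        # The argument order passed to run.sh is an assumption.
        return ["bash", str(self.script), endpoint, str(concurrency)]


names_after = sorted(BENCHMARKS)
bench_cmd = BENCHMARKS["mybench"]().build_command("http://localhost:8000", 4)
```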
Module Dependencies
Import Hierarchy
Circular Import Prevention
TYPE_CHECKING guard - Import type-only dependencies:
Lazy imports - Import at function call time:
Forward references - Use string annotations:
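The three techniques can be combined in one module, as in this sketch. The module path mypkg.runtime and the Worker class are hypothetical; json stands in for a heavier dependency that would be imported lazily.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by type checkers, never at runtime, so this import
    # cannot start an import cycle. The module name is hypothetical.
    from mypkg.runtime import RuntimeContext


class Worker:
    # Forward reference: the string annotation is not evaluated at runtime.
    def __init__(self, ctx: "RuntimeContext") -> None:
        self.ctx = ctx

    def dump_state(self) -> str:
        # Lazy import: resolved when the method is first called, by which
        # point all modules involved in a cycle have finished loading.
        import json

        return json.dumps({"ctx": repr(self.ctx)})


# None stands in for a real context object in this standalone sketch.
state = Worker(ctx=None).dump_state()  # type: ignore[arg-type]
```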
Directory Structure
Summary
srtctl is an orchestration framework built around:
Clean separation of concerns: Config, runtime, backend, frontend, benchmark layers
Strong typing: Frozen dataclasses with marshmallow validation
Extensibility: Protocol-based backends/frontends, decorator-based benchmark registration
Robust process management: Registry, monitoring, graceful cleanup
SLURM integration: Proper container mounts, srun launching, nodelist parsing
Modern Python: 3.10+ syntax, comprehensive type hints, clear module structure
Together, these choices keep configuration, runtime state, and process management in separate, typed modules, which provides a consistent foundation for orchestrating complex LLM inference workloads on SLURM clusters.