SLURM FAQ
Cluster Compatibility Settings
Some SLURM clusters don't support certain SBATCH directives. If you encounter errors during job submission, you may need to adjust these settings in your srtslurm.yaml.
GPU Resource Specification
If you see this error when submitting jobs:
sbatch: error: Invalid generic resource (gres) specification
Your cluster doesn't support the --gpus-per-node directive. Disable it:
use_gpus_per_node_directive: false
This will omit the #SBATCH --gpus-per-node directive from generated job scripts while keeping all other functionality intact.
Segment-Based Scheduling
If you see this error when submitting jobs:
sbatch: error: Invalid --segment specification
Your cluster doesn't support the --segment directive for topology-aware scheduling. Disable it:
use_segment_sbatch_directive: false
The --segment directive ensures all allocated nodes are within the same network segment/switch for optimal interconnect performance between prefill and decode workers. If your cluster doesn't support it, SLURM will still allocate nodes but may scatter them across the cluster.
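To make the effect of the two settings concrete, here is a minimal Python sketch of how a script generator might gate each directive on its flag. This is an illustration under stated assumptions, not the tool's actual implementation: the function render_sbatch_header, its parameters, and the choice to treat both flags as enabled when they are absent are all hypothetical. Only the two setting names and the two directives come from this page.

```python
def render_sbatch_header(settings, nodes, gpus_per_node, segment_size):
    """Build the #SBATCH header, skipping directives that are disabled.

    Hypothetical helper: `settings` stands in for the parsed srtslurm.yaml.
    """
    lines = ["#!/bin/bash", f"#SBATCH --nodes={nodes}"]

    # Assumption: a flag missing from the YAML is treated as enabled.
    if settings.get("use_gpus_per_node_directive", True):
        lines.append(f"#SBATCH --gpus-per-node={gpus_per_node}")

    if settings.get("use_segment_sbatch_directive", True):
        lines.append(f"#SBATCH --segment={segment_size}")

    return "\n".join(lines)


# Settings as they might look after loading a srtslurm.yaml
# in which both compatibility flags are turned off.
settings = {
    "use_gpus_per_node_directive": False,
    "use_segment_sbatch_directive": False,
}

header = render_sbatch_header(settings, nodes=4, gpus_per_node=8, segment_size=4)
print(header)
```

With both flags set to false, the generated header in this sketch contains neither the --gpus-per-node nor the --segment line, so sbatch never sees the directives it rejects. The rest of the script is unchanged.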