Status API Spec

srtslurm can optionally report job status to an external HTTP API via fire-and-forget POST/PUT requests.

Configuration

Set the endpoint in srtslurm.yaml or in a recipe YAML file:

reporting:
  status:
    endpoint: "https://status.example.com"

If not configured, status reporting is disabled and jobs run normally.
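Here is a minimal Python sketch of this opt-in check. It assumes the YAML has already been loaded into a plain dict. The helper name status_endpoint is hypothetical and not part of srtslurm. It only mirrors the documented reporting.status.endpoint key.

```python
from typing import Any, Optional


def status_endpoint(config: dict[str, Any]) -> Optional[str]:
    """Return the configured status endpoint, or None if reporting is disabled.

    Hypothetical helper: it only reads the documented reporting.status.endpoint key.
    """
    reporting = config.get("reporting") or {}
    status = reporting.get("status") or {}
    endpoint = status.get("endpoint")
    # An absent or empty endpoint means reporting is off; the job still runs.
    return endpoint or None


cfg = {"reporting": {"status": {"endpoint": "https://status.example.com"}}}
print(status_endpoint(cfg))  # https://status.example.com
print(status_endpoint({}))   # None
```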

Endpoints

POST /api/jobs

Create a job record. Called at submission time.

Request:

{
  "job_id": "12345",
  "job_name": "benchmark-run",
  "cluster": "gpu-cluster-01",
  "recipe": "configs/benchmark.yaml",
  "submitted_at": "2025-01-26T10:30:00Z",
  "metadata": {
    "tags": ["pipeline:98765", "suite:kv-router-comparison"]
  }
}

Response: 201 Created
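The request above could be built with Python's standard library as follows. This is a sketch, not srtslurm's implementation. The helper name build_create_request is hypothetical. Sending the body as JSON with a Content-Type of application/json is an assumption. The request is only constructed here, and the Behavior section below covers timeouts and error handling.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_create_request(endpoint: str, job_id: str, job_name: str, cluster: str,
                         recipe: str, tags: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST /api/jobs request for a new job."""
    payload = {
        "job_id": job_id,
        "job_name": job_name,
        "cluster": cluster,
        "recipe": recipe,
        # UTC timestamp in the same ISO 8601 "Z" form as the example request.
        "submitted_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "metadata": {"tags": tags},
    }
    return urllib.request.Request(
        endpoint.rstrip("/") + "/api/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed JSON body
        method="POST",
    )


req = build_create_request(
    "https://status.example.com", "12345", "benchmark-run", "gpu-cluster-01",
    "configs/benchmark.yaml", ["pipeline:98765", "suite:kv-router-comparison"],
)
print(req.get_method(), req.full_url)  # POST https://status.example.com/api/jobs
```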

PUT /api/jobs/{job_id}

Update job status. Called during execution and at completion.

Request (during execution):

Request (at completion):

All fields except status are optional.

Field              Type    Description
status             string  Required. New job status
stage              string  Current execution stage
message            string  Human-readable status message
updated_at         string  ISO 8601 timestamp (server defaults to now)
started_at         string  Job start timestamp
completed_at       string  Job completion timestamp
exit_code          int     Process exit code
logs_url           string  S3 URL where logs were uploaded
benchmark_results  object  Parsed benchmark metrics
metadata           object  Additional metadata (merged with existing)

Response: 200 OK
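This sketch builds the same update with Python's standard library and assumes a JSON body. The helper build_update_request is hypothetical. It rejects field names that are not in the table above. It also omits optional fields left as None, so server-side defaults such as updated_at still apply. The status string in the usage line is a placeholder, not a confirmed status value.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

# Optional fields from the table above; "status" is the only required one.
OPTIONAL_FIELDS = {
    "stage", "message", "updated_at", "started_at", "completed_at",
    "exit_code", "logs_url", "benchmark_results", "metadata",
}


def build_update_request(endpoint: str, job_id: str, status: str,
                         **fields: Any) -> urllib.request.Request:
    """Build (but do not send) a PUT /api/jobs/{job_id} status update."""
    unknown = set(fields) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown field(s): {sorted(unknown)}")
    # Leave out optional fields set to None so the server applies its defaults.
    payload: dict[str, Any] = {"status": status}
    payload.update({k: v for k, v in fields.items() if v is not None})
    url = f"{endpoint.rstrip('/')}/api/jobs/{urllib.parse.quote(job_id, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed JSON body
        method="PUT",
    )


# "completed" is a placeholder status string, not a confirmed status value.
req = build_update_request("https://status.example.com", "12345", "completed",
                           exit_code=0, logs_url=None)
print(req.get_method(), req.full_url)  # PUT https://status.example.com/api/jobs/12345
print(json.loads(req.data))            # {'status': 'completed', 'exit_code': 0}
```

Quoting job_id keeps the path well-formed even if an ID contains a character such as a slash. Plain numeric IDs pass through unchanged.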

GET /api/jobs/{job_id}

Get full job details including event history.

GET /api/jobs

List jobs with pagination and filters.

Parameter  Type    Default  Description
page       int     1        Page number
per_page   int     50       Results per page (max 100)
status     string  -        Filter by status
cluster    string  -        Filter by cluster
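On the client side, the two GET endpoints mainly need correctly built URLs. The Python sketch below uses hypothetical helper names. This section does not specify the response body shapes, so the sketch does not parse responses. Rejecting per_page values above 100 on the client is also a design choice, since the server might clamp them instead.

```python
import urllib.parse
from typing import Optional


def job_url(endpoint: str, job_id: str) -> str:
    """URL for GET /api/jobs/{job_id} (full details plus event history)."""
    return f"{endpoint.rstrip('/')}/api/jobs/{urllib.parse.quote(job_id, safe='')}"


def list_jobs_url(endpoint: str, page: int = 1, per_page: int = 50,
                  status: Optional[str] = None, cluster: Optional[str] = None) -> str:
    """URL for GET /api/jobs with the documented pagination and filters."""
    if page < 1:
        raise ValueError("page starts at 1")
    if not 1 <= per_page <= 100:
        # The API caps per_page at 100; rejecting early is a client-side choice.
        raise ValueError("per_page must be between 1 and 100")
    params: dict[str, object] = {"page": page, "per_page": per_page}
    # Filters are optional; omit them entirely when unset.
    if status is not None:
        params["status"] = status
    if cluster is not None:
        params["cluster"] = cluster
    return f"{endpoint.rstrip('/')}/api/jobs?{urllib.parse.urlencode(params)}"


print(list_jobs_url("https://status.example.com", cluster="gpu-cluster-01"))
# https://status.example.com/api/jobs?page=1&per_page=50&cluster=gpu-cluster-01
```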

Status Values

Status reflects which stage is currently executing, not readiness.

Contract Models

The canonical Pydantic models live in the srtctl.contract module.

Behavior

  • All requests have a 5-second timeout

  • Failures are logged at DEBUG level and ignored

  • Job execution is never blocked by status reporting failures
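The sketch below shows one way to follow these rules with Python's standard library. The names send_status and report_in_background are hypothetical. Using a daemon thread for fire-and-forget delivery is an assumption, not srtslurm's documented mechanism. Every exception is caught, so a reporting failure can only produce a DEBUG log line and never an error in the job.

```python
import json
import logging
import threading
import urllib.request
from typing import Any

logger = logging.getLogger("status_reporter")  # hypothetical logger name

TIMEOUT_SECONDS = 5  # documented per-request timeout


def send_status(method: str, url: str, payload: dict[str, Any]) -> bool:
    """Send one status request and never raise. Returns True on a 2xx response."""
    try:
        req = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},  # assumed JSON body
            method=method,
        )
        with urllib.request.urlopen(req, timeout=TIMEOUT_SECONDS) as resp:
            return 200 <= resp.status < 300
    except Exception as exc:  # network/HTTP errors, bad URLs, timeouts, protocol errors
        # Reporting failures are logged at DEBUG and otherwise ignored.
        logger.debug("status report %s %s failed: %s", method, url, exc)
        return False


def report_in_background(method: str, url: str, payload: dict[str, Any]) -> None:
    """Fire-and-forget: run send_status on a daemon thread and return at once."""
    threading.Thread(target=send_status, args=(method, url, payload),
                     daemon=True).start()
```

The sketch catches Exception broadly on purpose. Besides network and HTTP errors, a malformed URL raises ValueError, and some protocol errors surface as http.client exceptions. None of these should reach the job.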
