SGLang Router

This page explains the sglang router mode for prefill-decode (PD) disaggregation, an alternative to the default Dynamo frontend architecture.

Overview

By default, srtctl uses Dynamo frontends to coordinate between prefill and decode workers. This requires NATS/ETCD infrastructure and the dynamo package.

SGLang Router is an alternative that uses sglang's native sglang_router for PD disaggregation, so it needs neither NATS/ETCD nor the dynamo package.

| Feature | Dynamo Frontends | SGLang Router |
| --- | --- | --- |
| Infrastructure | NATS + ETCD + dynamo | sglang_router only |
| Routing | Dynamo's coordination | sglang's native PD routing |
| Scaling | nginx + multiple frontends | nginx + multiple routers |

Configuration

Enable sglang router in your recipe's backend section:

backend:
  use_sglang_router: true

That's it. The workers will launch with sglang.launch_server instead of dynamo.sglang, and the router will handle request distribution.

Architecture Modes

Single Router (enable_multiple_frontends: false)

The simplest mode runs one router on node 0, with no nginx:

  • Router directly on port 8000

  • Good for testing or small deployments

  • No load balancing overhead

Multiple Routers (enable_multiple_frontends: true, default)

Nginx load balances across multiple router instances:

  • nginx on node 0 listens on port 8000 (public)

  • Routers listen on port 30080 (internal)

  • nginx round-robins requests to routers

  • Routers are distributed across nodes using the same logic as Dynamo frontends
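The two modes differ mainly in which process owns port 8000. As a rough illustration only, the port layout and nginx's request rotation can be modelled in a few lines of Python. The helper names and hostnames below are hypothetical, and real round-robin balancing is configured in nginx itself, not in Python:

```python
from itertools import cycle

PUBLIC_PORT = 8000            # port clients and benchmarks connect to
INTERNAL_ROUTER_PORT = 30080  # router port when nginx sits in front


def router_port(enable_multiple_frontends: bool) -> int:
    """Port a router listens on in each mode (hypothetical helper)."""
    return INTERNAL_ROUTER_PORT if enable_multiple_frontends else PUBLIC_PORT


def round_robin(router_hosts):
    """Yield 'host:port' upstreams in strict rotation, as a stand-in for nginx."""
    port = router_port(enable_multiple_frontends=True)
    return cycle(f"{host}:{port}" for host in router_hosts)


if __name__ == "__main__":
    # Hostnames are placeholders for illustration.
    upstreams = round_robin(["node0", "node1", "node2"])
    for _ in range(4):
        print(next(upstreams))
```

In single-router mode there is nothing to rotate over: the one router owns port 8000 directly.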

How Router Distribution Works

The num_additional_frontends setting controls how many additional routers spawn beyond the first:

| Setting | Total Routers | Distribution |
| --- | --- | --- |
| num_additional_frontends: 0 | 1 | Node 0 only |
| num_additional_frontends: 4 | 5 | Node 0 + 4 distributed |
| num_additional_frontends: 9 | 10 | Node 0 + 9 distributed (default) |

Routers are distributed across available nodes using ceiling division:
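The exact formula is not reproduced on this page. The following is a minimal sketch of one plausible scheme, not srtctl's actual implementation. It assumes the first router always sits on node 0 and the additional routers are spread over the remaining nodes, each node taking the ceiling of what is left divided by the nodes still to fill. `distribute_routers` is a hypothetical helper:

```python
import math


def distribute_routers(num_nodes: int, num_additional_frontends: int) -> dict[int, int]:
    """Return {node_index: router_count} under the assumptions stated above."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    if num_additional_frontends < 0:
        raise ValueError("num_additional_frontends must be non-negative")

    counts = {node: 0 for node in range(num_nodes)}
    counts[0] = 1  # the first router always runs on node 0

    # Assumption: extra routers go to nodes 1..N-1; with a single node
    # there is nowhere else to put them, so they stay on node 0.
    candidates = list(range(1, num_nodes)) or [0]

    remaining = num_additional_frontends
    for i, node in enumerate(candidates):
        nodes_left = len(candidates) - i
        share = math.ceil(remaining / nodes_left)  # ceiling division
        counts[node] += share
        remaining -= share
    return counts


if __name__ == "__main__":
    print(distribute_routers(num_nodes=4, num_additional_frontends=9))
```

Under these assumptions the total always equals 1 + num_additional_frontends, matching the table above.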

Port Configuration

Bootstrap Port

The sglang router needs the disaggregation bootstrap port to connect to prefill workers. This must match the disaggregation-bootstrap-port in your sglang config:

The default bootstrap port is 30001 (matching most recipes). If you use a different port, ensure it's consistent across prefill and decode configs.
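A consistency check along these lines can catch a mismatch before launch. This is a sketch under assumptions: it treats the prefill and decode sglang configs as plain dictionaries keyed by flag name, which may not match how your recipe is actually loaded, and `check_bootstrap_ports` is a hypothetical helper:

```python
BOOTSTRAP_KEY = "disaggregation-bootstrap-port"
DEFAULT_BOOTSTRAP_PORT = 30001  # default stated in this section


def check_bootstrap_ports(prefill: dict, decode: dict,
                          router_port: int = DEFAULT_BOOTSTRAP_PORT) -> list[str]:
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    for role, cfg in (("prefill", prefill), ("decode", decode)):
        port = cfg.get(BOOTSTRAP_KEY)
        if port is None:
            problems.append(f"{role}: {BOOTSTRAP_KEY} is not set explicitly")
        elif port != router_port:
            problems.append(
                f"{role}: {BOOTSTRAP_KEY}={port} does not match router port {router_port}"
            )
    return problems


if __name__ == "__main__":
    prefill_cfg = {BOOTSTRAP_KEY: 30001}
    decode_cfg = {BOOTSTRAP_KEY: 30002}  # deliberately mismatched
    for problem in check_bootstrap_ports(prefill_cfg, decode_cfg):
        print(problem)
```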

Server Port

Workers listen on port 30000 by default. This is standard sglang behavior and doesn't need configuration.

Complete Example

Here's a full recipe using sglang router:

Troubleshooting

Port Conflicts

If you see bind() to 0.0.0.0:8000 failed (Address already in use):

  • This means nginx and a router are both trying to use port 8000

  • Ensure you're using the latest template (routers use port 30080 internally)
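To confirm whether something already holds a port on a node, a small bind test is often enough. This is a generic Python sketch rather than part of srtctl; `port_is_free` is a hypothetical helper, and it only reports on the machine where it runs:

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "Address already in use": another process owns the port.
            return False
    return True


if __name__ == "__main__":
    for port in (8000, 30080):
        state = "free" if port_is_free(port) else "in use"
        print(f"port {port}: {state}")
```

Run it on node 0 before launching: if port 8000 is already in use there, nginx will fail with the bind() error above.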

Router Not Connecting to Workers

Check that:

  1. disaggregation-bootstrap-port matches in the prefill and decode configs

  2. Workers are fully started before the router tries to connect

  3. The router and worker nodes can reach each other over the network
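Items 2 and 3 can be checked from the router node with a plain TCP connection attempt. The sketch below is generic Python, not an srtctl tool. `can_connect` is a hypothetical helper, the hostnames are placeholders, and a successful connection only shows the port is open, not that the worker has finished loading its model:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Placeholder worker hostnames; 30000 is the default worker port.
    for host in ("worker-node-1", "worker-node-2"):
        status = "reachable" if can_connect(host, 30000) else "unreachable"
        print(f"{host}:30000 {status}")
```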

Benchmark Can't Reach Endpoint

The benchmark connects to http://<node0>:8000. Ensure:

  • nginx is running (if enable_multiple_frontends: true)

  • Router is running (if enable_multiple_frontends: false)

  • Port 8000 is accessible

Comparison with Dynamo

| Aspect | Dynamo Frontends | SGLang Router |
| --- | --- | --- |
| Startup | Slower (NATS/ETCD + dynamo install) | Faster (just sglang) |
| Complexity | More moving parts | Simpler |
| Maturity | Production-tested | Newer |
| Config | Via dynamo.sglang | Via sglang.launch_server |
| Scaling | Same nginx approach | Same nginx approach |

Both modes support the same enable_multiple_frontends and num_additional_frontends settings for horizontal scaling.
