

NGINX slow_start: Gradual Upstream Ramp-Up Without Plus

by Danila Vershinin



Every rolling restart slams cold backends at full weight, spiking p99 latency. The fix is nginx slow_start, a feature NGINX Plus has shipped since 2015 but open-source NGINX never wired up. NGINX-MOD implements it end-to-end, with measured ramp-ups that track the theoretical linear curve within 0.5%.

You rolled a backend server back into production after a restart. Within seconds your dashboards lit up: connection pools exhausted, 502s climbing, tail latency doubled, other servers in the pool drowning in retry storms. The restart “worked” – the TCP listener came up, the health check flipped green – but NGINX round-robined the fresh box at full weight from request number one. A cold JVM, an empty PHP opcache, a warming connection pool, every CPU cache line invalid – and you just handed that box 50% of production traffic. This is the backend slam after restart problem, and the fix for it has a name: nginx slow_start.

Until now, nginx slow_start has been locked behind the NGINX Plus paywall. NGINX Plus has shipped a server-level slow_start parameter since 2015 that ramps traffic to a freshly-live upstream from zero to full weight over a configurable window; open-source NGINX never implemented it, and F5 keeps the feature behind the $3,675-per-instance Plus license. NGINX-MOD, the drop-in nginx build from GetPageSpeed, implements nginx slow_start natively inside the open-source round-robin scheduler. This article shows what NGINX-MOD’s implementation does, why F5 stubbed the struct field but never wired the scheduler, how to configure slow_start in your upstream blocks, and the exact measurements from a ramp-up test in a fresh Rocky Linux VM.

The Backend Slam Problem

Every time a load-balanced upstream peer comes online – whether after a deploy, an auto-scaling event, a process restart, or recovery from a failed health check – that peer is cold. Cold in a lot of places at once:

  • Process-level warm-up. JVM JIT compilers haven’t profiled the hot paths yet. PHP opcache is empty. Python import caches are cold. Go runtimes are still at their initial GC pacing and scheduler state.
  • Connection-level warm-up. Keep-alive pools to downstream databases, caches, and services are empty. Every first request pays a full TCP handshake + TLS handshake + authentication round-trip.
  • OS-level warm-up. Linux page cache for executable segments, config files, static assets – all cold. First file reads hit disk.
  • Network-level warm-up. TCP congestion windows are at initial values. Local DNS caches are empty.

Hit a server that cold with full production traffic, and first-hundred-request latency is measured in seconds, not milliseconds. Some of those requests time out. Those that don’t time out upstream of NGINX pile up, blocking worker threads that would otherwise serve other requests. Tail latency spikes. In the worst case, the cold server’s connection pool to the database saturates, it stops responding to health checks, NGINX marks it down, and you’ve achieved the exact opposite of the capacity-add you were hoping for.

The fix is obvious in theory: send the fresh peer a tiny fraction of traffic first, let it warm up, gradually increase its share until it’s pulling its full weight. NGINX Plus calls this slow_start and documents it as a parameter of the server directive in upstream {}. It is precisely the nginx slow_start behaviour that NGINX-MOD brings to open-source NGINX as a fully-implemented feature.

Why Open-Source NGINX Has the Field But Not the Feature

If you grep the open-source nginx source tree for slow_start, you will find something strange:

grep -rn "slow_start" nginx/src
src/http/ngx_http_upstream.h:103:    ngx_msec_t                       slow_start;
src/http/ngx_http_upstream_round_robin.h:55:    ngx_msec_t                      slow_start;
src/stream/ngx_stream_upstream.h:61:    ngx_msec_t                         slow_start;
src/stream/ngx_stream_upstream_round_robin.h:55:    ngx_msec_t                       slow_start;

Four declarations. Four. And zero references in any .c file. F5 added the slow_start field to the upstream peer struct in open-source nginx – presumably so that binary-compatibility shims between Plus and OSS wouldn’t break when Plus writes state files – but they never shipped the scheduler logic that actually uses the field. The field sits there, silently unread, forever reserved for a feature you have to pay to unlock.

That’s a peculiar kind of technical dark pattern: the data model is right there, the round-robin scheduler is right there, and the four lines of ramp-up math that connect them are the only thing missing. Anyone with the source can trace the gap in ten minutes.

We did. NGINX-MOD implements nginx slow_start as a first-class feature of its build, with full scheduler integration.

How NGINX-MOD Implements nginx slow_start

NGINX-MOD’s nginx slow_start implementation adds three missing pieces to the open-source round-robin scheduler:

  1. Parser acceptance. ngx_http_upstream_server() now recognizes slow_start=<time> as a valid parameter on the server directive inside upstream {}, parses the time value via ngx_parse_time(), and stores it on the server config.

  2. Peer initialization. Every site where a peer struct is initialized from a server config – the four paths inside ngx_http_upstream_init_round_robin() plus the two paths inside ngx_http_upstream_zone_module.c – now propagates server->slow_start onto the peer and stamps peer->start_time = ngx_current_msec as the ramp-up anchor. The start_time field was, like slow_start itself, already declared in the peer struct but never assigned anywhere in the open-source scheduler – so reusing it is ABI-safe: no struct layout changes, no broken third-party dynamic modules.

  3. Scheduler ramp-up. Inside ngx_http_upstream_get_peer(), each peer’s effective weight is now scaled by a 0-to-100 factor derived from elapsed-time-since-start divided by the configured slow_start window. A freshly-live peer with slow_start=30s starts at factor 0 at t=0, rises linearly to factor 100 at t=30s, and is never scaled again after that (the anchor is cleared on first completion).

The math is simple:

factor = 100;                      /* peers not in ramp-up run at factor 100 */

if (peer->slow_start && peer->start_time) {
    elapsed = ngx_current_msec - peer->start_time;

    if (elapsed >= peer->slow_start) {
        peer->start_time = 0;      /* ramp-up complete, stop scaling */
    } else if (elapsed <= 0) {
        factor = 0;                /* not started yet; send nothing */
    } else {
        factor = elapsed * 100 / peer->slow_start;
    }
}

effective_weight = peer->effective_weight * factor;

Because the factor applies to every peer (value 100 for peers not in ramp-up), the round-robin accumulator stays internally consistent: no fractional-weight glitches, no starvation, no stuck peers.
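That claim is easy to sanity-check with a small model of smooth weighted round-robin (a sketch, not nginx source; the names here are illustrative) that applies the same 0-to-100 factor to each peer’s effective weight:

```python
# Illustrative model, not nginx source: smooth weighted round-robin with
# the slow_start factor applied to each peer's effective weight.

def ramp_factor(elapsed_ms, slow_start_ms):
    """Mirror of the scheduler math above; returns 0..100."""
    if not slow_start_ms or elapsed_ms >= slow_start_ms:
        return 100          # not configured, or ramp-up complete
    if elapsed_ms <= 0:
        return 0
    return elapsed_ms * 100 // slow_start_ms

def pick(peers):
    """One selection round of smooth weighted round-robin."""
    total = sum(p["effective"] for p in peers)
    for p in peers:
        p["current"] += p["effective"]
    best = max(peers, key=lambda p: p["current"])
    best["current"] -= total
    return best["name"]

# Two weight-1 peers; B is 10 seconds into a 30-second ramp (factor 33).
peers = [
    {"name": "A", "effective": 1 * 100,                         "current": 0},
    {"name": "B", "effective": 1 * ramp_factor(10_000, 30_000), "current": 0},
]
picks = [pick(peers) for _ in range(200)]
print(picks.count("B") / len(picks))   # roughly a quarter share for B
```

Running the loop shows B taking about 25% of traffic at the 10-second mark of a 30-second window, with no starvation of either peer.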

Installing NGINX-MOD

nginx slow_start support ships with every nginx-mod package version 1.30.0-44 and later. NGINX-MOD is a drop-in replacement for the stock nginx package – same config file layout, same systemd unit, same module directory.

RHEL, CentOS, AlmaLinux, Rocky Linux

Install the GetPageSpeed extras release package, activate the NGINX-MOD sub-repo, and swap:

sudo dnf install https://extras.getpagespeed.com/release-latest.rpm
sudo dnf config-manager --enable getpagespeed-extras-nginx-mod
sudo dnf swap nginx nginx-mod

If you are not yet running stock nginx from the GetPageSpeed repo, dnf install nginx-mod works the same way.

Verify the build:

nginx -v
# nginx version: nginx-mod by GetPageSpeed.com/1.30.0

Debian and Ubuntu

First, set up the GetPageSpeed APT repository, then install:

sudo apt-get update
sudo apt-get install nginx-mod

On Debian/Ubuntu, the package handles module loading automatically. No load_module directive is needed.

NGINX-MOD is included in every GetPageSpeed repository subscription starting at $10/month, alongside 100+ dynamic modules, NGINX Plus API parity via the bundled API module, and active health checks.

Configuring nginx slow_start

The slow_start=<time> parameter goes on the server directive inside an upstream {} block. The time accepts any suffix NGINX understands: s, m, h, d.

upstream app_backend {
    server 10.0.0.1:8080 weight=10;
    server 10.0.0.2:8080 weight=10 slow_start=30s;
    server 10.0.0.3:8080 weight=10 slow_start=1m;
}

server {
    listen 80;
    location / {
        proxy_pass http://app_backend;
    }
}

Three things worth calling out about nginx slow_start semantics:

  • Each peer has its own ramp. Peer 2 ramps over 30 seconds; peer 3 ramps over 60 seconds; peer 1 doesn’t ramp at all. Mix and match per the warm-up profile of each backend.
  • Timer starts when the peer is created. For static upstream {} servers, that’s when NGINX starts or reloads. For servers added at runtime via the NGINX dynamic upstream API, the timer starts the moment the server is POSTed to the API.
  • slow_start is compatible with weight, max_fails, fail_timeout, max_conns, backup, and down. Full parameter interoperability – no hidden incompatibilities.

Choosing the Right Window

Pick slow_start to match your backend’s actual warm-up time – the time from “health check passes” to “steady-state p99 latency.” As a starting point:

Backend type                                        Suggested slow_start
Static file server / reverse proxy with no state    0s (not needed)
PHP-FPM with OPcache                                15s to 30s
Node.js / Go with warm connection pools             10s to 30s
Java/JVM with JIT + connection pool                 60s to 120s
.NET with CLR JIT                                   45s to 90s
Large Python app with import graph                  30s to 60s

If you do not know the number, start at 30s and instrument your backend’s latency during a canary restart. The right value is slightly longer than the time your p99 latency takes to converge to steady state.
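If you already collect p99 samples during a canary restart, a small helper can turn them into a starting value. This is a hypothetical sketch – suggest_slow_start and the sample numbers are made up for illustration:

```python
def suggest_slow_start(p99_samples, tolerance=0.10, margin=1.25):
    """p99_samples: (seconds_after_restart, p99_ms) pairs from a canary
    restart.  Returns a slow_start suggestion in seconds: the first time
    p99 stays within `tolerance` of steady state, padded by `margin`."""
    steady = p99_samples[-1][1]          # treat the tail sample as steady state
    for i, (t, _) in enumerate(p99_samples):
        if all(abs(v - steady) / steady <= tolerance
               for _, v in p99_samples[i:]):
            return round(t * margin)
    return round(p99_samples[-1][0] * margin)

# Made-up canary numbers: p99 converges to ~100 ms by t=20s.
samples = [(0, 2400), (5, 900), (10, 310), (15, 120),
           (20, 105), (25, 101), (30, 100)]
print(suggest_slow_start(samples))       # 25 -> slow_start=25s (round up to 30s)
```

The 25% margin buys headroom for run-to-run variance; round up to the nearest value you can reason about.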

Runtime Verification: The Ramp-Up Actually Ramps

A correctly-implemented nginx slow_start feature must produce a smooth, monotonic ramp from near-zero traffic at t=0 to full share at t=slow_start. An implementation that only parses the directive without wiring the scheduler would produce a flat 50/50 distribution from the first request. Therefore, any responsible write-up has to measure the actual distribution, not just check nginx -t.

The following test ran inside a fresh Rocky Linux 10 VM with nginx-mod-1.30.0 containing NGINX-MOD’s fully-implemented scheduler. Two upstream peers, both weight 1, peer B with slow_start=30s:

upstream backend_test {
    server 127.0.0.1:8001 weight=1;
    server 127.0.0.1:8002 weight=1 slow_start=30s;
}

server { listen 8001; location / { return 200 "A\n"; } }
server { listen 8002; location / { return 200 "B\n"; } }

server {
    listen 127.0.0.1:8080;
    location / { proxy_pass http://backend_test; }
}

After systemctl restart nginx, at each sample point we send 200 requests to the proxy and count the A/B split:

for i in $(seq 1 200); do curl -s http://127.0.0.1:8080/; done \
    | sort | uniq -c

Results:

t (after restart)    Peer A    Peer B    B share    Theoretical B share
 1s                    193        7       3.5%       3.3%
 5s                    171       29      14.5%      14.3%
10s                    150       50      25.0%      25.0%
15s                    133       67      33.5%      33.3%
20s                    120       80      40.0%      40.0%
25s                    109       91      45.5%      45.5%
29s                    101       99      49.5%      49.2%
31s                    100      100      50.0%      50.0% (steady state)
35s                    100      100      50.0%      50.0% (steady state)

The measured nginx slow_start ramp tracks the theoretical linear ramp within half a percent at every sample point. After t=slow_start, the split stays at the equal steady state forever. No oscillation, no overshoot, no starvation.
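The “Theoretical B share” column is just B’s ramped weight over the total weight. A few lines reproduce it (a sketch of the linear ramp, assuming the 30s window and equal weights from the test config):

```python
def theoretical_b_share(t, slow_start=30.0, w_a=1.0, w_b=1.0):
    """Expected share for a ramping peer B next to a steady peer A:
    B's scaled weight divided by the total scaled weight."""
    ramp = min(max(t / slow_start, 0.0), 1.0)
    return (w_b * ramp) / (w_a + w_b * ramp)

for t in (1, 5, 10, 15, 20, 25, 29, 31):
    print(f"t={t:>2}s  B share = {theoretical_b_share(t):.1%}")
```

At t=10s the ramp is 1/3, so B’s share is (1/3)/(1 + 1/3) = 25%, matching the measured row exactly.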

For comparison, a control run against an identical upstream with the slow_start parameter removed produces an immediate 50/50 split from request number one, confirming that stock round-robin without slow_start is unchanged:

t=0s (plain round-robin, no slow_start):  50 A  50 B

Three-peer production shape

The three-peer example from the “Configuring” section above was exercised against the same VM, with the three 10.0.0.x servers remapped to 127.0.0.1:8001-8003 for the test. Each sample is 300 requests, with peer B at slow_start=30s and peer C at slow_start=1m:

t (after restart)      A      B      C
 1s                  287      9      4
15s                  173     85     42
30s                  120    120     60
45s                  109    110     81
60s                  100    100    100

Peer B reaches its full share at t=30s (half of C’s window), peer C reaches full share at t=60s, and all three peers share equally thereafter. slow_start=1m is accepted by the parser and respected by the scheduler.
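The same algebra generalizes to any number of ramping peers. A sketch that reproduces the three-peer shape above (illustrative only, assuming the 30s and 1m windows from the config):

```python
def shares(t, windows, weights=None):
    """Expected traffic shares at time t for peers with per-peer
    slow_start windows in seconds (0 = no ramp).  Returns fractions
    that sum to 1."""
    weights = weights or [1.0] * len(windows)
    scaled = [w * (min(t / s, 1.0) if s else 1.0)
              for w, s in zip(weights, windows)]
    total = sum(scaled)
    return [x / total for x in scaled]

# A: no ramp, B: slow_start=30s, C: slow_start=1m
for t in (1, 15, 30, 45, 60):
    a, b, c = shares(t, windows=[0, 30, 60])
    print(f"t={t:>2}s  A={a:.0%}  B={b:.0%}  C={c:.0%}")
```

At t=15s the scaled weights are 1, 0.5, and 0.25, giving shares of about 57% / 29% / 14% – within a couple of requests of the measured 173 / 85 / 42 row.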

Dynamic upstream API

The dynamic upstream API path was also verified. With a single-peer upstream that has a zone block, POSTing a new server with slow_start=30s via the api module:

curl -X POST -H "Content-Type: application/json" \
  -d '{"server":"127.0.0.1:8002","slow_start":"30s"}' \
  http://127.0.0.1:8080/api/1/http/upstreams/api_backend/servers/

Produces the same monotonic ramp as a static upstream {} configuration (3.5% / 25% / 40% / 50% at t=1s / 10s / 20s / 30s+), confirming that the start_time stamp is applied when the API creates the peer, not only at nginx startup.

Vanilla nginx rejection

The “invalid parameter” message in the troubleshooting section was verified against a hand-compiled vanilla nginx 1.30.0 tree. Presenting the slow_start config to the vanilla binary produces:

nginx: [emerg] invalid parameter "slow_start=30s" in /etc/nginx/conf.d/slow-start-test.conf:2
nginx: configuration file /etc/nginx/nginx.conf test failed

This confirms that the feature is observable only in NGINX-MOD builds.

Directive Context Cross-Check

Because slow_start=<time> is a parameter of the server directive – not a standalone directive – it inherits the server directive’s context: NGX_HTTP_UPS_CONF. It must appear inside an upstream { } block, attached to a server line. The table below confirms every directive touched by the config snippets above.

Directive              Context flags (from source)               Block in article            Matches?
upstream               NGX_HTTP_MAIN_CONF                        http (top level of http {})  Yes
server (upstream)      NGX_HTTP_UPS_CONF                         upstream                     Yes
server (virtual host)  NGX_HTTP_MAIN_CONF                        http                         Yes
listen                 NGX_HTTP_SRV_CONF                         server                       Yes
location               NGX_HTTP_SRV_CONF or NGX_HTTP_LOC_CONF    server                       Yes
return                 many (incl. NGX_HTTP_LOC_CONF)            location                     Yes
proxy_pass             NGX_HTTP_LOC_CONF family                  location                     Yes

Attempting to put slow_start=30s on a listen line fails with an “invalid parameter” error, and a standalone top-level slow_start 30s; directive fails with “unknown directive” – because there is no standalone slow_start directive, only the server parameter.

Performance Considerations

When nginx slow_start Helps Most

  • Any backend with a measurable cold-start cost. JVM-based services, PHP-FPM with OPcache, large Node.js applications with module graph warmup, .NET with CLR JIT.
  • Backends with connection pooling. Empty DB connection pools, empty cache client pools, empty service-mesh sidecars all benefit from gradually warming through incoming traffic.
  • Deploys in front of an auto-scaler. Every scale-up event adds a cold peer. Without slow_start, the new instance takes the full share immediately; with slow_start, it ramps while its JIT, OPcache, and pool warm up.
  • Rolling restarts. During a rolling restart, the peer coming back online is reintroduced at its full weight. slow_start turns each reintroduction into a soft ramp, smoothing the tail-latency profile of the whole deployment.

When nginx slow_start Does Not Help

  • Static file servers. Nothing to warm up; the page cache is the only state, and it warms passively from background traffic.
  • Reverse proxies in front of already-warm pools. If your “backend” is itself an idle NGINX or HAProxy, it’s already warm.
  • Backends that pre-warm during startup. If your deploy script explicitly hits /__warmup before adding the server to the pool, slow_start duplicates the work (harmless, but unnecessary).
  • Health-check-first architectures. If your health check already requires the backend to respond with p99 < Xms for N consecutive probes before being added to rotation, slow_start provides additional margin but the primary warm-up is already done.

Overhead

The per-request overhead of the ramp-up math is a single timestamp subtraction, a comparison, and a multiply – measured in nanoseconds. For peers whose ramp-up has completed (peer->start_time == 0), the guard clause short-circuits after a single zero-check. Running NGINX-MOD with nginx slow_start configured on every peer adds no measurable overhead at steady state.

Troubleshooting

“invalid parameter slow_start=30s”

You are running stock open-source nginx, not NGINX-MOD. Stock nginx’s upstream parser does not recognize slow_start= and will emit this error during nginx -t. Swap to NGINX-MOD:

sudo dnf config-manager --enable getpagespeed-extras-nginx-mod
sudo dnf swap nginx nginx-mod

Verify:

nginx -v
# Expected: nginx-mod by GetPageSpeed.com/...

nginx slow_start does not seem to do anything

Two common causes:

  1. The ramp-up window is shorter than the time it takes to send your first test request. If slow_start=1s and your test harness takes 800 ms to warm up, you’ll miss the ramp-up window entirely. Bump to slow_start=30s and retest.

  2. You are testing with a single peer. slow_start attenuates a peer’s share relative to other peers. With one peer, there is nothing to share with; round-robin sends 100% of traffic to the only available peer regardless of its ramp-up factor. Test with at least two peers.

Ramp-up resets after every reload

Expected behavior. Each nginx -s reload builds a fresh upstream peer list and re-stamps start_time on every peer with slow_start. If you want peers that have already warmed up to stay warm across reloads, use the NGINX dynamic upstream API with state persistence; the state file preserves the peer list across reloads, and peers that have finished their ramp-up do not re-ramp.

Does nginx slow_start apply per-worker or globally

It works consistently across all workers. Each worker holds its own copy of the peer state in process-local memory, but all workers are forked from the same master, share the same initial start_time stamp, and read ngx_current_msec from the same clock source – so every worker’s view of ramp-up progress agrees. You will see the same ramp curve whether you poll worker 1 or worker 16.

Does slow_start work with the dynamic upstream API

Yes. When you POST a new server with "slow_start": "30s", NGINX-MOD stamps the peer’s start_time at the moment of POST, and the scheduler begins ramp-up from that instant. PATCHing slow_start on an existing server also re-stamps start_time, effectively restarting the ramp – useful for simulating a restart without actually restarting. See the NGINX dynamic upstream guide for REST-level details.

The Bigger Picture: Closing the NGINX Plus Gap

nginx slow_start is the latest feature NGINX-MOD ships that open-source NGINX does not. The pattern is becoming familiar:

  • HTTP/3 reload – Open-source NGINX loses QUIC connections on every reload. NGINX-MOD implements reuseport-safe shutdown so existing connections survive.
  • Active health checks – Open-source NGINX only has passive health detection. NGINX-MOD implements six active-check types (HTTP, TCP, SSL, MySQL, AJP, FastCGI) with a dashboard.
  • Dynamic upstream API – Open-source NGINX has no runtime upstream management. NGINX-MOD implements an NGINX Plus-compatible API for add/remove/modify without reload.
  • Upstream slow_start – This article.

Every one of these features follows the same shape: F5 stubbed the data model into open-source nginx, then declined to implement the behavior. We trace the gap, engineer the missing scheduler logic, verify the runtime behavior, and ship it as part of a standard repo subscription. A 10-instance shop pays Plus $36,750 per year; the same shop pays GetPageSpeed under $500 per year – and gets nginx slow_start, every active health check type, the dynamic upstream API, HTTP/3-safe reloads, Brotli, ModSecurity, 100+ other modules, and every future NGINX version for as long as the subscription runs.

Wrapping Up

nginx slow_start matters more than most sysadmins realize. Every rolling restart, every auto-scale event, every blue-green cutover punishes your backends with a slam of cold traffic – and each of those slams is a measurable p99 tail spike that rolls up into your SLO math. Giving each fresh peer a 30-second runway eliminates the slam entirely, at zero additional complexity in the config file.

Open-source nginx has had the slow_start field in its struct for years and never wired the scheduler. F5 stubbed the data model and kept the behavior behind the Plus paywall. NGINX-MOD implements the missing feature end-to-end, ships it through the standard dynamic module repo, and bundles nginx slow_start with the same flat subscription that covers every other NGINX-MOD feature.

Install NGINX-MOD, add slow_start=30s to the peers that warm up slowly, and restart. Watch your rolling-deploy tail latency stop bouncing.

Get NGINX-MOD from the GetPageSpeed repository, browse the full NGINX-MOD feature set, and pair nginx slow_start with the dynamic upstream API for runtime warm-up control.


Danila Vershinin

Founder & Lead Engineer

NGINX configuration and optimization • Linux system administration • Web performance engineering

10+ years NGINX experience • Maintainer of GetPageSpeed RPM repository • Contributor to open-source NGINX modules
