
NGINX HTTP/3 Is Broken After Reload — Here’s the Fix F5 Won’t Ship


You reload NGINX. HTTP/2 traffic keeps flowing. But HTTP/3? Roughly half your QUIC connections silently die. No error logs. No warnings. Just timeouts that look like network issues.

This nginx HTTP/3 reload bug has existed since QUIC support was added to NGINX. F5 knows about it — there’s an open issue, multiple community PRs. They just won’t merge a fix.

We did. nginx-mod release 37 ships the patch.

The Symptoms Nobody Warned You About

Here’s what happens when you run nginx -s reload on a server with HTTP/3 enabled and quic_bpf on:

  - New HTTP/3 connections start timing out: roughly half of the QUIC handshakes never complete.
  - HTTP/1.1 and HTTP/2 traffic on the same server keeps flowing normally.
  - Nothing shows up in the error log. To clients, the failures look like ordinary network issues.
  - A full stop/start makes the problem disappear, until the next reload.

This is catastrophic for any production deployment that uses nginx -s reload for configuration changes — which is every production deployment. You push a config update, and suddenly a large fraction of your HTTP/3 users experience timeouts.

How QUIC Reuseport Works in NGINX

To understand the bug, you need to understand how NGINX handles QUIC sockets with SO_REUSEPORT.

When reuseport is specified in the listen directive, each worker process gets its own socket bound to the same address:port. The kernel distributes incoming packets across these sockets. For TCP, this is straightforward — the kernel hashes by source IP/port.

QUIC is different. A single UDP socket handles multiple logical connections, identified by Connection IDs (CIDs). NGINX uses an eBPF program (enabled by quic_bpf on) to route packets to the correct worker based on the CID embedded in the QUIC packet.

The critical piece is in src/event/quic/ngx_event_quic_bpf.c. When the BPF program is attached to a reuseport group, NGINX sets a flag on the listening socket:

    /* do not inherit this socket */
    ls->ignore = 1;

This ignore = 1 flag tells the NGINX reload mechanism: “Don’t pass this socket to the new worker processes. Create fresh sockets instead.” The idea is sound — new workers need new sockets with a fresh BPF map that routes CIDs to the correct new worker PIDs.

The problem is what happens to the old sockets.

The Root Cause: An 8-Line Oversight

During graceful shutdown, NGINX calls ngx_close_listening_sockets() in src/core/ngx_connection.c. This function closes listening sockets so the old worker stops accepting new connections while it finishes processing existing ones.

Here’s what the upstream code looks like:

void
ngx_close_listening_sockets(ngx_cycle_t *cycle)
{
    ngx_uint_t         i;
    ngx_listening_t   *ls;
    ngx_connection_t  *c;

    /* ... */

    ls = cycle->listening.elts;
    for (i = 0; i < cycle->listening.nelts; i++) {

#if (NGX_QUIC)
        if (ls[i].quic) {
            continue;      /* <-- THE BUG */
        }
#endif

        c = ls[i].connection;

        if (c) {
            if (c->read->active) {
                /* ... delete event ... */
            }
            ngx_free_connection(c);
            c->fd = (ngx_socket_t) -1;
        }

        if (ngx_close_socket(ls[i].fd) == -1) {
            /* ... error ... */
        }
    }
}

See it? The if (ls[i].quic) { continue; } check skips every QUIC listening socket unconditionally: they are never closed during graceful shutdown.

For non-reuseport QUIC sockets, this makes sense. The old worker needs to keep its QUIC socket open to finish servicing existing QUIC connections (sending retransmissions, processing ACKs, completing graceful connection shutdown).

But for reuseport QUIC sockets with BPF, it’s a disaster. Here’s why:

  1. Reload happens. New worker processes start with fresh sockets and a fresh BPF map.
  2. Old workers enter graceful shutdown but their reuseport sockets stay open.
  3. The kernel’s reuseport group now contains both old and new sockets. For new QUIC Initial packets (which have no CID in the BPF map yet), the kernel falls back to hash-based distribution across all sockets in the reuseport group.
  4. New Initial packets land on old worker sockets roughly old/(old + new) of the time — about half, since old and new worker counts match. The old workers are in ngx_exiting state and silently drop these packets.
  5. The client sees a timeout. No QUIC handshake completes. No error. Just silence.

This is not a complex race condition. It is a simple oversight — QUIC sockets are skipped without checking whether they use reuseport. Any code review should have caught this.

Reproducing the Bug

You can reproduce this in minutes on any Linux system with NGINX built with QUIC support.

Configuration:

worker_processes 2;

# quic_bpf is a main-context directive; it must sit outside http {}
quic_bpf on;

events {
    worker_connections 1024;
}

http {
    server {
        listen 443 quic reuseport;
        listen 443 ssl;

        ssl_certificate     /etc/ssl/certs/example.crt;
        ssl_certificate_key /etc/ssl/private/example.key;

        location / {
            return 200 "OK\n";
        }
    }
}

Test script:

#!/bin/bash
# Fresh restart — baseline
sudo nginx -s stop && sudo nginx
echo "=== After restart ==="
for i in $(seq 1 20); do
    curl -s --http3-only -m 2 https://localhost/ > /dev/null 2>&1 && echo "OK" || echo "FAIL"
done | sort | uniq -c

# Reload — triggers the bug
sudo nginx -s reload
sleep 1
echo "=== After reload ==="
for i in $(seq 1 20); do
    curl -s --http3-only -m 2 https://localhost/ > /dev/null 2>&1 && echo "OK" || echo "FAIL"
done | sort | uniq -c

Typical output with unpatched NGINX:

=== After restart ===
     20 OK
=== After reload ===
     10 FAIL
     10 OK

20/20 after restart. ~10/20 after reload. The failure rate is consistent and predictable.

You can verify the stale sockets using system tools:

# Show reuseport sockets — old worker PIDs still present
ss -ulnp sport = :443

# Show BPF map entries — stale entries pointing to old worker sockets
bpftool map dump name ngx_quic_sockmap

After reload, you’ll see UDP sockets owned by both old (exiting) and new worker PIDs in the reuseport group. The BPF map only knows about new workers, so Initial packets (with no CID mapping) hash-distribute across all sockets — including the dead ones.

F5’s Response — Or Lack Thereof

This bug is tracked in the nginx issue tracker. It’s been open since 2025. The community hasn’t just reported it — they’ve submitted actual fixes.

The NGINX team’s response? “We are planning to finish and commit our fix one day.” No timeline. No priority. No urgency for a bug that silently drops production traffic.

Meanwhile, Angie — the Russian fork of NGINX — fixed this in version 1.11.0 (December 2025) with a complete BPF redesign. A community fork, with fewer resources than F5, shipped a production fix months ago.

This is a pattern. Since F5 acquired NGINX in 2019, open-source NGINX has become a neglected vehicle for selling NGINX Plus. Critical bugs in the open-source version languish for months or years. Community contributions go unreviewed. The message is clear: if you want fixes, buy the commercial product.

We think there’s a better path.

Our Fix: Close Stale Reuseport Sockets

The fix is surgical. In ngx_close_listening_sockets(), instead of skipping all QUIC sockets, we only skip non-reuseport ones:

 #if (NGX_QUIC)
         if (ls[i].quic) {
-            continue;
+            if (!ls[i].reuseport) {
+                continue;
+            }
+
+            /*
+             * Close QUIC reuseport sockets to remove the exiting worker
+             * from the reuseport group, preventing new QUIC connections
+             * from being routed to this worker during graceful shutdown.
+             */
         }
 #endif

That’s it. When a QUIC listening socket has reuseport enabled, it gets closed during graceful shutdown — just like every other listening socket. This removes the old worker’s socket from the kernel’s reuseport group immediately. All new QUIC Initial packets now route exclusively to new worker sockets.

Non-reuseport QUIC sockets continue to be kept open, preserving the ability for old workers to finish servicing existing QUIC connections.

Why this works:

  - Closing the socket removes the exiting worker from the kernel’s reuseport group the moment graceful shutdown begins, so the fallback hash for new Initial packets can only ever select sockets owned by new workers.
  - The new workers’ fresh BPF map then routes packets with known CIDs to the correct new worker, exactly as on a clean start.

Before and After

We tested with 5 consecutive reloads, 20 HTTP/3 requests each:

Scenario              Restart  Reload 1  Reload 2  Reload 3  Reload 4  Reload 5
Unpatched             20/20    10/20     9/20      11/20     10/20     10/20
Patched (nginx-mod)   20/20    20/20     20/20     20/20     20/20     20/20

100% success rate across all reloads with the patch. The fix is deterministic — not a timing improvement, but a complete elimination of the failure mode.

Get the Fix Now with nginx-mod

nginx-mod is a better NGINX: community-driven, actively patched, with fixes that upstream ignores. This QUIC reload fix ships in release 37.

Install on RHEL, CentOS, AlmaLinux, Rocky Linux, Fedora, or SUSE:

sudo dnf -y install https://extras.getpagespeed.com/release-latest.rpm
sudo dnf -y install nginx-mod

Or if you’re upgrading from stock NGINX:

sudo dnf -y swap nginx nginx-mod

Then verify HTTP/3 works after reload:

sudo nginx -s reload
curl --http3-only -I https://your-domain.com

This isn’t the first upstream bug we’ve fixed. nginx-mod also patches the empty $http_host bug in HTTP/3 that upstream shipped a partial fix for months later.

Subscribe to the GetPageSpeed repository for ongoing access to nginx-mod and 1,000+ other RPM packages.

Workaround Without nginx-mod

If you can’t switch to nginx-mod, your options are limited:

Option 1: Disable quic_bpf

# Remove or comment out:
# quic_bpf on;

Without quic_bpf, NGINX doesn’t set ls->ignore = 1, so sockets are inherited normally during reload rather than being recreated. The kernel distributes QUIC packets across inherited sockets, and both old and new workers can handle them.

The downside: you lose CID-based BPF routing. Without it, QUIC packets for existing connections may land on the wrong worker after reload, breaking those connections. You’re trading “new connections fail” for “existing connections break.” Not a real solution — just a different flavor of broken.

Option 2: Use nginx -s stop && nginx instead of reload

A full restart avoids the bug entirely since there are no old workers with stale sockets. But you lose all active connections (HTTP/1.1, HTTP/2, and QUIC) during the restart window. Unacceptable for production.

Option 3: Apply the patch yourself

Download the patch and rebuild NGINX from source. If you’re already building from source, this is straightforward. If you’re using distro packages, this is a maintenance burden you probably don’t want.

Who Is Affected

You are affected if all of these are true:

  - You run NGINX with HTTP/3 (QUIC) enabled on Linux.
  - quic_bpf on is set.
  - Your listen directive uses quic with reuseport.
  - You run more than one worker process and use nginx -s reload to apply configuration changes.

If you’re running HTTP/3 in production on Linux, you almost certainly have all of these. The quic_bpf on directive is recommended in every QUIC deployment guide, and reuseport is required for multi-worker QUIC.

Wrapping Up

This bug silently drops production HTTP/3 traffic every time you reload NGINX. It’s been known and reported for over a year. Community fixes exist but go unreviewed. F5 has the resources to fix this in an afternoon — they choose not to.

nginx-mod exists because the open-source NGINX community deserves a build that actually ships fixes. Get it today.


Danila Vershinin

Founder & Lead Engineer

NGINX configuration and optimization • Linux system administration • Web performance engineering

10+ years NGINX experience • Maintainer of GetPageSpeed RPM repository • Contributor to open-source NGINX modules
