
NGINX HTTP/3 Is Broken After Reload — Here’s the Fix F5 Won’t Ship


You reload NGINX. HTTP/2 traffic keeps flowing. But HTTP/3? Roughly half your QUIC connections silently die. No error logs. No warnings. Just timeouts that look like network issues.

This nginx HTTP/3 reload bug has existed since QUIC support was added to NGINX. F5 knows about it — there’s an open issue, multiple community PRs. They just won’t merge a fix.

We did. nginx-mod release 37 ships the patch.

The Symptoms Nobody Warned You About

Here’s what happens when you run nginx -s reload on a server with HTTP/3 enabled and quic_bpf on:

  - New HTTP/3 connections start timing out: roughly half of the QUIC handshakes never complete.
  - HTTP/1.1 and HTTP/2 traffic on the same server keeps flowing normally.
  - Nothing shows up in the error log. To clients, the failures look like ordinary network issues.
  - A full stop/start makes the problem disappear, until the next reload.

This is catastrophic for any production deployment that uses nginx -s reload for configuration changes — which is every production deployment. You push a config update, and suddenly a large fraction of your HTTP/3 users experience timeouts.

How QUIC Reuseport Works in NGINX

To understand the bug, you need to understand how NGINX handles QUIC sockets with SO_REUSEPORT.

When reuseport is specified in the listen directive, each worker process gets its own socket bound to the same address:port. The kernel distributes incoming packets across these sockets. For TCP, this is straightforward — the kernel hashes by source IP/port.

QUIC is different. A single UDP socket handles multiple logical connections, identified by Connection IDs (CIDs). NGINX uses an eBPF program (enabled by quic_bpf on) to route packets to the correct worker based on the CID embedded in the QUIC packet.

The critical piece is in src/event/quic/ngx_event_quic_bpf.c. When the BPF program is attached to a reuseport group, NGINX sets a flag on the listening socket:

    /* do not inherit this socket */
    ls->ignore = 1;

This ignore = 1 flag tells the NGINX reload mechanism: “Don’t pass this socket to the new worker processes. Create fresh sockets instead.” The idea is sound — new workers need new sockets with a fresh BPF map that routes CIDs to the correct new worker PIDs.

The problem is what happens to the old sockets.

The Root Cause: An 8-Line Oversight

During graceful shutdown, NGINX calls ngx_close_listening_sockets() in src/core/ngx_connection.c. This function closes listening sockets so the old worker stops accepting new connections while it finishes processing existing ones.

Here’s what the upstream code looks like:

void
ngx_close_listening_sockets(ngx_cycle_t *cycle)
{
    ngx_uint_t         i;
    ngx_listening_t   *ls;
    ngx_connection_t  *c;

    /* ... */

    ls = cycle->listening.elts;
    for (i = 0; i < cycle->listening.nelts; i++) {

#if (NGX_QUIC)
        if (ls[i].quic) {
            continue;      /* <-- THE BUG */
        }
#endif

        c = ls[i].connection;

        if (c) {
            if (c->read->active) {
                /* ... delete event ... */
            }
            ngx_free_connection(c);
            c->fd = (ngx_socket_t) -1;
        }

        if (ngx_close_socket(ls[i].fd) == -1) {
            /* ... error ... */
        }
    }
}

See it? The if (ls[i].quic) { continue; } check skips every QUIC listening socket unconditionally: they are never closed during graceful shutdown.

For non-reuseport QUIC sockets, this makes sense. The old worker needs to keep its QUIC socket open to finish servicing existing QUIC connections (sending retransmissions, processing ACKs, completing graceful connection shutdown).

But for reuseport QUIC sockets with BPF, it’s a disaster. Here’s why:

  1. Reload happens. New worker processes start with fresh sockets and a fresh BPF map.
  2. Old workers enter graceful shutdown but their reuseport sockets stay open.
  3. The kernel’s reuseport group now contains both old and new sockets. For new QUIC Initial packets (which have no CID in the BPF map yet), the kernel falls back to hash-based distribution across all sockets in the reuseport group.
  4. New Initial packets land on old worker sockets roughly old/(old + new) of the time — about half, since old and new worker counts match. The old workers are in ngx_exiting state and silently drop these packets.
  5. The client sees a timeout. No QUIC handshake completes. No error. Just silence.

This is not a complex race condition. It is a simple oversight — QUIC sockets are skipped without checking whether they use reuseport. Any code review should have caught this.

Reproducing the Bug

You can reproduce this in minutes on any Linux system with NGINX built with QUIC support.

Configuration:

worker_processes 2;

# quic_bpf is a main-context directive; it must sit outside http {}
quic_bpf on;

events {
    worker_connections 1024;
}

http {
    server {
        listen 443 quic reuseport;
        listen 443 ssl;

        ssl_certificate     /etc/ssl/certs/example.crt;
        ssl_certificate_key /etc/ssl/private/example.key;

        location / {
            return 200 "OK\n";
        }
    }
}

Test script:

#!/bin/bash
# Fresh restart — baseline
sudo nginx -s stop && sudo nginx
echo "=== After restart ==="
for i in $(seq 1 20); do
    curl -s --http3-only -m 2 https://localhost/ > /dev/null 2>&1 && echo "OK" || echo "FAIL"
done | sort | uniq -c

# Reload — triggers the bug
sudo nginx -s reload
sleep 1
echo "=== After reload ==="
for i in $(seq 1 20); do
    curl -s --http3-only -m 2 https://localhost/ > /dev/null 2>&1 && echo "OK" || echo "FAIL"
done | sort | uniq -c

Typical output with unpatched NGINX:

=== After restart ===
     20 OK
=== After reload ===
     10 FAIL
     10 OK

20/20 after restart. ~10/20 after reload. The failure rate is consistent and predictable.

You can verify the stale sockets using system tools:

# Show reuseport sockets — old worker PIDs still present
ss -ulnp sport = :443

# Show BPF map entries — stale entries pointing to old worker sockets
bpftool map dump name ngx_quic_sockmap

After reload, you’ll see UDP sockets owned by both old (exiting) and new worker PIDs in the reuseport group. The BPF map only knows about new workers, so Initial packets (with no CID mapping) hash-distribute across all sockets — including the dead ones.

F5’s Response — Or Lack Thereof

This bug is tracked in the nginx issue tracker. It’s been open since 2025. The community hasn’t just reported it — they’ve submitted actual fixes.

The NGINX team’s response? “We are planning to finish and commit our fix one day.” No timeline. No priority. No urgency for a bug that silently drops production traffic.

Meanwhile, Angie — the Russian fork of NGINX — fixed this in version 1.11.0 (December 2025) with a complete BPF redesign. A community fork, with fewer resources than F5, shipped a production fix months ago.

This is a pattern. Since F5 acquired NGINX in 2019, open-source NGINX has become a neglected vehicle for selling NGINX Plus. Critical bugs in the open-source version languish for months or years. Community contributions go unreviewed. The message is clear: if you want fixes, buy the commercial product.

We think there’s a better path.

Our Fix: Close Stale Reuseport Sockets

The fix is surgical. In ngx_close_listening_sockets(), instead of skipping all QUIC sockets, we only skip non-reuseport ones:

 #if (NGX_QUIC)
         if (ls[i].quic) {
-            continue;
+            if (!ls[i].reuseport) {
+                continue;
+            }
+
+            /*
+             * Close QUIC reuseport sockets to remove the exiting worker
+             * from the reuseport group, preventing new QUIC connections
+             * from being routed to this worker during graceful shutdown.
+             */
         }
 #endif

That’s it. When a QUIC listening socket has reuseport enabled, it gets closed during graceful shutdown — just like every other listening socket. This removes the old worker’s socket from the kernel’s reuseport group immediately. All new QUIC Initial packets now route exclusively to new worker sockets.

Non-reuseport QUIC sockets continue to be kept open, preserving the ability for old workers to finish servicing existing QUIC connections.

Why this works:

  - Closing the socket removes the exiting worker from the kernel’s reuseport group the moment graceful shutdown begins, so the fallback hash for new Initial packets can only ever select sockets owned by new workers.
  - The new workers’ fresh BPF map then routes packets with known CIDs to the correct new worker, exactly as on a clean start.

Before and After

We tested with 5 consecutive reloads, 20 HTTP/3 requests each:

Scenario              Restart  Reload 1  Reload 2  Reload 3  Reload 4  Reload 5
Unpatched             20/20    10/20     9/20      11/20     10/20     10/20
Patched (nginx-mod)   20/20    20/20     20/20     20/20     20/20     20/20

100% success rate across all reloads with the patch. The fix is deterministic — not a timing improvement, but a complete elimination of the failure mode.

Get the Fix Now with nginx-mod

nginx-mod is a better NGINX: community-driven, actively patched, with fixes that upstream ignores. This QUIC reload fix ships in release 37.

Install on RHEL, CentOS, AlmaLinux, Rocky Linux, Fedora, or SUSE:

sudo dnf -y install https://extras.getpagespeed.com/release-latest.rpm
sudo dnf -y install nginx-mod

Or if you’re upgrading from stock NGINX:

sudo dnf -y swap nginx nginx-mod

Then verify HTTP/3 works after reload:

sudo nginx -s reload
curl --http3-only -I https://your-domain.com

This isn’t the first upstream bug we’ve fixed. nginx-mod also patches the empty $http_host bug in HTTP/3 that upstream shipped a partial fix for months later.

Subscribe to the GetPageSpeed repository for ongoing access to nginx-mod and 1,000+ other RPM packages.

Workaround Without nginx-mod

If you can’t switch to nginx-mod, your options are limited:

Option 1: Disable quic_bpf

# Remove or comment out:
# quic_bpf on;

Without quic_bpf, NGINX doesn’t set ls->ignore = 1, so sockets are inherited normally during reload rather than being recreated. The kernel distributes QUIC packets across inherited sockets, and both old and new workers can handle them.

The downside: you lose CID-based BPF routing. Without it, QUIC packets for existing connections may land on the wrong worker after reload, breaking those connections. You’re trading “new connections fail” for “existing connections break.” Not a real solution — just a different flavor of broken.

Option 2: Use nginx -s stop && nginx instead of reload

A full restart avoids the bug entirely since there are no old workers with stale sockets. But you lose all active connections (HTTP/1.1, HTTP/2, and QUIC) during the restart window. Unacceptable for production.

Option 3: Apply the patch yourself

Download the patch and rebuild NGINX from source. If you’re already building from source, this is straightforward. If you’re using distro packages, this is a maintenance burden you probably don’t want.

Who Is Affected

You are affected if all of these are true:

  - You run NGINX with HTTP/3 (QUIC) enabled on Linux.
  - quic_bpf on is set.
  - Your listen directive uses quic with reuseport.
  - You run more than one worker process and use nginx -s reload to apply configuration changes.

If you’re running HTTP/3 in production on Linux, you almost certainly have all of these. The quic_bpf on directive is recommended in every QUIC deployment guide, and reuseport is required for multi-worker QUIC.

Wrapping Up

This bug silently drops production HTTP/3 traffic every time you reload NGINX. It’s been known and reported for over a year. Community fixes exist but go unreviewed. F5 has the resources to fix this in an afternoon — they choose not to.

nginx-mod exists because the open-source NGINX community deserves a build that actually ships fixes. Get it today.


Danila Vershinin

Founder & Lead Engineer

NGINX configuration and optimization • Linux system administration • Web performance engineering

10+ years NGINX experience • Maintainer of GetPageSpeed RPM repository • Contributor to open-source NGINX modules
