
NGINX Rate Limiting: The Complete Guide




If you’ve experienced a surge of requests overwhelming your server, you know how critical rate limiting is. Whether it’s a DDoS attack, a misbehaving crawler, or too many users hitting your API at once, NGINX rate limiting offers an elegant solution using the leaky bucket algorithm.

In this guide, you’ll learn how NGINX rate limiting works, how to configure it properly, and common mistakes to avoid.

What is Rate Limiting?

Rate limiting controls how many requests a client can make within a time period. Think of it as a bouncer who only lets in a certain number of people per minute.

NGINX implements rate limiting using the leaky bucket algorithm. Imagine a bucket with a small hole at the bottom:

  • Water (requests) flows in at varying rates
  • Water drains out at a constant rate (your configured limit)
  • If the bucket overflows, excess water is discarded (requests rejected)

This approach smooths traffic bursts while maintaining steady flow to your backend.

Why Use NGINX Rate Limiting?

Rate limiting serves multiple purposes:

  1. DDoS Protection: Mitigate denial-of-service attacks by limiting rates per IP
  2. API Protection: Prevent abuse of your API endpoints
  3. Resource Conservation: Protect backend servers from overload
  4. Fair Usage: Ensure no single client monopolizes resources
  5. Bot Mitigation: Slow down aggressive crawlers and scrapers
  6. Login Protection: Prevent brute-force password attacks

For advanced bot protection, consider implementing a honeypot system alongside rate limiting.

Basic Rate Limiting Configuration

Step 1: Define the Rate Limit Zone

First, define a shared memory zone in the http block:

http {
    limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/s;
}

Let’s break down each component:

  • $binary_remote_addr: The key for tracking (client IP in binary format)
  • zone=mylimit:10m: Creates a 10 megabyte shared memory zone
  • rate=10r/s: Allows 10 requests per second

Memory tip: 1 megabyte stores about 16,000 IP addresses. For most sites, 10m is sufficient.

Step 2: Apply the Rate Limit

Apply the rate limit in a location or server block:

server {
    location /api/ {
        limit_req zone=mylimit;

        proxy_pass http://backend;
    }
}

Requests exceeding 10 per second from the same IP receive 503 Service Unavailable.

Understanding the Burst Parameter

The basic limit above is quite strict. If a client sends 2 requests at once, the second gets rejected. The leaky bucket only allows 1 request every 100ms (1/10th of a second).

The burst parameter provides a buffer for traffic spikes:

location /api/ {
    limit_req zone=mylimit burst=20;

    proxy_pass http://backend;
}

With burst=20:

  • 1 request processes immediately (the rate allowance)
  • Up to 20 additional requests queue in the burst bucket
  • Queued requests release at the rate limit (one every 100ms)
  • Requests beyond burst are rejected

Important: Burst requests are delayed, not processed immediately. A burst of 20 requests takes 2 seconds to fully process.

The nodelay Parameter

To process burst requests immediately without queuing, use nodelay:

location /api/ {
    limit_req zone=mylimit burst=20 nodelay;

    proxy_pass http://backend;
}

With burst=20 nodelay:

  • Up to 21 requests (1 rate + 20 burst) process immediately
  • Burst slots refill at the rate limit (one slot every 100ms)
  • Once slots exhaust, excess requests are rejected
  • No requests are delayed

When to use nodelay: When response time matters more than strict smoothing. Perfect for APIs needing immediate responses.

The delay Parameter (Hybrid Approach)

NGINX 1.15.7 introduced the delay parameter for a middle ground:

location /api/ {
    limit_req zone=mylimit burst=20 delay=10;

    proxy_pass http://backend;
}

With burst=20 delay=10:

  • First 11 requests (1 rate + 10 delay) process immediately
  • Next 10 requests (remaining burst) are delayed
  • Requests beyond burst=20 are rejected

This allows some bursts while still smoothing larger spikes.

Rate Limiting Response Codes

By default, rate-limited requests receive HTTP 503. For APIs, 429 (Too Many Requests) is more appropriate:

location /api/ {
    limit_req zone=mylimit burst=20 nodelay;
    limit_req_status 429;

    proxy_pass http://backend;
}

You can also customize the error page:

error_page 429 /rate_limited.json;

location = /rate_limited.json {
    internal;
    default_type application/json;
    return 429 '{"error": "Rate limit exceeded. Please slow down."}';
}

Logging Rate-Limited Requests

Control how rate-limited requests are logged:

location /api/ {
    limit_req zone=mylimit burst=20 nodelay;
    limit_req_log_level warn;

    proxy_pass http://backend;
}

Available log levels: info, notice, warn (recommended), error (default).

The log entries show useful information:

2026/01/22 12:34:56 [warn] limiting requests, excess: 20.500 by zone "mylimit", client: 192.168.1.100

The excess value shows how much over the limit the client is.

Multiple Rate Limit Zones

Apply multiple rate limits for layered protection:

http {
    # Per-IP rate limit
    limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

    # Per-server rate limit (global)
    limit_req_zone $server_name zone=perserver:10m rate=1000r/s;

    server {
        location /api/ {
            limit_req zone=perip burst=20 nodelay;
            limit_req zone=perserver burst=100 nodelay;

            proxy_pass http://backend;
        }
    }
}

Both limits must be satisfied. A single IP cannot exceed 10r/s, AND the server cannot exceed 1000r/s combined.

Per-URI Rate Limiting

Limit specific endpoints more aggressively:

http {
    limit_req_zone $binary_remote_addr zone=login:10m rate=1r/s;
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

    server {
        location /login {
            limit_req zone=login burst=5 nodelay;
            limit_req_status 429;
        }

        location /api/ {
            limit_req zone=api burst=20 nodelay;
            limit_req_status 429;
        }
    }
}

Rate Limit by Request URI

Rate limit specific URIs regardless of client IP:

limit_req_zone $request_uri zone=peruri:10m rate=100r/s;

location /expensive-endpoint {
    limit_req zone=peruri burst=50 nodelay;
}

This ensures no single endpoint receives more than 100 requests per second. Note that $request_uri includes the query string; key the zone on $uri instead if you want to limit the normalized path regardless of query parameters.

Whitelisting Trusted IPs

Exempt trusted IPs from rate limiting:

http {
    geo $rate_limit_exempt {
        default 1;
        127.0.0.1 0;
        10.0.0.0/8 0;
        192.168.0.0/16 0;
    }

    map $rate_limit_exempt $rate_limit_key {
        0 "";
        1 $binary_remote_addr;
    }

    limit_req_zone $rate_limit_key zone=mylimit:10m rate=10r/s;
}

When $rate_limit_key is empty, the rate limit doesn’t apply.

Rate Limiting Behind a Load Balancer

Behind a load balancer or CDN, $binary_remote_addr shows the load balancer’s IP. Use the real IP module:

http {
    set_real_ip_from 10.0.0.0/8;
    real_ip_header X-Forwarded-For;
    real_ip_recursive on;

    limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/s;
}

With Cloudflare:

http {
    set_real_ip_from 173.245.48.0/20;
    set_real_ip_from 103.21.244.0/22;
    set_real_ip_from 103.22.200.0/22;
    # ... add all Cloudflare IP ranges
    real_ip_header CF-Connecting-IP;

    limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/s;
}

Per-Minute Rate Limiting

For less aggressive limits, use requests per minute:

limit_req_zone $binary_remote_addr zone=perminute:10m rate=30r/m;

This allows 30 requests per minute (1 every 2 seconds). Useful for expensive operations like password resets.
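For example, a password-reset endpoint could reuse this zone (the path and backend name here are illustrative):

```nginx
location /password-reset {
    limit_req zone=perminute burst=3;
    limit_req_status 429;

    proxy_pass http://backend;
}
```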

Complete Production Configuration

Here’s a comprehensive production-ready configuration:

http {
    # Rate limit zones
    limit_req_zone $binary_remote_addr zone=general:10m rate=10r/s;
    limit_req_zone $binary_remote_addr zone=api:10m rate=30r/s;
    limit_req_zone $binary_remote_addr zone=login:10m rate=1r/s;
    limit_req_zone $binary_remote_addr zone=static:10m rate=50r/s;

    # Whitelist trusted IPs
    geo $rate_limit_exempt {
        default 1;
        127.0.0.1 0;
        10.0.0.0/8 0;
    }

    map $rate_limit_exempt $rate_limit_key {
        0 "";
        1 $binary_remote_addr;
    }

    # Recreate zones with whitelist-aware key
    limit_req_zone $rate_limit_key zone=general_wl:10m rate=10r/s;

    server {
        listen 80;
        server_name example.com;

        limit_req_status 429;
        limit_req_log_level warn;

        # General rate limit for most requests (whitelist-aware key)
        location / {
            limit_req zone=general_wl burst=20 nodelay;

            proxy_pass http://backend;
        }

        # Login endpoint - very strict
        location /login {
            limit_req zone=login burst=5;

            proxy_pass http://backend;
        }

        # API endpoints - more generous
        location /api/ {
            limit_req zone=api burst=50 nodelay;

            proxy_pass http://api_backend;
        }

        # Static files - liberal rate limit
        location /static/ {
            limit_req zone=static burst=100 nodelay;

            root /var/www;
        }

        # Health check - no limit_req directive here, so no rate limit applies
        location /health {
            return 200 'OK';
        }
    }

How the Leaky Bucket Algorithm Works

Understanding the internals helps configure rate limiting. NGINX stores two values per tracked key:

  • excess: Current water level in the bucket (in milli-request units)
  • last: Timestamp of the last request

When a request arrives:

  1. Calculate time elapsed since last request
  2. Drain the bucket: excess = excess - (rate × elapsed_time)
  3. Add the new request: excess = excess + 1
  4. If excess > burst: reject the request
  5. If excess > delay: delay the request
  6. Otherwise: process immediately

This algorithm requires only two values per key, making it memory-efficient for millions of IPs.
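The steps above can be sketched in a few lines of Python. This is a simplified model for illustration, not NGINX's actual implementation; NGINX stores excess in milli-requests and keeps the state in shared memory:

```python
# Simplified model of NGINX's leaky-bucket accounting (illustrative sketch).
class LeakyBucket:
    def __init__(self, rate, burst=0, delay=0):
        self.rate = rate      # drain rate, requests per second
        self.burst = burst    # maximum allowed excess
        self.delay = delay    # excess above this value gets delayed
        self.excess = 0.0     # current "water level" in the bucket
        self.last = None      # timestamp of the last accepted request

    def incoming(self, now):
        """Classify a request arriving at time `now` (in seconds)."""
        if self.last is None:          # first request creates the entry
            self.last = now
            return ("ok", 0.0)
        # Drain for the elapsed time, then add this request (steps 1-3)
        excess = max(0.0, self.excess - self.rate * (now - self.last) + 1.0)
        if excess > self.burst:        # step 4: bucket would overflow
            return ("reject", 0.0)
        self.excess, self.last = excess, now
        if excess > self.delay:        # step 5: queue until it drains
            return ("delay", (excess - self.delay) / self.rate)
        return ("ok", 0.0)             # step 6: process immediately
```

With rate=10 and burst=20 delay=20 (the nodelay case), 21 back-to-back requests pass and the 22nd is rejected, matching the behavior described earlier.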

Common Mistakes to Avoid

Mistake 1: Forgetting the burst parameter

# Too strict - rejects all but 1 request per 100ms
limit_req zone=mylimit;

# Better - allows reasonable bursts
limit_req zone=mylimit burst=20 nodelay;

Mistake 2: Undersized memory zones

# Too small for high-traffic sites
limit_req_zone $binary_remote_addr zone=small:1m rate=10r/s;

# Adequate for most use cases
limit_req_zone $binary_remote_addr zone=adequate:10m rate=10r/s;

Mistake 3: Not using 429 for APIs

# Default 503 confuses API clients
limit_req zone=api;

# Proper HTTP semantics
limit_req zone=api;
limit_req_status 429;

Mistake 4: Rate limiting health checks

NGINX has no limit_req off switch. To exempt health checks, keep the general limit inside location / rather than at the server level, so /health never inherits it:

location / {
    limit_req zone=mylimit burst=20 nodelay;
}

location /health {
    # No limit_req defined or inherited - monitoring is never throttled
    return 200 'OK';
}

Testing Your Rate Limiting Configuration

After configuring, test with a load testing tool:

# Send 10 rapid requests
for i in {1..10}; do
    curl -s -o /dev/null -w "%{http_code}\n" http://localhost/api/test &
done
wait

For thorough testing, use Apache Benchmark:

ab -n 100 -c 50 http://localhost/api/test

Check the error log for rate limiting entries:

tail -f /var/log/nginx/error.log | grep "limiting"
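If you run NGINX 1.17.1 or later, limit_req_dry_run lets you validate a zone against real traffic before enforcing it; violations are logged but no request is rejected:

```nginx
location /api/ {
    limit_req zone=mylimit burst=20 nodelay;
    limit_req_dry_run on;   # log would-be rejections, let everything through

    proxy_pass http://backend;
}
```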

Automatic IP Blocking with Fail2ban

While NGINX rate limiting rejects excess requests, persistent abusers keep consuming resources. Fail2ban can automatically block repeat offenders at the firewall level.

Fail2ban includes a built-in nginx-limit-req filter that parses NGINX error logs for rate limiting messages. Enable it in /etc/fail2ban/jail.local:

[nginx-limit-req]
enabled = true
filter  = nginx-limit-req
logpath = /var/log/nginx/error.log
findtime = 600
maxretry = 10
bantime = 24h

This configuration:

  • Monitors the NGINX error log for rate limit violations
  • Bans IPs that trigger rate limiting 10 times within 10 minutes
  • Blocks them at the firewall for 24 hours

The nginx-limit-req filter matches log entries like:

limiting requests, excess: 20.500 by zone "mylimit", client: 192.168.1.100

For this to work effectively, ensure your NGINX rate limiting uses limit_req_log_level warn or lower to generate log entries.

Install fail2ban on RHEL-based systems:

dnf install epel-release
dnf install fail2ban
systemctl enable --now fail2ban

Check banned IPs:

fail2ban-client status nginx-limit-req

This creates a powerful two-tier defense: NGINX handles the rate limiting logic while fail2ban escalates persistent abusers to firewall-level blocks.

Advanced Rate Limiting Modules

The built-in limit_req module covers most use cases, but the GetPageSpeed repository provides additional NGINX modules for specialized scenarios:

Distributed Rate Limiting with Redis

For load-balanced environments where rate limits must be shared across multiple NGINX instances, use nginx-module-redis-rate-limit:

dnf install nginx-module-redis-rate-limit

This module uses the Generic Cell Rate Algorithm (GCRA) backed by Redis:

load_module modules/ngx_http_rate_limit_module.so;

upstream redis {
    server 127.0.0.1:6379;
    keepalive 1024;
}

location /api/ {
    rate_limit $remote_addr requests=15 period=1m burst=20;
    rate_limit_pass redis;
}

Dynamic IP Blocking

The nginx-module-dynamic-limit-req extends the standard rate limiter to dynamically block abusive IPs:

dnf install nginx-module-dynamic-limit-req

This module can work with RedisPushIptables to automatically push blocking rules to iptables for network-layer protection.

Bandwidth Rate Limiting

To limit download speed (not request rate), use nginx-module-limit-traffic-rate:

dnf install nginx-module-limit-traffic-rate

load_module modules/ngx_http_limit_traffic_rate_filter_module.so;

http {
    limit_traffic_rate_zone rate $remote_addr 32m;

    location /downloads/ {
        limit_traffic_rate rate 100k;  # 100 KB/s per IP
    }
}

Unlike limit_rate which limits per-connection speed, this limits the total bandwidth per IP across all connections.
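For comparison, the built-in per-connection throttle looks like this:

```nginx
location /downloads/ {
    limit_rate_after 1m;   # serve the first megabyte at full speed
    limit_rate 100k;       # then cap each connection at 100 KB/s
}
```

A client opening several parallel connections multiplies this per-connection cap, which is exactly the gap the module above closes.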

System Load Protection

The nginx-module-sysguard module automatically protects your server when system load, memory, or response times exceed thresholds:

dnf install nginx-module-sysguard

load_module modules/ngx_http_sysguard_module.so;

server {
    sysguard on;
    sysguard_load load=10.5 action=/loadlimit;
    sysguard_mem free=100M action=/memlimit;
    sysguard_rt rt=0.5 period=5s action=/slowlimit;

    location /loadlimit { return 503; }
    location /memlimit { return 503; }
    location /slowlimit { return 503; }
}

For sophisticated bot protection, nginx-module-testcookie uses JavaScript cookie challenges:

dnf install nginx-module-testcookie

This module issues encrypted cookie challenges that require JavaScript execution – blocking most automated tools while allowing real browsers.

Lua-Based Rate Limiting

For maximum flexibility, lua-resty-limit-traffic provides Lua-based rate limiting that works anywhere in the request lifecycle:

dnf install lua5.1-resty-limit-traffic

# Requires a shared dict declared in the http block:
# lua_shared_dict my_limit_store 100m;
access_by_lua_block {
    local limit_req = require "resty.limit.req"
    local lim = limit_req.new("my_limit_store", 200, 100)

    local delay, err = lim:incoming(ngx.var.binary_remote_addr, true)
    if not delay then
        if err == "rejected" then
            return ngx.exit(429)
        end
        return ngx.exit(500)  -- unexpected error, e.g. shared dict failure
    end

    if delay >= 0.001 then
        ngx.sleep(delay)
    end
}

This provides features like combining multiple limiters, custom keys, and rate limiting at SSL handshake time.

Security Beyond Rate Limiting

Rate limiting is one layer of defense. For comprehensive NGINX security analysis, use gixy – a static analyzer that detects misconfigurations:

  • Server-Side Request Forgery (SSRF) vulnerabilities
  • HTTP splitting attacks
  • Path traversal issues
  • Weak SSL/TLS configurations
  • Missing security headers

Install gixy on RHEL-based systems:

dnf install https://extras.getpagespeed.com/release-latest.rpm
dnf install gixy

Run analysis:

gixy /etc/nginx/nginx.conf

For additional web application protection, consider deploying ModSecurity with OWASP rules.

Performance Considerations

Rate limiting has minimal performance impact:

  • Shared memory zones are stored in RAM with O(1) access time
  • The leaky bucket calculation is simple arithmetic
  • Red-black tree lookup is O(log n) for finding tracked keys

For high-traffic sites, consider:

  1. Larger memory zones to prevent premature expiration
  2. Rate limiting at the edge (CDN level) for better scalability
  3. Connection limiting (limit_conn) as a complement
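A minimal limit_conn setup to complement limit_req might look like:

```nginx
http {
    limit_conn_zone $binary_remote_addr zone=addr:10m;

    server {
        location /downloads/ {
            limit_conn addr 3;   # at most 3 concurrent connections per IP
        }
    }
}
```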

For optimal NGINX performance, also ensure proper worker process tuning.

Conclusion

NGINX rate limiting protects servers from abuse. By understanding the leaky bucket algorithm and configuring zones, burst parameters, and response codes properly, you build resilient applications.

Key takeaways:

  1. Always include a burst parameter for realistic rate limiting
  2. Use nodelay when response time matters
  3. Return 429 status for API endpoints
  4. Whitelist trusted IPs for monitoring and internal services
  5. Layer multiple rate limits for defense in depth
  6. Use fail2ban to automatically block persistent abusers
  7. Test thoroughly before deploying to production
  8. Consider advanced modules for distributed or specialized rate limiting

Rate limiting works best as part of a comprehensive security strategy. Combine it with firewall rules, fail2ban, and security analyzers like gixy for complete protection.


Danila Vershinin

Founder & Lead Engineer

NGINX configuration and optimization • Linux system administration • Web performance engineering

10+ years NGINX experience • Maintainer of GetPageSpeed RPM repository • Contributor to open-source NGINX modules
