NGINX Sysguard: Automatic Protection Against Server Overload

by Danila Vershinin


When your server hits its limits, bad things happen. Response times spike, connections queue up, and eventually the entire system grinds to a halt. The typical scenario: a traffic spike or a misbehaving backend pushes CPU load through the roof, memory fills up, and NGINX keeps accepting requests it cannot serve.

The NGINX sysguard module solves this with automatic overload protection. It monitors system load, memory usage, and request response times in real time. When any metric crosses your defined threshold, sysguard rejects new requests gracefully — returning a 503 status or redirecting to a custom page — instead of letting the server collapse under pressure.

This is not rate limiting, which restricts requests per client. The NGINX sysguard module protects the server itself by watching system-level metrics. Think of it as a circuit breaker for your web server.

How NGINX Sysguard Works

The sysguard module operates in NGINX’s preaccess phase. Before each request reaches your content handler, it checks the current system state against your configured thresholds:

  1. CPU load average: read via getloadavg() or sysinfo()
  2. Memory usage: parsed from /proc/meminfo (swap ratio and free memory)
  3. Request response time: calculated from recent request processing durations

With the default setting, sysguard_mode or, exceeding any single threshold immediately redirects the request to a fallback action, typically a 503 response. With sysguard_mode and, all configured thresholds must be exceeded before protection triggers.

To minimize overhead, sysguard caches system metrics for a configurable interval (default: 1 second). This means it does not call getloadavg() or read /proc/meminfo on every single request.
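To make the caching idea concrete, here is a minimal Python sketch. It is an illustration only, not the module's C implementation: the CachedMetric class, its parameters, and the injectable clock are hypothetical names chosen for this example.

```python
import time


class CachedMetric:
    """Re-read an expensive metric at most once per `interval` seconds.

    Hypothetical helper: `interval` plays the role of sysguard_interval.
    """

    def __init__(self, read, interval=1.0, clock=time.monotonic):
        self._read = read          # expensive call, e.g. lambda: os.getloadavg()[0]
        self._interval = interval
        self._clock = clock        # injectable so the behavior can be tested
        self._value = None
        self._expires = None       # None means nothing has been cached yet

    def get(self):
        now = self._clock()
        if self._expires is None or now >= self._expires:
            # Cache is empty or stale: read the metric and restart the timer.
            self._value = self._read()
            self._expires = now + self._interval
        return self._value
```

Every request calls get(), but the underlying read happens only when the cached value is older than the interval, which is why the per-request cost stays close to a single comparison.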

Installation

RHEL, CentOS, Rocky Linux, AlmaLinux, Fedora, Amazon Linux

Install the NGINX sysguard module from the GetPageSpeed extras repository:

sudo dnf install https://extras.getpagespeed.com/release-latest.rpm
sudo dnf install nginx-module-sysguard

Then load the module in /etc/nginx/nginx.conf:

load_module modules/ngx_http_sysguard_module.so;

Debian, Ubuntu

Install from the GetPageSpeed APT repository:

sudo apt-get update
sudo apt-get install nginx-module-sysguard

On Debian and Ubuntu, the module is enabled automatically upon installation — no load_module directive is needed.

Configuration Reference

sysguard

Enables or disables the module.

Syntax: sysguard on | off;
Default: off
Context: http, server, location

server {
    sysguard on;
}

sysguard_load

Sets the CPU load average threshold. When the 1-minute load average exceeds this value, sysguard triggers the specified action.

Syntax: sysguard_load load=<number> [action=<uri>];
Default: not set
Context: http, server, location

The load parameter accepts a decimal number or the special ncpu multiplier:

# Fixed threshold
sysguard_load load=10.5 action=/overloaded;

# CPU-aware threshold (scales with core count)
sysguard_load load=ncpu*1.5 action=/overloaded;

The ncpu multiplier is essential for portable configurations. On a 4-core server, ncpu*1.5 translates to a threshold of 6.0. On an 8-core server, it becomes 12.0. This means you can deploy the same configuration across servers with different CPU counts.
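The arithmetic behind the ncpu multiplier can be sketched in a few lines of Python. The function names are hypothetical, and how the module itself counts CPUs is not covered here, so os.cpu_count() is only a stand-in.

```python
import os


def load_threshold(multiplier, ncpu=None):
    """Return the load limit implied by load=ncpu*<multiplier>."""
    if ncpu is None:
        ncpu = os.cpu_count() or 1  # cpu_count() may return None
    return ncpu * multiplier


def is_overloaded(load_1min, multiplier, ncpu=None):
    """True when the 1-minute load average exceeds the scaled threshold."""
    return load_1min > load_threshold(multiplier, ncpu)
```

For example, load_threshold(1.5, ncpu=4) gives 6.0 and load_threshold(1.5, ncpu=8) gives 12.0, matching the figures above.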

If no action is specified, sysguard returns HTTP 503 directly.

sysguard_mem

Monitors memory usage. You can set thresholds for swap usage ratio, free memory, or both.

Syntax: sysguard_mem swapratio=<percent>% [action=<uri>];
Syntax: sysguard_mem free=<size> [action=<uri>];
Default: not set
Context: http, server, location

# Trigger when swap usage exceeds 20%
sysguard_mem swapratio=20% action=/low-memory;

# Trigger when free memory drops below 100MB
sysguard_mem free=100M action=/low-memory;

You can use both directives together. Free memory is calculated as MemFree + Buffers + Cached from /proc/meminfo. This represents memory that the kernel can reclaim when needed — a more accurate measure than raw MemFree alone.
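If you want to reproduce these numbers outside NGINX, the following Python sketch computes them from the text of /proc/meminfo. The function names are hypothetical. The free-memory sum follows the formula above; the swap calculation (used swap divided by total swap) is an assumption about how the ratio is defined, not something taken from the module's source.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: integer value}."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[key.strip()] = int(parts[0])  # sizes are reported in kB
    return fields


def reclaimable_free_bytes(text):
    """MemFree + Buffers + Cached, converted from kB to bytes."""
    f = parse_meminfo(text)
    kb = f.get("MemFree", 0) + f.get("Buffers", 0) + f.get("Cached", 0)
    return kb * 1024


def swap_used_percent(text):
    """Assumed definition: used swap as a percentage of total swap."""
    f = parse_meminfo(text)
    total = f.get("SwapTotal", 0)
    if total == 0:
        return 0.0  # no swap configured
    return (total - f.get("SwapFree", 0)) * 100 / total
```

To check a live system, pass in the file contents, for example reclaimable_free_bytes(open("/proc/meminfo").read()).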

sysguard_rt

Monitors the average request processing time. This directive protects against backend slowdowns that have not yet affected CPU load or memory.

Syntax: sysguard_rt rt=<seconds> period=<time> [method=AMM|WMA[:<number>]] [action=<uri>];
Default: not set
Context: http, server, location

# Trigger when average response time exceeds 500ms over 10 seconds
sysguard_rt rt=0.5 period=10s method=AMM:20 action=/slow-response;

Two averaging methods are available:

  • AMM (Arithmetic Mean Method) — simple average of all requests within the time window. Best for stable traffic patterns.
  • WMA (Weighted Moving Average) — gives more weight to recent requests. Better for detecting sudden slowdowns because older samples have less influence.

The optional number after the method (e.g., AMM:20) sets the sample buffer size. A larger buffer provides smoother averages but responds slower to changes.
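The difference between the two methods is easier to see in code. The Python sketch below keeps a fixed-size sample buffer and computes both averages. It makes two simplifying assumptions: the weighted average uses linearly increasing weights (the module's exact weighting is not documented here), and the time-based period parameter is ignored. The class name is hypothetical.

```python
from collections import deque


class ResponseTimeWindow:
    """Fixed-size buffer of recent response times, in seconds."""

    def __init__(self, size=20):
        # deque(maxlen=...) discards the oldest sample once the buffer is full
        self.samples = deque(maxlen=size)

    def add(self, seconds):
        self.samples.append(seconds)

    def amm(self):
        """Arithmetic mean: every sample counts equally."""
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)

    def wma(self):
        """Weighted moving average with assumed linear weights 1..n,
        so the newest sample carries the largest weight."""
        if not self.samples:
            return 0.0
        weights = range(1, len(self.samples) + 1)
        weighted = sum(w * s for w, s in zip(weights, self.samples))
        return weighted / sum(weights)
```

With samples 0.1, 0.1, 0.1 and then a slow 0.9, the arithmetic mean is 0.3 while the weighted average is already 0.42, which illustrates why a weighted average reacts sooner to a sudden slowdown.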

sysguard_mode

Controls how multiple thresholds interact.

Syntax: sysguard_mode or | and;
Default: or
Context: http, server, location

# OR mode (default): trigger if ANY threshold is exceeded
sysguard_mode or;

# AND mode: trigger only if ALL thresholds are exceeded
sysguard_mode and;

Use sysguard_mode or for maximum protection: any single metric breaching its limit triggers the overload response. Use sysguard_mode and to avoid false positives when you want multiple conditions confirmed before rejecting traffic.
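The two modes boil down to "any" versus "all" over the configured checks. Here is a minimal Python sketch of that decision; the function and the dictionary layout are hypothetical and only model the behavior described above.

```python
def should_reject(breaches, mode="or"):
    """Decide whether to reject a request.

    breaches maps each configured check (e.g. "load", "free") to True
    when its threshold is currently exceeded.
    """
    if not breaches:
        return False  # nothing configured, nothing to enforce
    if mode == "or":
        return any(breaches.values())
    if mode == "and":
        return all(breaches.values())
    raise ValueError("mode must be 'or' or 'and'")
```

With {"load": True, "free": False}, the function returns True in or mode and False in and mode.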

sysguard_interval

Sets how often sysguard refreshes its cached system metrics.

Syntax: sysguard_interval <time>;
Default: 1s
Context: http, server, location

sysguard_interval 5s;

A longer interval reduces system call overhead but makes the module less responsive to rapid changes. For most production servers, 1–5 seconds is appropriate.

sysguard_log_level

Controls the log severity level when protection triggers.

Syntax: sysguard_log_level info | notice | warn | error;
Default: error
Context: http, server, location

sysguard_log_level warn;

When sysguard rejects a request, it logs a message like:

sysguard load limited, current:2.450 conf:1.500

Set this to warn in production to keep overload events visible without flooding your error log.
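If you feed the error log into a monitoring pipeline, the load message shown above is easy to parse. The Python sketch below handles only that one message format, since it is the only one documented here; the function name is hypothetical.

```python
import re

# Matches the load message shown above, wherever it appears in a log line.
LOAD_LIMITED_RE = re.compile(
    r"sysguard load limited, current:(\d+(?:\.\d+)?) conf:(\d+(?:\.\d+)?)"
)


def parse_load_limited(line):
    """Return (current_load, configured_limit) or None if the line does not match."""
    match = LOAD_LIMITED_RE.search(line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))
```

Calling parse_load_limited("sysguard load limited, current:2.450 conf:1.500") returns (2.45, 1.5).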

Practical Configuration Examples

Basic NGINX Sysguard Setup

The simplest useful configuration protects against CPU overload:

server {
    listen 80;
    server_name example.com;

    sysguard on;
    sysguard_load load=ncpu*1.5 action=@guard;
    sysguard_interval 5s;

    location / {
        proxy_pass http://backend;
    }

    location @guard {
        add_header Retry-After 30 always;
        return 503 "Service temporarily unavailable. Please retry after 30 seconds.";
    }
}

The Retry-After header tells well-behaved clients (and search engine crawlers) when to try again. This prevents them from hammering your server during an overload event.
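On the client side, honoring the header can look like the Python sketch below. It is an assumption-laden example rather than a complete client: the function name and the 5-second fallback are hypothetical, header names are assumed to be normalized already, and only the delay-in-seconds form of Retry-After is handled (an HTTP-date value falls back to the default).

```python
def retry_delay(status, headers, default=5.0):
    """Return how many seconds to wait before retrying, or None if no retry is needed."""
    if status != 503:
        return None
    value = headers.get("Retry-After", "").strip()
    if value.isdigit():
        return float(value)  # delay-seconds form, e.g. "30"
    return default           # header missing or in HTTP-date form
```

A client loop would sleep for the returned number of seconds before sending the request again, instead of retrying immediately.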

Comprehensive Production Setup

For production servers, monitor all three metrics and use a custom log format to track system health:

log_format sysguard_log '$remote_addr [$time_local] "$request" $status '
    'load=$sysguard_load free=$sysguard_free '
    'swap=$sysguard_swapstat rt=$sysguard_rt';

server {
    listen 80;
    server_name example.com;

    access_log /var/log/nginx/sysguard.log sysguard_log;

    sysguard on;
    sysguard_mode or;
    sysguard_load load=ncpu*1.5 action=@overloaded;
    sysguard_mem swapratio=30% action=@overloaded;
    sysguard_mem free=100M action=@overloaded;
    sysguard_rt rt=2.0 period=30s method=WMA:50 action=@overloaded;
    sysguard_interval 5s;
    sysguard_log_level warn;

    location / {
        proxy_pass http://backend;
    }

    location @overloaded {
        add_header Retry-After 30 always;
        return 503 "Service temporarily unavailable. Please retry after 30 seconds.";
    }
}

This NGINX sysguard configuration triggers overload protection when any of these conditions occurs:

  • CPU load exceeds 1.5x the number of cores
  • Swap usage exceeds 30%
  • Free memory drops below 100 MB
  • Average response time exceeds 2 seconds (weighted moving average over 30s)

Per-Location Thresholds

Different parts of your application may have different tolerances. API endpoints typically need stricter protection than static content:

server {
    listen 80;
    server_name example.com;

    # Global: generous threshold for static content
    sysguard on;
    sysguard_load load=ncpu*2 action=@guard;

    location / {
        root /var/www/html;
        index index.html;
    }

    # Stricter thresholds for API endpoints
    location /api/ {
        sysguard on;
        sysguard_load load=ncpu*1.0 action=@guard;
        sysguard_rt rt=1.0 period=15s method=AMM action=@guard;
        proxy_pass http://api_backend;
    }

    location @guard {
        default_type application/json;
        return 503 '{"error": "service_unavailable", "retry_after": 30}';
    }
}

This protects your API at a lower load threshold (1x core count) than the rest of the site (2x core count). Additionally, it adds response time monitoring specifically for API requests.

AND Mode: Avoiding False Positives

Sometimes a single metric can spike temporarily without meaning the server is truly overloaded. AND mode requires all conditions to be met:

server {
    listen 80;
    server_name example.com;

    sysguard on;
    sysguard_mode and;
    sysguard_load load=ncpu*2 action=@guard;
    sysguard_mem free=50M action=@guard;
    sysguard_interval 2s;

    location / {
        proxy_pass http://backend;
    }

    location @guard {
        return 503;
    }
}

Protection only triggers when both load exceeds 2x core count and free memory drops below 50 MB. A temporary load spike alone will not reject requests.

Exported Variables

The sysguard module exports variables you can use in logging, headers, and conditional logic. These variables are only populated when their corresponding directives are configured.

| Variable | Description | Units | Requires |
|---|---|---|---|
| $sysguard_load | Current 1-minute load average | Thousandths (219 = 0.219) | sysguard_load |
| $sysguard_free | Available memory (Free + Buffers + Cached) | Bytes | sysguard_mem |
| $sysguard_swapstat | Swap usage percentage | 0–100 | sysguard_mem |
| $sysguard_rt | Average request processing time | Milliseconds | sysguard_rt |
| $sysguard_meminfo_totalram | Total physical memory | Bytes | sysguard_mem |
| $sysguard_meminfo_freeram | Free physical memory | Bytes | sysguard_mem |
| $sysguard_meminfo_bufferram | Buffer memory | Bytes | sysguard_mem |
| $sysguard_meminfo_cachedram | Cached memory | Bytes | sysguard_mem |
| $sysguard_meminfo_totalswap | Total swap space | Bytes | sysguard_mem |
| $sysguard_meminfo_freeswap | Free swap space | Bytes | sysguard_mem |

Important: The $sysguard_load value is in thousandths. A value of 1500 means a load average of 1.5. Memory variables report values in bytes.
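When you consume these variables in a script, it helps to convert them to familiar units first. The Python sketch below does that for three of them, using the units listed above; the function name and output keys are hypothetical.

```python
def decode_sysguard_values(load, free, rt):
    """Convert raw variable values (strings or ints) into familiar units."""
    return {
        "load": int(load) / 1000,               # thousandths -> load average
        "free_mib": int(free) / (1024 * 1024),  # bytes -> MiB
        "rt_seconds": int(rt) / 1000,           # milliseconds -> seconds
    }
```

For instance, decode_sysguard_values("1500", "104857600", "1250") yields a load of 1.5, 100.0 MiB of free memory, and a response time of 1.25 seconds.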

Health Monitoring Endpoint

Use the exported variables to build a lightweight health check endpoint for your NGINX sysguard setup:

server {
    listen 80;
    server_name example.com;

    sysguard on;
    sysguard_load load=ncpu*1.5 action=@guard;
    sysguard_mem free=100M action=@guard;
    sysguard_interval 5s;

    location / {
        proxy_pass http://backend;
    }

    location /health {
        default_type application/json;
        return 200 '{"load": "$sysguard_load", "free_memory": "$sysguard_free", "swap": "$sysguard_swapstat"}';
    }

    location @guard {
        return 503;
    }
}

This /health endpoint can be polled by load balancers or monitoring tools like the NGINX VTS module to observe system metrics without additional monitoring agents.

Custom Log Format for Monitoring

Track sysguard metrics alongside standard access logs:

log_format sysguard_log '$remote_addr - $status '
    'load=$sysguard_load free=$sysguard_free '
    'swap=$sysguard_swapstat rt=$sysguard_rt';

server {
    listen 80;
    server_name example.com;

    access_log /var/log/nginx/sysguard.log sysguard_log;

    sysguard on;
    sysguard_load load=ncpu*1.5 action=@guard;
    sysguard_mem free=100M action=@guard;
    sysguard_interval 5s;

    location / {
        proxy_pass http://backend;
    }

    location @guard {
        return 503;
    }
}

Sample log output:

192.168.1.100 - 200 load=219 free=2402320384 swap=0 rt=0
192.168.1.101 - 503 load=4500 free=52428800 swap=45 rt=1250

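In this example, the second request was rejected because the load (4.5) exceeded the threshold on a 2-core machine (ncpu*1.5 = 3.0). Free memory (52428800 bytes, or 50 MB) was also below the configured 100M limit, so with the default sysguard_mode or, either check alone would have triggered the rejection.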
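Log lines in this format can be checked against your thresholds with a short script. The Python sketch below parses the sysguard_log format defined above and reports which limits a request breached. The function names are hypothetical, the defaults mirror the example configuration, and 100M is assumed to mean 100 × 1024 × 1024 bytes. Lines where a variable is empty will simply not match.

```python
import re

# Mirrors: '$remote_addr - $status load=... free=... swap=... rt=...'
LOG_RE = re.compile(
    r"^(?P<addr>\S+) - (?P<status>\d{3}) "
    r"load=(?P<load>\d+) free=(?P<free>\d+) "
    r"swap=(?P<swap>\d+) rt=(?P<rt>\d+)$"
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match the format."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    record = {k: int(v) for k, v in match.groupdict().items() if k != "addr"}
    record["addr"] = match["addr"]
    return record


def breached(record, ncpu, load_mult=1.5, free_min=100 * 1024 * 1024):
    """List which of the example thresholds this record exceeds."""
    reasons = []
    if record["load"] / 1000 > ncpu * load_mult:  # load is logged in thousandths
        reasons.append("load")
    if record["free"] < free_min:                 # free memory is logged in bytes
        reasons.append("free")
    return reasons
```

Applied to the second sample line with ncpu=2, breached() returns ["load", "free"]; for the first line it returns an empty list.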

NGINX Sysguard vs. Rate Limiting

The sysguard module and rate limiting serve different purposes and work best together:

| Feature | Sysguard | Rate Limiting |
|---|---|---|
| Protects | The server itself | Per-client fairness |
| Monitors | CPU, memory, response time | Request count per client |
| Triggers | System-level thresholds | Per-IP request rate |
| Use case | Prevent server collapse | Prevent abuse from individual clients |

Use rate limiting to prevent individual clients from consuming too many resources. Use the sysguard module as a safety net that catches any overload condition — regardless of its source.

Performance Considerations

The sysguard module adds minimal overhead to request processing:

  • System calls are cached. The getloadavg() and /proc/meminfo reads happen once per sysguard_interval, not per request. With the default 1-second interval and 10,000 requests per second, only 1 in 10,000 requests triggers a system call.

  • Decision logic is simple. The preaccess handler performs integer comparisons against cached values — negligible CPU cost.

  • Response time tracking uses a ring buffer. The circular buffer for RT calculations has O(1) insert and fixed memory usage regardless of traffic volume.

  • For high-traffic servers, increase sysguard_interval to 5s or 10s to further reduce system call frequency. The trade-off is slower response to sudden load changes.

Troubleshooting

Module Does Not Trigger

  1. Verify the module is loaded:

# Statically compiled module: appears in the configure arguments
nginx -V 2>&1 | grep sysguard
# Dynamic module: look for its load_module line in the full configuration dump
nginx -T 2>&1 | grep sysguard

  2. Confirm sysguard on is set in the correct context (http, server, or location).

  3. Check the configured threshold against actual system load:

# Current load average
uptime

# Memory status
free -m

  4. Decrease sysguard_interval to ensure metrics are fresh.

Memory Variables Show Zero

Memory-related variables ($sysguard_free, $sysguard_swapstat, etc.) are only populated when sysguard_mem is configured. If you only use sysguard_load, the memory variables will return empty or zero values.

Similarly, $sysguard_load requires sysguard_load to be configured, and $sysguard_rt requires sysguard_rt.

Understanding the Load Value

The $sysguard_load variable reports load in thousandths. To convert to the familiar decimal format:

  • $sysguard_load = 1500 → actual load = 1.5
  • $sysguard_load = 219 → actual load = 0.219

The error log, however, uses decimal format:

sysguard load limited, current:2.450 conf:1.500

Overload Protection Triggers Too Often

If sysguard is rejecting too many requests:

  • Increase the load threshold. A value of ncpu*1.5 is a good starting point, but some workloads tolerate higher load.
  • Switch from sysguard_mode or to sysguard_mode and so that multiple conditions must be met.
  • Increase sysguard_interval to smooth out transient spikes.
  • Consider using WMA instead of AMM for response time, since it adapts faster to recovery.

Security Best Practices

The NGINX sysguard module is a defense-in-depth layer. Combine it with other security measures for comprehensive server protection:

  • Rate limiting prevents individual clients from causing overload
  • ModSecurity WAF blocks malicious requests before they consume resources
  • Connection limits (limit_conn) cap concurrent connections per client
  • Caching reduces backend load by serving cached responses

The sysguard module acts as the last line of defense: when everything else fails to prevent overload, it ensures the server degrades gracefully rather than crashing.

Conclusion

The NGINX sysguard module provides automatic, system-aware overload protection that keeps your server responsive under pressure. Instead of letting traffic spikes bring down your entire application, sysguard rejects excess requests gracefully with proper HTTP 503 responses and Retry-After headers.

Key takeaways for configuring NGINX sysguard:

  • Use ncpu* multipliers for portable load thresholds across different servers
  • Monitor all three metrics (load, memory, response time) for comprehensive protection
  • Start with sysguard_mode or and tune thresholds before switching to sysguard_mode and
  • Use the exported variables in log formats and health endpoints for observability
  • Combine sysguard with rate limiting and caching for defense in depth

The module is available from the GetPageSpeed repository for RHEL-based distributions and from the APT repository for Debian/Ubuntu. The source code is on GitHub.

Danila Vershinin

Founder & Lead Engineer

NGINX configuration and optimization • Linux system administration • Web performance engineering

10+ years NGINX experience • Maintainer of GetPageSpeed RPM repository • Contributor to open-source NGINX modules
