
NGINX Active Health Checks Without NGINX Plus




NGINX active health checks let you proactively monitor upstream servers by sending periodic probe requests — completely independent of client traffic. Stock NGINX only offers passive health checks, where a backend server must fail real user requests before NGINX marks it as unavailable. By the time NGINX reacts, your users have already experienced errors — failed API calls, broken page loads, timeout screens. Passive checks are reactive. They detect damage, they don’t prevent it.

NGINX Plus solves this with active health checks — periodic probes sent to backend servers independently of client traffic. But NGINX Plus starts at $3,675/year per instance. For teams running multiple NGINX instances, that cost adds up fast.

What if you could get NGINX active health checks without NGINX Plus?

With NGINX-MOD, you can. The NGINX-MOD package includes a built-in active health check module (based on the nginx_upstream_check_module by Weibin Yao) compiled statically into the binary. No load_module directives, no separate packages — NGINX active health checks work out of the box. And NGINX-MOD supports six check types (HTTP, TCP, SSL Hello, MySQL, AJP, FastCGI) compared to the three that NGINX Plus offers.

The math speaks for itself: 10 NGINX Plus instances cost $36,750/year. A GetPageSpeed subscription starts at $20/month per server — and includes every NGINX-MOD enhancement, not just active health checks.

What Are Active Health Checks?

NGINX uses two fundamentally different approaches to determine whether a backend server is healthy:

Passive health checks (stock NGINX) monitor real client requests. When a server fails max_fails times within a fail_timeout window, NGINX marks it as unavailable. The problem: those failures are experienced by real users. If you have max_fails=3, three users get errors before NGINX stops routing traffic to the failing server.
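For reference, the passive mechanism in stock NGINX is configured entirely through parameters on each server line:

```nginx
upstream backend {
    # Stock NGINX passive checks: after 3 failed real requests within
    # 30 seconds, the server is skipped for the next 30 seconds.
    server 10.0.0.1:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.2:8080 max_fails=3 fail_timeout=30s;
}
```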

NGINX active health checks send dedicated probe requests to each backend server at regular intervals, completely independent of client traffic. If a server stops responding to probes, NGINX removes it from the rotation before any user request reaches it. When the server recovers and passes enough consecutive checks, NGINX automatically adds it back.

The key differences:

| Aspect | Passive Checks | Active Health Checks |
|---|---|---|
| Detection | After user requests fail | Before users are affected |
| Traffic required | Yes — real requests needed | No — probes run independently |
| Recovery detection | Waits for fail_timeout to expire | Detected after rise consecutive successful probes |
| Check types | HTTP only (whatever the user requests) | HTTP, TCP, SSL, MySQL, AJP, FastCGI |
| Cost (NGINX Plus) | Free | $3,675+/year |
| Cost (NGINX-MOD) | Free | $20/mo per server |

Supported Check Types

The NGINX active health checks module in NGINX-MOD supports six protocol-specific check types — more than NGINX Plus:

| Type | Protocol | How It Works | Use Case |
|---|---|---|---|
| tcp | TCP | Connects and peeks one byte | Any TCP service (Redis, custom daemons) |
| http | HTTP | Sends HTTP request, validates response status | Web application backends |
| ssl_hello | TLS | Sends ClientHello, expects ServerHello | TLS-terminating backends |
| mysql | MySQL | Connects, validates MySQL handshake | Database servers |
| ajp | AJP | Sends CPING, expects CPONG | Apache Tomcat / JBoss |
| fastcgi | FastCGI | Sends FastCGI request, parses response | PHP-FPM pools |

Installation

The active health check module is compiled directly into the NGINX-MOD binary. There is no separate module to install — you get NGINX active health checks automatically when you install NGINX-MOD.

RHEL, CentOS, AlmaLinux, Rocky Linux

sudo dnf install https://extras.getpagespeed.com/release-latest.rpm
sudo dnf config-manager --enable getpagespeed-extras-nginx-mod
sudo dnf install nginx-mod

If you already have standard NGINX installed, swap it:

sudo dnf swap nginx nginx-mod

Start NGINX:

sudo systemctl enable --now nginx

Verify the module is included:

nginx -V 2>&1 | grep -o 'nginx_upstream_check_module'

You should see nginx_upstream_check_module in the output. No load_module directive is needed.

Debian and Ubuntu

First, set up the GetPageSpeed APT repository, then install:

sudo apt-get update
sudo apt-get install nginx-mod

Basic HTTP Health Check

Here is a minimal configuration that adds NGINX active health checks to an upstream group:

upstream backend {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;

    check interval=3000 rise=2 fall=5 timeout=1000 type=http;
    check_http_send "GET /health HTTP/1.0\r\n\r\n";
    check_http_expect_alive http_2xx http_3xx;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
    }
}

This configuration does the following:

  • check interval=3000 — Sends a probe every 3 seconds (3000 milliseconds).
  • rise=2 — After 2 consecutive successful checks, mark the server as up.
  • fall=5 — After 5 consecutive failed checks, mark the server as down.
  • timeout=1000 — Each probe times out after 1 second.
  • type=http — Use HTTP protocol for the probe.
  • check_http_send — The HTTP request to send as the health probe.
  • check_http_expect_alive — Accept 2xx and 3xx responses as healthy.

Directive Reference

check

Syntax: check interval=milliseconds [fall=count] [rise=count] [timeout=milliseconds] [default_down=true|false] [type=tcp|http|ssl_hello|mysql|ajp|fastcgi] [port=number]
Default: interval=30000 fall=5 rise=2 timeout=1000 default_down=true type=tcp port=0
Context: upstream

The main directive that enables NGINX active health checks for an upstream group. All parameters are optional; any you omit fall back to the defaults shown above.

| Parameter | Default | Description |
|---|---|---|
| interval | 30000 | Probe interval in milliseconds |
| fall | 5 | Consecutive failures before marking down |
| rise | 2 | Consecutive successes before marking up |
| timeout | 1000 | Probe timeout in milliseconds |
| default_down | true | Initial server state (true = start as down) |
| type | tcp | Check protocol type |
| port | 0 | Override port (0 = use upstream server port) |

When default_down=true (the default), servers start in a down state and must pass rise consecutive checks before receiving traffic. This is the safer option — it prevents routing traffic to a server that hasn’t been verified yet.

The port parameter lets you probe a different port than the one receiving traffic. This is useful when your health check endpoint runs on a separate port:

upstream backend {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;

    # Check health on port 8888, route traffic to port 8080
    check interval=3000 rise=2 fall=5 timeout=1000 type=http port=8888;
    check_http_send "GET /health HTTP/1.0\r\n\r\n";
    check_http_expect_alive http_2xx;
}

check_http_send

Syntax: check_http_send "http_packet"
Default: "GET / HTTP/1.0\r\n\r\n"
Context: upstream

Specifies the HTTP request to send as the health probe when type=http. The value must be a complete HTTP request including the trailing \r\n\r\n.

For HTTP/1.1, include a Host header:

check_http_send "GET /health HTTP/1.1\r\nHost: backend.example.com\r\n\r\n";

For a POST request with a body:

check_http_send "POST /health HTTP/1.0\r\nContent-Type: application/json\r\nContent-Length: 2\r\n\r\n{}";

check_http_expect_alive

Syntax: check_http_expect_alive http_2xx | http_3xx | http_4xx | http_5xx ...
Default: http_2xx | http_3xx
Context: upstream

Defines which HTTP response status codes indicate a healthy server. You can combine multiple status classes:

# Only 200-299 responses are healthy
check_http_expect_alive http_2xx;

# 200-299 and 300-399 are healthy (default)
check_http_expect_alive http_2xx http_3xx;

# Accept 4xx too (useful if health endpoint returns 401 without auth)
check_http_expect_alive http_2xx http_3xx http_4xx;

check_keepalive_requests

Syntax: check_keepalive_requests number
Default: 1
Context: upstream

Sets the number of health check requests to send over a single connection before closing it. The default value of 1 closes the connection after each check. Increasing this value reduces the overhead of creating new connections for each probe:

upstream backend {
    server 10.0.0.1:8080;

    check interval=3000 rise=2 fall=5 timeout=1000 type=http;
    check_http_send "GET / HTTP/1.1\r\nHost: localhost\r\nConnection: keep-alive\r\n\r\n";
    check_http_expect_alive http_2xx;
    check_keepalive_requests 100;
}

check_fastcgi_param

Syntax: check_fastcgi_param parameter value
Default: REQUEST_METHOD=GET, REQUEST_URI=/, SCRIPT_FILENAME=index.php
Context: upstream

Defines FastCGI parameters when type=fastcgi. Use this to customize the FastCGI request sent to PHP-FPM or other FastCGI servers:

upstream php_fpm {
    server 127.0.0.1:9000;

    check interval=5000 rise=2 fall=3 timeout=3000 type=fastcgi;
    check_fastcgi_param "REQUEST_METHOD" "GET";
    check_fastcgi_param "REQUEST_URI" "/health.php";
    check_fastcgi_param "SCRIPT_FILENAME" "/var/www/html/health.php";
}

check_shm_size

Syntax: check_shm_size size
Default: 1M
Context: http

Sets the size of the shared memory zone used to store health check status across all worker processes. The default of 1 megabyte is sufficient for most setups. Increase this if you are checking hundreds of upstream servers:

http {
    check_shm_size 10M;

    upstream backend {
        # ... servers and check directives
    }
}

check_status

Syntax: check_status [html|csv|json]
Default: none (disabled)
Context: server, location

Enables the health check status dashboard at the specified location. This endpoint displays the current health state of all upstream servers being monitored by NGINX active health checks.

You can set a default format in the directive, or use query parameters to switch formats dynamically:

location /upstream_status {
    check_status;
    allow 127.0.0.1;
    allow 10.0.0.0/8;
    deny all;
}

Access the dashboard with:

  • /upstream_status — HTML format (default)
  • /upstream_status?format=json — JSON format
  • /upstream_status?format=csv — CSV format
  • /upstream_status?status=down — Show only down servers
  • /upstream_status?status=up — Show only up servers

Health Status Dashboard

The check_status endpoint provides real-time visibility into the health of all upstream servers. This is invaluable for monitoring and debugging.

Setting Up the Dashboard

upstream backend {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;

    check interval=3000 rise=2 fall=5 timeout=1000 type=http;
    check_http_send "GET /health HTTP/1.0\r\n\r\n";
    check_http_expect_alive http_2xx;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
    }

    location /upstream_status {
        check_status json;
        allow 127.0.0.1;
        allow 10.0.0.0/8;
        deny all;
    }
}

JSON Output

{
  "servers": {
    "total": 2,
    "generation": 1,
    "server": [
      {
        "index": 0,
        "upstream": "backend",
        "name": "10.0.0.1:8080",
        "status": "up",
        "rise": 120,
        "fall": 0,
        "type": "http",
        "port": 0
      },
      {
        "index": 1,
        "upstream": "backend",
        "name": "10.0.0.2:8080",
        "status": "down",
        "rise": 0,
        "fall": 5,
        "type": "http",
        "port": 0
      }
    ]
  }
}

The generation counter increments whenever NGINX reloads its configuration, which helps monitoring tools detect config changes.

The JSON format integrates directly with monitoring tools like Prometheus (via a JSON exporter), Zabbix, Datadog, and custom scripts. You can poll this endpoint to build alerting on backend health state changes.
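As a minimal sketch of such an integration (the endpoint URL is an assumption — match it to wherever you expose check_status), a stdlib-only Python poller that extracts the servers currently marked down:

```python
import json
from urllib.request import urlopen  # stdlib only


def down_servers(status_json: str) -> list[str]:
    """Return 'upstream/name' for every server not currently up.

    Parses the JSON document produced by `check_status json`
    (the schema shown in the example output above).
    """
    doc = json.loads(status_json)
    return [
        f'{s["upstream"]}/{s["name"]}'
        for s in doc["servers"]["server"]
        if s["status"] != "up"
    ]


def poll(url: str = "http://127.0.0.1/upstream_status?format=json") -> list[str]:
    # The URL is hypothetical; point it at your check_status location.
    with urlopen(url) as resp:
        return down_servers(resp.read().decode())
```

A cron job or monitoring agent can call `poll()` and alert whenever the returned list is non-empty.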

Real-World Examples

Web Application Backends

The most common use case: monitoring web application servers with a dedicated /health endpoint that validates database connectivity, cache availability, and application readiness:

upstream web_app {
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
    server 10.0.0.3:3000;

    check interval=3000 rise=2 fall=3 timeout=2000 type=http;
    check_http_send "GET /health HTTP/1.1\r\nHost: app.example.com\r\n\r\n";
    check_http_expect_alive http_2xx;
}

server {
    listen 80;
    server_name app.example.com;

    location / {
        proxy_pass http://web_app;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    location /upstream_status {
        check_status json;
        allow 127.0.0.1;
        deny all;
    }
}

Your application’s /health endpoint should return HTTP 200 when the application is ready to serve traffic, and a non-2xx status (503, for example) when it is not. The health check endpoint should verify critical dependencies — database connections, cache servers, disk space — not just return 200 unconditionally.
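The decision logic inside such an endpoint can be sketched language-agnostically; here is a minimal Python version (the function name and probe callables are illustrative, not a real framework API):

```python
def health_status(checks: dict) -> tuple[int, dict]:
    """Run dependency probes and map the result to an HTTP status code.

    `checks` maps a dependency name (e.g. "db", "cache") to a
    zero-argument callable that returns True when healthy.
    """
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = bool(probe())
        except Exception:
            results[name] = False  # a crashing probe counts as unhealthy
    # 200 only when every dependency passes; 503 otherwise.
    status = 200 if all(results.values()) else 503
    return status, results
```

Your web handler would call something like `health_status({"db": ping_db, "cache": ping_cache})` and return the resulting status code to the probe.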

PHP-FPM Pool Monitoring

Use FastCGI NGINX active health checks to monitor PHP-FPM pools directly at the protocol level. Create a minimal health check script that PHP-FPM executes:

<?php
// /var/www/html/health.php - Lightweight health check for FPM pool
echo "OK";

Then configure the upstream with a FastCGI health check pointing to that script:

upstream php_fpm {
    server 127.0.0.1:9000;
    server 127.0.0.1:9001;

    check interval=5000 rise=2 fall=3 timeout=3000 type=fastcgi;
    check_fastcgi_param "REQUEST_METHOD" "GET";
    check_fastcgi_param "REQUEST_URI" "/health.php";
    check_fastcgi_param "SCRIPT_FILENAME" "/var/www/html/health.php";
}

server {
    listen 80;

    location ~ \.php$ {
        fastcgi_pass php_fpm;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        include fastcgi_params;
    }
}

The SCRIPT_FILENAME parameter must point to a real PHP file on the filesystem. The module sends a FastCGI request and validates that the response completes successfully — confirming that PHP-FPM can accept connections and execute scripts.

Database Server Monitoring

Monitor MySQL servers at the protocol level. The MySQL health check validates that the server completes the MySQL handshake protocol, confirming it is accepting connections:

upstream mysql_servers {
    server 10.0.0.1:3306;
    server 10.0.0.2:3306;

    check interval=10000 rise=2 fall=3 timeout=2000 type=mysql;
}

server {
    listen 3307;

    proxy_pass mysql_servers;
}

Note that proxy_pass at the server level, with no location block, is stream-module syntax: this example performs TCP load balancing of database connections and belongs in the stream {} context rather than http {}.

Comparison: NGINX-MOD vs NGINX Plus

| Feature | NGINX Plus | NGINX-MOD |
|---|---|---|
| Active health checks | Yes | Yes |
| HTTP checks | Yes | Yes |
| TCP checks | Yes | Yes |
| gRPC checks | Yes | No |
| SSL/TLS handshake checks | No | Yes |
| MySQL protocol checks | No | Yes |
| AJP protocol checks | No | Yes |
| FastCGI protocol checks | No | Yes |
| Health status dashboard | Yes | Yes |
| JSON status output | Yes | Yes |
| Custom check port | Yes | Yes |
| Configurable intervals | Yes | Yes |
| Rise/fall thresholds | Yes | Yes |
| match block (body/header matching) | Yes | No |
| mandatory parameter | Yes | No |
| Check types supported | 3 | 6 |
| Price (per instance/year) | $3,675 | $240 |

NGINX-MOD supports twice as many check types as NGINX Plus. While NGINX Plus offers match blocks for custom response validation and gRPC health checks, NGINX-MOD covers the protocols that most infrastructure actually uses — including MySQL, FastCGI, and AJP that NGINX Plus does not support at all.

A GetPageSpeed subscription starting at $20/month per server includes every NGINX-MOD enhancement — not just NGINX active health checks.

Security Best Practices

Restrict the Status Dashboard

The check_status endpoint exposes internal infrastructure details (server IPs, ports, health states). Always restrict access:

location /upstream_status {
    check_status json;
    allow 127.0.0.1;
    allow 10.0.0.0/8;     # Internal network only
    deny all;
}

For remote access, serve it over HTTPS with authentication:

server {
    listen 8443 ssl;
    ssl_certificate /etc/nginx/ssl/monitoring.crt;
    ssl_certificate_key /etc/nginx/ssl/monitoring.key;

    location /upstream_status {
        check_status json;
        auth_basic "Health Check Status";
        auth_basic_user_file /etc/nginx/.htpasswd;
        allow 10.0.0.0/8;
        deny all;
    }
}

Use Dedicated Health Endpoints

Don’t use your application’s root URL (/) as the health check target. Create a dedicated endpoint that:

  • Returns a simple, small response (minimizes probe overhead)
  • Validates critical dependencies (database, cache, disk)
  • Does not require authentication
  • Does not trigger application logging that floods your logs
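If the backend itself runs NGINX in front of the application, such an endpoint can be as small as this (a sketch — note that `return 200` proves only that the backend's NGINX is up; route the probe into your application when you need dependency validation):

```nginx
# On the backend server: a fast, log-free liveness endpoint.
location = /health {
    access_log off;          # keep probes out of the access log
    default_type text/plain;
    return 200 "OK\n";
}
```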

Don’t Expose Sensitive Data

Health check endpoints should return status codes, not detailed error messages. A health probe that returns database connection strings or stack traces in its response body is a security risk — even if the check_status dashboard is restricted, the health endpoint itself might not be.

Troubleshooting

All Servers Marked Down on Startup

This is expected behavior when default_down=true (the default). Servers start as down and must pass rise consecutive checks before receiving traffic. If your backends take time to start, this prevents premature routing.

If you want servers to start as up immediately, set default_down=false:

check interval=3000 rise=2 fall=5 timeout=1000 default_down=false type=http;

“Unknown Directive "check"”

You are running standard NGINX instead of NGINX-MOD. The active health check module is only available in NGINX-MOD. Install it:

sudo dnf config-manager --enable getpagespeed-extras-nginx-mod
sudo dnf swap nginx nginx-mod

Shared Memory Too Small

If you have many upstream servers and see errors about shared memory allocation, increase check_shm_size:

http {
    check_shm_size 10M;
    # ...
}

The default is 1 megabyte, which is sufficient for roughly 100 upstream servers. For larger deployments, allocate more.

Timeout Too Aggressive

If backends are healthy but intermittently marked as down, your timeout value may be too low. Backend servers under load might take longer than 1 second (the default) to respond to health probes. Increase the timeout and reduce the fall sensitivity:

# More tolerant settings for loaded backends
check interval=5000 rise=2 fall=5 timeout=3000 type=http;

Health Check Floods Backend Logs

Each NGINX active health check probe generates a log entry on the backend. With interval=1000 (every second) and 10 NGINX instances, that is 10 requests per second per backend — 864,000 log entries per day.

Solutions:
– Increase the interval (3000–5000 ms is sufficient for most cases)
– Configure your backend to suppress logging for the health check endpoint
– Use a dedicated health check port (port=8888) with logging disabled on that port
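The third option can be sketched as a dedicated listener on the backend (port 8888 matches the earlier port= example) with logging switched off for the whole server block:

```nginx
# Backend-side health-check listener; probes arrive here via
# `check ... port=8888` and never touch the main access log.
server {
    listen 8888;
    access_log off;

    location = /health {
        default_type text/plain;
        return 200 "OK\n";
    }
}
```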

Conclusion

NGINX active health checks prevent failures instead of reacting to them. With NGINX-MOD, you get this critical capability at a fraction of the NGINX Plus price tag — and with support for six protocol types instead of three.

The module is compiled directly into NGINX-MOD, so there is nothing extra to install or configure beyond adding the check directive to your upstream blocks. Combined with the health status dashboard, you get full visibility into backend health without third-party monitoring tools.

Install NGINX-MOD from the GetPageSpeed repository — your subscription includes every module, every NGINX version, and automatic updates. See the NGINX-MOD page for the complete list of included modules, or explore related features like dynamic upstream management, sticky sessions, and the VTS monitoring module.

Danila Vershinin

Founder & Lead Engineer

NGINX configuration and optimization · Linux system administration · Web performance engineering

10+ years NGINX experience • Maintainer of GetPageSpeed RPM repository • Contributor to open-source NGINX modules
