NGINX Bot Verification: Block Fake Crawlers

by Danila Vershinin, January 31, 2026, revisited on February 16, 2026


Many website owners allow search engine bots to bypass security measures. They do this to ensure proper indexing and maintain good SEO rankings. However, this creates a significant vulnerability. Malicious actors frequently impersonate legitimate crawlers like Googlebot to scrape content, launch attacks, or bypass rate limits.

The NGINX bot verification module solves this problem by validating whether visitors claiming to be search engine bots are genuine. It uses the same reverse DNS verification method that Google, Microsoft, and other search engines officially recommend. This guide covers how to install, configure, and test the module.

Why You Need NGINX Bot Verification

Search engine crawlers identify themselves through the User-Agent header. For example, Googlebot sends a header like this:

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

The problem is simple. Anyone can set this header. A malicious script can easily claim to be Googlebot. Therefore, many websites unknowingly grant special access to attackers who spoof these headers.

Consider these common scenarios where fake bots cause problems:

Content scraping: Competitors steal your content by pretending to be search crawlers
DDoS attacks: Attackers bypass rate limits by using bot user-agents
Vulnerability scanning: Hackers avoid security tools by impersonating crawlers
Click fraud: Bots fake their identity to manipulate analytics data

The solution is reverse DNS verification. This technique confirms that the requesting IP address actually belongs to the claimed search engine. Google, Microsoft, and other search providers officially document this verification method.

How NGINX Bot Verification Works

The module operates in the access phase of request processing. When a request arrives, it follows this verification process:

User-Agent Detection: The module checks if the User-Agent header contains known bot identifiers (Google, Bing, Yahoo, Baidu, or Yandex)
Reverse DNS Lookup: If a bot is detected, the module performs a reverse DNS lookup on the client IP address
Domain Validation: The resulting hostname must match approved domains for that search engine
Forward DNS Verification: The module confirms the hostname resolves back to the original IP
Result Caching: Valid and invalid results get cached in Redis to prevent repeated lookups

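This NGINX bot verification approach is effective because the check runs in both directions. An attacker who controls the reverse DNS zone for their own IP range can publish a PTR record that names googlebot.com, but they cannot make the search engine's forward DNS resolve that hostname back to their IP. The forward DNS verification step therefore exposes the forgery. Moreover, the caching mechanism keeps the performance impact on your server small.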
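The lookup sequence above can be reproduced by hand from a shell, which helps when you want to see why a particular IP passed or failed. The following is a minimal sketch and not the module's own code. It assumes dig (from bind-utils) is installed, handles IPv4 only, and the function names hostname_in_domain and verify_crawler_ip are made up for illustration.

```shell
# Succeeds when hostname $1 equals domain $2 or is a subdomain of it.
hostname_in_domain() {
    host=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    host=${host%.}                      # drop the trailing dot that dig prints
    case "$host" in
        "$2" | *."$2") return 0 ;;
        *) return 1 ;;
    esac
}

# verify_crawler_ip IP DOMAIN...
# Prints the confirmed hostname and succeeds only when the PTR hostname is in
# one of the given domains AND resolves back to the same IPv4 address.
verify_crawler_ip() {
    ip=$1
    shift
    ptr=$(dig +short -x "$ip" | head -n 1)      # reverse lookup
    [ -n "$ptr" ] || return 1
    for domain in "$@"; do
        if hostname_in_domain "$ptr" "$domain"; then
            # Forward confirmation: the hostname must map back to the IP.
            if dig +short "$ptr" A | grep -Fxq "$ip"; then
                printf '%s\n' "${ptr%.}"
                return 0
            fi
        fi
    done
    return 1
}

# Example (requires network access):
#   verify_crawler_ip 66.249.66.1 googlebot.com google.com
```

Note the suffix check: a hostname such as googlebot.com.evil.example does not end in .googlebot.com, so it is rejected before the forward lookup is even attempted.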

Important: The module only verifies requests that claim to be bots. Normal browser requests pass through without any verification, even if a previous bot request from the same IP failed verification.

Supported Search Engines

The module validates bots from these major search engines:

Search Engine	Verified Domains
Google	googlebot.com, google.com
Bing	search.msn.com
Yahoo	yahoo.com
Baidu	crawl.baidu.com
Yandex	yandex.com, yandex.net, yandex.ru

When a request fails verification, the module returns a 403 Forbidden response. This blocks fake crawlers while allowing legitimate search engine bots to access your content normally.
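To make the table concrete, here is a small sketch that maps a User-Agent string to the domains its reverse DNS hostname would have to fall under. The domain lists come from the table above. The substrings used to recognize each crawler are assumptions chosen for illustration, since the exact identifiers the module matches are not listed here, and approved_domains_for_ua is a hypothetical helper name.

```shell
# Prints the approved reverse DNS domains for a User-Agent, or fails when the
# User-Agent does not claim to be one of the supported crawlers.
# The substrings below are illustrative guesses, not the module's own list.
approved_domains_for_ua() {
    ua=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$ua" in
        *googlebot*) echo "googlebot.com google.com" ;;
        *bingbot*)   echo "search.msn.com" ;;
        *yahoo*)     echo "yahoo.com" ;;
        *baidu*)     echo "crawl.baidu.com" ;;
        *yandex*)    echo "yandex.com yandex.net yandex.ru" ;;
        *)           return 1 ;;    # not a claimed crawler: nothing to verify
    esac
}
```

A regular browser User-Agent takes the last branch, which mirrors the behavior described earlier: requests that do not claim to be a bot are never verified.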

Installation on Rocky Linux, AlmaLinux, and RHEL

Installing the NGINX bot verification module requires the GetPageSpeed repository. This repository provides pre-built packages for all major RHEL-based distributions. Follow these steps to install the module.

First, install the GetPageSpeed repository:

dnf -y install https://extras.getpagespeed.com/release-latest.rpm

Next, install the bot verifier module along with Redis (or KeyDB) for caching:

dnf -y install nginx-module-bot-verifier keydb

KeyDB is a high-performance Redis alternative that works identically for this purpose. You can also use the standard Redis server if you prefer.

Start and enable the caching service:

systemctl enable --now keydb

Finally, load the module in your NGINX configuration. Add this line at the very top of /etc/nginx/nginx.conf, before the events block:

load_module modules/ngx_http_bot_verifier_module.so;

Test and reload your configuration:

nginx -t && systemctl reload nginx

SELinux Configuration

On systems with SELinux enabled, NGINX needs permission to connect to Redis. Run this command to allow network connections:

setsebool -P httpd_can_network_connect 1

Without this setting, the NGINX bot verification module will bypass verification and log connection errors. Therefore, this step is essential for proper functionality.

Configuration Guide

The bot verifier module uses simple directives within location blocks. Here is a complete configuration example:

server {
    listen 80;
    server_name example.com;

    resolver 1.1.1.1 8.8.8.8;

    location / {
        bot_verifier on;
        bot_verifier_redis_host 127.0.0.1;
        bot_verifier_redis_port 6379;
        bot_verifier_redis_connection_timeout 10;
        bot_verifier_redis_read_timeout 10;
        bot_verifier_redis_expiry 3600;

        # Your existing location configuration
        try_files $uri $uri/ =404;
    }
}

Configuration Directives Explained

The module provides several directives for fine-tuning its behavior:

bot_verifier (on|off)

This directive enables or disables bot verification for the location. The default value is off. Set it to on to activate the module.

bot_verifier_redis_host

Specifies the Redis or KeyDB server hostname. The default is localhost. Use the IP address 127.0.0.1 or the hostname of your caching server.

bot_verifier_redis_port

Sets the Redis port number. The default is 6379, which is the standard Redis port. Change this only if your Redis server uses a non-standard port.

bot_verifier_redis_connection_timeout

Defines the connection timeout in seconds. The default is 10 seconds. Lower values provide faster failure detection but may cause issues with slow networks.

bot_verifier_redis_read_timeout

Sets the timeout for Redis read operations. The default is 10 seconds. This affects how long the module waits for cached results.

bot_verifier_redis_expiry

Controls how long verification results stay cached. The default is 3600 seconds (one hour). Longer values reduce DNS lookups but delay detection of IP changes.

DNS Resolver Configuration

The NGINX resolver directive is required for the bot verification module to work. Since the module performs reverse and forward DNS lookups to verify bot identity, NGINX needs an explicitly configured DNS resolver for these operations.

Without a resolver, the module logs this error and cannot verify any bots:

bot_verifier: resolver directive is required but not configured

Add the resolver directive in the http, server, or location block where bot_verifier is enabled:

resolver 1.1.1.1 8.8.8.8;

You can also use a local caching DNS resolver like unbound or dnsmasq for faster lookups:

resolver 127.0.0.1;

IPv6 DNS addresses are supported:

resolver 1.1.1.1 [2606:4700:4700::1111];

Specifying multiple resolvers provides redundancy. A missing resolver is a common setup mistake: without one, the module cannot verify any bots and logs the error above for every request.

Reverse Proxy Configuration

If NGINX sits behind a load balancer, CDN (like Cloudflare), or another reverse proxy, you must configure the realip module. Without this configuration, all requests appear to come from the proxy’s IP address, and bot verification will fail for legitimate crawlers.

Add the realip configuration in the http block:

http {
    # Trust your reverse proxy/CDN to set X-Forwarded-For
    set_real_ip_from 10.0.0.0/8;
    set_real_ip_from 172.16.0.0/12;
    set_real_ip_from 192.168.0.0/16;

    # For Cloudflare, add their IP ranges:
    # set_real_ip_from 103.21.244.0/22;
    # set_real_ip_from 103.22.200.0/22;
    # ... (see Cloudflare's published IP ranges)

    real_ip_header X-Forwarded-For;
    real_ip_recursive on;

    # ... rest of configuration
}
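Maintaining a long list of CDN ranges by hand is error-prone, so you may prefer to generate the set_real_ip_from lines from a published list. The sketch below only converts a list of CIDR ranges into directives. The function name cidrs_to_realip and the output file path in the usage comment are hypothetical.

```shell
# Reads CIDR ranges (one per line) on stdin and prints set_real_ip_from lines.
# Blank lines and lines starting with '#' are skipped. The extra test after
# `read` keeps a final line that has no trailing newline.
cidrs_to_realip() {
    while IFS= read -r cidr || [ -n "$cidr" ]; do
        case "$cidr" in
            '' | '#'*) continue ;;
        esac
        printf 'set_real_ip_from %s;\n' "$cidr"
    done
}

# Example (requires network access; the target path is only a suggestion):
#   curl -fsS https://www.cloudflare.com/ips-v4 | cidrs_to_realip \
#       > /etc/nginx/conf.d/cloudflare-realip.conf
```

Run nginx -t after regenerating the file and before reloading, so that a truncated download cannot break the running configuration.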

The real_ip_recursive on setting is important when multiple proxies are involved. With it enabled, NGINX walks the X-Forwarded-For chain from right to left, skips every address covered by set_real_ip_from, and uses the last untrusted address it finds as the client IP.
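The selection logic matters for bot verification because the leftmost X-Forwarded-For entry is supplied by the client and can be forged. The sketch below imitates the recursive lookup in a simplified form: it assumes the directly connected peer is already trusted and compares exact IP addresses rather than CIDR ranges. The function name pick_client_ip is hypothetical.

```shell
# pick_client_ip "X-Forwarded-For value" TRUSTED_IP...
# Walks the chain from right to left and prints the first address that is not
# a trusted proxy. Simplified: exact IP matches only, no CIDR ranges.
pick_client_ip() {
    chain=$1
    shift
    trusted=" $* "                  # remaining arguments: trusted proxy IPs
    reversed=""
    old_ifs=$IFS
    IFS=','
    for addr in $chain; do          # split on commas, build a reversed list
        reversed="$(printf '%s' "$addr" | tr -d ' ') $reversed"
    done
    IFS=$old_ifs
    leftmost=""
    for addr in $reversed; do
        leftmost=$addr
        case "$trusted" in
            *" $addr "*) ;;                         # trusted proxy, keep walking
            *) printf '%s\n' "$addr"; return 0 ;;   # first untrusted address
        esac
    done
    # Every hop was trusted: fall back to the leftmost address.
    [ -n "$leftmost" ] && printf '%s\n' "$leftmost"
}

# Example: the client forged a Google address as the first entry.
#   pick_client_ip '66.249.66.1, 203.0.113.7, 10.0.0.5' 10.0.0.5
```

In the example, the forged 66.249.66.1 entry is never reached, because the walk stops at 203.0.113.7, the address the trusted proxy actually saw.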

Production Configuration Example

For production environments, consider this enhanced NGINX bot verification configuration:

# Main context - load the module
load_module modules/ngx_http_bot_verifier_module.so;

http {
    # Realip configuration for reverse proxy setups
    set_real_ip_from 10.0.0.0/8;
    set_real_ip_from 172.16.0.0/12;
    set_real_ip_from 192.168.0.0/16;
    real_ip_header X-Forwarded-For;
    real_ip_recursive on;

    # ... other settings ...

    server {
        listen 443 ssl http2;
        server_name example.com;

        resolver 1.1.1.1 8.8.8.8;

        # Protect all dynamic content
        location / {
            bot_verifier on;
            bot_verifier_redis_host 127.0.0.1;
            bot_verifier_redis_port 6379;
            bot_verifier_redis_expiry 7200;

            proxy_pass http://backend;
        }

        # Static files do not need verification
        location /static/ {
            bot_verifier off;
            alias /var/www/static/;
        }
    }
}

This configuration applies verification only to dynamic content. Static files skip verification because scraping static assets poses less risk. Additionally, the cache expiry is set to 7200 seconds (two hours) to further reduce DNS lookups.

Testing Your NGINX Bot Verification Setup

After installing and configuring the module, you should verify it works correctly. Use these curl commands to test different scenarios.

First, test a normal request without any bot user-agent:

curl -s -o /dev/null -w '%{http_code}\n' http://localhost/

This should return 200, indicating the request passed normally.

Next, test a fake Googlebot request from your local machine:

curl -s -o /dev/null -w '%{http_code}\n' \
    -A 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' \
    http://localhost/

This should return 403 because your local IP does not belong to Google. The module correctly identified the fake bot.

You can also test other bot user-agents:

# Test fake Bingbot
curl -s -o /dev/null -w '%{http_code}\n' \
    -A 'Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)' \
    http://localhost/

# Test fake YandexBot
curl -s -o /dev/null -w '%{http_code}\n' \
    -A 'Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)' \
    http://localhost/

Both requests should return 403 as expected.
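If you repeat these checks after every configuration change, it may be convenient to wrap them in a small script. This sketch uses the same curl options as the commands above. The helper names status_for_ua and expect_status are made up for illustration.

```shell
# status_for_ua URL USER_AGENT: prints the HTTP status code of one request.
status_for_ua() {
    curl -s -o /dev/null -w '%{http_code}' -A "$2" "$1"
}

# expect_status URL USER_AGENT EXPECTED: prints PASS or FAIL and returns
# a non-zero status on mismatch so the script can be used in automation.
expect_status() {
    actual=$(status_for_ua "$1" "$2")
    if [ "$actual" = "$3" ]; then
        printf 'PASS %s %s\n' "$actual" "$2"
    else
        printf 'FAIL got %s, expected %s: %s\n' "$actual" "$3" "$2"
        return 1
    fi
}

# Example:
#   expect_status http://localhost/ 'Mozilla/5.0 (X11; Linux x86_64)' 200
#   expect_status http://localhost/ \
#       'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' 403
```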

Checking the Error Log

The module logs detailed information about its decisions. Check the NGINX error log to understand what happens during verification:

tail -f /var/log/nginx/error.log

You will see messages like:

Verification failed, blocking request

Or for successful verifications:

Verification successful, allowing request

These logs help you troubleshoot any issues and confirm the module is working.

Inspecting the Cache

You can examine cached verification results using the Redis CLI:

keydb-cli KEYS '*:bvs'

This shows all cached bot verification status entries. To check a specific IP:

keydb-cli GET '192.0.2.1:bvs'

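The result will be either success (verified bot) or failure (fake bot). To clear the cache during testing, flush the instance. Note that FLUSHALL deletes every key in every database, so use it only on an instance dedicated to this module: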

keydb-cli FLUSHALL
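On a busy server, KEYS blocks the cache while it walks the whole keyspace, so an incremental scan is gentler. The sketch below summarizes the cached verdicts. It assumes keydb-cli accepts the same --scan and --pattern options as redis-cli. The REDIS_CLI variable and the function names are hypothetical.

```shell
REDIS_CLI=${REDIS_CLI:-keydb-cli}   # set REDIS_CLI=redis-cli if you use Redis

# Prints "<key> <verdict>" for every cached entry, using SCAN instead of KEYS.
dump_verdicts() {
    "$REDIS_CLI" --scan --pattern '*:bvs' | while IFS= read -r key; do
        printf '%s %s\n' "$key" "$("$REDIS_CLI" GET "$key" </dev/null)"
    done
}

# Reads "<key> <verdict>" lines on stdin and prints a count per verdict.
count_verdicts() {
    awk '{ n[$2]++ } END { for (v in n) print v, n[v] }' | sort
}

# Example:
#   dump_verdicts | count_verdicts
```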

Performance Considerations

The NGINX bot verification module is designed for minimal performance impact. However, there are several factors to consider for optimal operation.

DNS Resolution Overhead

Reverse and forward DNS lookups add latency to requests from bot user-agents. Without caching, each request would require two DNS queries. The Redis cache eliminates this overhead for repeated visits from the same IP.

For high-traffic sites, consider these optimizations:

Increase cache expiry: Longer cache times mean fewer DNS lookups. Set bot_verifier_redis_expiry to 7200 or higher for production.
Use local DNS resolver: Configure a local caching DNS resolver like dnsmasq or unbound to speed up lookups.
Redis connection pooling: The module maintains persistent Redis connections. Ensure your Redis server has enough connection slots.

Failsafe Behavior

If the module cannot connect to Redis, it bypasses verification entirely. This failsafe prevents blocking legitimate traffic when the cache is unavailable. However, it also means fake bots can pass through during Redis outages.

Monitor your Redis service health with:

systemctl status keydb
keydb-cli ping

Consider setting up Redis monitoring alerts to detect connectivity issues promptly.
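Because an unreachable cache silently disables protection, a check that fails loudly is worth having. Here is a minimal sketch that a cron job or monitoring agent could call. The REDIS_CLI variable, the function name check_verifier_cache, and the warning text are all hypothetical.

```shell
REDIS_CLI=${REDIS_CLI:-keydb-cli}   # set REDIS_CLI=redis-cli if you use Redis

# check_verifier_cache [HOST] [PORT]
# Succeeds when the cache answers PING with PONG. Otherwise prints a warning
# to stderr and fails, so the caller can raise an alert.
check_verifier_cache() {
    reply=$("$REDIS_CLI" -h "${1:-127.0.0.1}" -p "${2:-6379}" ping 2>/dev/null)
    if [ "$reply" = "PONG" ]; then
        return 0
    fi
    echo "bot verifier cache unreachable: verification is being bypassed" >&2
    return 1
}

# Example:
#   check_verifier_cache 127.0.0.1 6379 || echo "alert the on-call engineer"
```

Use the same host and port that you set in bot_verifier_redis_host and bot_verifier_redis_port, so the check exercises the path NGINX actually uses.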

Security Best Practices

While NGINX bot verification provides strong protection, follow these additional best practices:

Combine with Other Bot Protection

For comprehensive bot protection, use the verifier module alongside other techniques. Consider the testcookie module for JavaScript-based bot challenges. For more granular visitor identification including device type and AI crawler detection, the NGINX device detection module provides detailed client fingerprinting. This combination catches both fake crawlers and automated scripts.

Protect Against Host Header Injection

Attackers can manipulate the HTTP Host header to poison caches or hijack password reset emails. This vulnerability is separate from bot spoofing but equally dangerous. See our guide on protecting from Host header injection for NGINX configuration patterns that block these attacks.

Combine with Rate Limiting

Bot verification works best alongside rate limiting. Even verified bots should respect reasonable limits:

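# http context: define the shared rate-limit zone once
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

# server context: apply verification and the limit to the API
location /api/ {
    bot_verifier on;
    bot_verifier_redis_host 127.0.0.1;

    limit_req zone=api burst=20 nodelay;
}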

Use with ModSecurity

For comprehensive protection, combine bot verification with a Web Application Firewall. The ModSecurity module provides additional security layers:

location / {
    bot_verifier on;
    bot_verifier_redis_host 127.0.0.1;

    modsecurity on;
    modsecurity_rules_file /etc/nginx/modsec/main.conf;
}

Monitor Blocked Requests

The bot verifier module logs all verification decisions to the NGINX error log. To track blocked fake bots, monitor the error log for verification messages:

grep "Verification failed, blocking request" /var/log/nginx/error.log

For real-time monitoring:

tail -f /var/log/nginx/error.log | grep -E "(Verification failed|blocking request)"

If you want to log blocked requests to a separate access log file, you can use conditional logging based on the response status. Since blocked bots receive a 403 response, create a map to filter by status:

map $status $is_blocked {
    403     1;
    default 0;
}

log_format botlog '$remote_addr - $status - "$http_user_agent"';
access_log /var/log/nginx/blocked.log botlog if=$is_blocked;

Note that this approach logs all 403 responses, not just bot verification blocks. For bot-specific logging, the error log provides more precise information including the verification reason.
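To see which clients are blocked most often, you can aggregate the error log by client address. The sketch below assumes each log line carries NGINX's usual ", client: <ip>," suffix after the module's message. The function name top_blocked_clients is hypothetical.

```shell
# Reads NGINX error log lines on stdin and prints "<count> <client ip>",
# with the most frequently blocked clients first.
top_blocked_clients() {
    grep -F 'Verification failed, blocking request' |
        sed -n 's/.*client: \([^,]*\),.*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}

# Example:
#   top_blocked_clients < /var/log/nginx/error.log | head -n 20
```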

IP-Based Access Control

For additional security, combine bot verification with IP whitelisting and blacklisting. This allows you to explicitly allow or deny known IP ranges.

Regular Configuration Audits

Use Gixy to analyze your NGINX configuration for security issues:

gixy /etc/nginx/nginx.conf

This tool detects common misconfigurations that could undermine your security measures.

Troubleshooting Common Issues

Here are solutions to problems you might encounter with NGINX bot verification:

Module Not Blocking Fake Bots

If fake bots are not being blocked, check these items:

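Verify module is loaded: Run nginx -T 2>/dev/null | grep bot_verifier and confirm that the load_module line appears in the effective configuration. Note that nginx -V lists only compile-time options and does not show dynamic modules loaded with load_module.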
Check directive is enabled: Ensure bot_verifier on; is in the correct location block
Test Redis connectivity: Run keydb-cli ping and verify it returns PONG
Check SELinux: Run getsebool httpd_can_network_connect and verify it is on
Review error log: Look for connection errors in /var/log/nginx/error.log
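Avoid using return directive: The return directive runs in the rewrite phase, which comes before the access phase where bot verification runs. A location that uses return therefore answers the request before the module is consulted. Use try_files or proxy_pass instead.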

All Requests Being Blocked

If legitimate traffic is being blocked, verify:

Cache is not corrupted: Clear the cache with keydb-cli FLUSHALL
Only bot locations have verification: Ensure bot_verifier on is not in unexpected locations
Check the error log: The log will show why requests are being blocked

Legitimate Bots Being Blocked

If real search engine crawlers are being blocked:

Check DNS resolution: Ensure your server can resolve PTR records. Test with host 66.249.66.1 (a Google IP)
Verify realip configuration: If behind a proxy, ensure set_real_ip_from and real_ip_header are configured correctly
Check DNS resolver: Some internal DNS resolvers don’t return PTR records. Consider using public DNS (1.1.1.1 or 8.8.8.8)

High Latency on Bot Requests

If requests from bots are slow:

Verify DNS resolution speed: Test with time host 66.249.66.1
Check cache hit rate: High miss rates indicate cache expiry is too short
Monitor Redis performance: Use keydb-cli INFO to check memory and connections
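The cache hit rate can be estimated from the keyspace_hits and keyspace_misses counters in the INFO stats output. The sketch below computes the ratio. Keep in mind that these counters cover the whole server, so the figure is only meaningful when the instance is dedicated to the bot verifier. The function name cache_hit_rate is hypothetical.

```shell
# Reads `INFO stats` output on stdin and prints the keyspace hit rate.
# Adding 0 converts each value to a number, which also discards the carriage
# return that ends every INFO line.
cache_hit_rate() {
    awk -F: '
        $1 == "keyspace_hits"   { hits = $2 + 0 }
        $1 == "keyspace_misses" { misses = $2 + 0 }
        END {
            total = hits + misses
            if (total == 0) { print "no lookups recorded"; exit }
            printf "%.1f%% hit rate (%d hits, %d misses)\n", 100 * hits / total, hits, misses
        }'
}

# Example:
#   keydb-cli INFO stats | cache_hit_rate
```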

Conclusion

NGINX bot verification provides essential protection against fake search engine crawlers. By implementing reverse DNS verification, you can confidently allow real bots while blocking impostors. This protects your content from scraping, your server from attacks, and your analytics from pollution.

The installation process is straightforward on Rocky Linux, AlmaLinux, and other RHEL-based distributions. The module integrates seamlessly with existing NGINX configurations. Furthermore, Redis caching ensures minimal performance impact even under heavy traffic.

Remember to test your configuration thoroughly after installation. Monitor the error logs to understand the module’s behavior. Combine NGINX bot verification with other security measures for comprehensive protection.

Danila Vershinin

Founder & Lead Engineer

NGINX configuration and optimization • Linux system administration • Web performance engineering

10+ years NGINX experience • Maintainer of GetPageSpeed RPM repository • Contributor to open-source NGINX modules
