Many website owners allow search engine bots to bypass security measures. They do this to ensure proper indexing and maintain good SEO rankings. However, this creates a significant vulnerability. Malicious actors frequently impersonate legitimate crawlers like Googlebot to scrape content, launch attacks, or bypass rate limits.
The NGINX bot verification module solves this problem by validating whether visitors claiming to be search engine bots are genuine. It uses the same reverse DNS verification method that Google, Microsoft, and other search engines officially recommend. In this comprehensive guide, you will learn how to install, configure, and test this essential security module.
Why You Need NGINX Bot Verification
Search engine crawlers identify themselves through the User-Agent header. For example, Googlebot sends a header like this:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
The problem is simple. Anyone can set this header. A malicious script can easily claim to be Googlebot. Therefore, many websites unknowingly grant special access to attackers who spoof these headers.
Consider these common scenarios where fake bots cause problems:
- Content scraping: Competitors steal your content by pretending to be search crawlers
- DDoS attacks: Attackers bypass rate limits by using bot user-agents
- Vulnerability scanning: Hackers avoid security tools by impersonating crawlers
- Click fraud: Bots fake their identity to manipulate analytics data
The solution is reverse DNS verification. This technique confirms that the requesting IP address actually belongs to the claimed search engine. Google, Microsoft, and other search providers officially document this verification method.
How NGINX Bot Verification Works
The module operates in the access phase of request processing. When a request arrives, it follows this verification process:
- User-Agent Detection: The module checks if the User-Agent header contains known bot identifiers (Google, Bing, Yahoo, Baidu, or Yandex)
- Reverse DNS Lookup: If a bot is detected, the module performs a reverse DNS lookup on the client IP address
- Domain Validation: The resulting hostname must match approved domains for that search engine
- Forward DNS Verification: The module confirms the hostname resolves back to the original IP
- Result Caching: Valid and invalid results get cached in Redis to prevent repeated lookups
This NGINX bot verification approach is effective because search engines control their DNS records. An attacker cannot spoof the reverse DNS of IP addresses they do not own. Moreover, the caching mechanism ensures minimal performance impact on your server.
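You can reproduce this verification manually with the host utility. The IP 66.249.66.1 is the example address Google uses in its own crawler verification documentation; real output varies with the crawler IP:

# Step 1: reverse DNS lookup on the client IP
host 66.249.66.1
# 1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.

# Step 2: the hostname ends in googlebot.com, so forward-confirm it
host crawl-66-249-66-1.googlebot.com
# crawl-66-249-66-1.googlebot.com has address 66.249.66.1

Both lookups agree, so this IP would be treated as genuine Googlebot.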
Supported Search Engines
The module validates bots from these major search engines:
| Search Engine | Verified Domains |
|---|---|
| Google | googlebot.com, google.com |
| Bing | search.msn.com |
| Yahoo | yahoo.com |
| Baidu | crawl.baidu.com |
| Yandex | yandex.com, yandex.net, yandex.ru |
When a request fails verification, the module returns a 403 Forbidden response. This blocks fake crawlers while allowing legitimate search engine bots to access your content normally.
Installation on Rocky Linux, AlmaLinux, and RHEL
Installing the NGINX bot verification module requires the GetPageSpeed repository. This repository provides pre-built packages for all major RHEL-based distributions. Follow these steps to install the module.
First, install the GetPageSpeed repository:
dnf -y install https://extras.getpagespeed.com/release-latest.rpm
Next, install the bot verifier module along with Redis (or KeyDB) for caching:
dnf -y install nginx-module-bot-verifier keydb
KeyDB is a high-performance Redis alternative that works identically for this purpose. You can also use the standard Redis server if you prefer.
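You can confirm both packages landed by querying the RPM database:

rpm -q nginx-module-bot-verifier keydb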
Start and enable the caching service:
systemctl enable --now keydb
Finally, load the module in your NGINX configuration. Add this line at the very top of /etc/nginx/nginx.conf, before the events block:
load_module modules/ngx_http_bot_verifier_module.so;
Test and reload your configuration:
nginx -t && systemctl reload nginx
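To confirm NGINX actually picked up the module, dump the full configuration and look for the load_module line:

nginx -T 2>/dev/null | grep bot_verifier
# load_module modules/ngx_http_bot_verifier_module.so;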
SELinux Configuration
On systems with SELinux enabled, NGINX needs permission to connect to Redis. Run this command to allow network connections:
setsebool -P httpd_can_network_connect 1
Without this setting, the NGINX bot verification module will bypass verification and log connection errors. Therefore, this step is essential for proper functionality.
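You can confirm the boolean took effect:

getsebool httpd_can_network_connect
# httpd_can_network_connect --> on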
Configuration Guide
The bot verifier module uses simple directives within location blocks. Here is a complete configuration example:
server {
    listen 80;
    server_name example.com;

    location / {
        bot_verifier on;
        bot_verifier_redis_host 127.0.0.1;
        bot_verifier_redis_port 6379;
        bot_verifier_redis_connection_timeout 10;
        bot_verifier_redis_read_timeout 10;
        bot_verifier_redis_expiry 3600;

        # Your existing location configuration
        try_files $uri $uri/ =404;
    }
}
Configuration Directives Explained
The module provides several directives for fine-tuning its behavior:
bot_verifier (on|off)
This directive enables or disables bot verification for the location. The default value is off. Set it to on to activate the module.
bot_verifier_redis_host
Specifies the Redis or KeyDB server hostname. The default is localhost. Use the IP address 127.0.0.1 or the hostname of your caching server.
bot_verifier_redis_port
Sets the Redis port number. The default is 6379, which is the standard Redis port. Change this only if your Redis server uses a non-standard port.
bot_verifier_redis_connection_timeout
Defines the connection timeout in seconds. The default is 10 seconds. Lower values provide faster failure detection but may cause issues with slow networks.
bot_verifier_redis_read_timeout
Sets the timeout for Redis read operations. The default is 10 seconds. This affects how long the module waits for cached results.
bot_verifier_redis_expiry
Controls how long verification results stay cached. The default is 3600 seconds (one hour). Longer values reduce DNS lookups but delay detection of IP changes.
Production Configuration Example
For production environments, consider this enhanced NGINX bot verification configuration:
# Main context - load the module
load_module modules/ngx_http_bot_verifier_module.so;

http {
    # ... other settings ...

    server {
        listen 443 ssl http2;
        server_name example.com;
        # ssl_certificate and ssl_certificate_key are required in a TLS
        # server block; they are omitted here for brevity

        # Protect all dynamic content
        location / {
            bot_verifier on;
            bot_verifier_redis_host 127.0.0.1;
            bot_verifier_redis_port 6379;
            bot_verifier_redis_expiry 7200;
            # "backend" refers to an upstream block defined elsewhere
            proxy_pass http://backend;
        }

        # Static files do not need verification
        location /static/ {
            bot_verifier off;
            alias /var/www/static/;
        }
    }
}
This configuration applies verification only to dynamic content. Static files skip verification because scraping static assets poses less risk. Additionally, the cache expiry is set to 7200 seconds (two hours) to further reduce DNS lookups.
Testing Your NGINX Bot Verification Setup
After installing and configuring the module, you should verify it works correctly. Use these curl commands to test different scenarios.
First, test a normal request without any bot user-agent:
curl -s -o /dev/null -w '%{http_code}\n' http://localhost/
This should return 200, indicating the request passed normally.
Next, test a fake Googlebot request from your local machine:
curl -s -o /dev/null -w '%{http_code}\n' \
-A 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' \
http://localhost/
This should return 403 because your local IP does not belong to Google. The module correctly identified the fake bot.
You can also test other bot user-agents:
# Test fake Bingbot
curl -s -o /dev/null -w '%{http_code}\n' \
-A 'Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)' \
http://localhost/
# Test fake YandexBot
curl -s -o /dev/null -w '%{http_code}\n' \
-A 'Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)' \
http://localhost/
Both requests should return 403 as expected.
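To re-run these checks quickly after configuration changes, a small shell loop covers all three user-agents; from a non-crawler IP, each iteration should print 403:

for ua in \
  'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' \
  'Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)' \
  'Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)'; do
  curl -s -o /dev/null -w "%{http_code}\n" -A "$ua" http://localhost/
done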
Checking the Error Log
The module logs detailed information about its decisions. Check the NGINX error log to understand what happens during verification:
tail -f /var/log/nginx/error.log
You will see messages like:
User Agent identified as provider Mozilla/5.0 (compatible; Googlebot/2.1; ...)
HOSTNAME: some-hostname.example.com
Result does not match known domain
Verification failed, blocking request
These logs help you troubleshoot any issues and confirm the module is working.
Inspecting the Cache
You can examine cached verification results using the Redis CLI:
keydb-cli KEYS '*:bvs'
This shows all cached bot verification status entries. To check a specific IP:
keydb-cli GET '192.0.2.1:bvs'
The result will be either success (verified bot) or failure (fake bot). To clear the cache during testing:
keydb-cli FLUSHALL
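You can also verify that entries expire according to bot_verifier_redis_expiry by checking the remaining time to live on a cached key (the IP here is a placeholder):

keydb-cli TTL '192.0.2.1:bvs'
# (integer) 3542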
Performance Considerations
The NGINX bot verification module is designed for minimal performance impact. However, there are several factors to consider for optimal operation.
DNS Resolution Overhead
Reverse and forward DNS lookups add latency to requests from bot user-agents. Without caching, each request would require two DNS queries. The Redis cache eliminates this overhead for repeated visits from the same IP.
For high-traffic sites, consider these optimizations:
- Increase cache expiry: Longer cache times mean fewer DNS lookups. Set bot_verifier_redis_expiry to 7200 or higher for production.
- Use a local DNS resolver: Configure a local caching DNS resolver like dnsmasq or unbound to speed up lookups.
- Redis connection pooling: The module maintains persistent Redis connections. Ensure your Redis server has enough connection slots.
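You can observe the cache working by timing two consecutive fake-bot requests: the first triggers DNS lookups, while the second is answered from Redis. This is a quick sanity check, not a rigorous benchmark:

# First request performs DNS lookups; the second hits the Redis cache
for i in 1 2; do
  curl -s -o /dev/null -w "attempt $i: %{time_total}s\n" \
    -A 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' \
    http://localhost/
done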
Failsafe Behavior
If the module cannot connect to Redis, it bypasses verification entirely. This failsafe prevents blocking legitimate traffic when the cache is unavailable. However, it also means fake bots can pass through during Redis outages.
Monitor your Redis service health with:
systemctl status keydb
keydb-cli ping
Consider setting up Redis monitoring alerts to detect connectivity issues promptly.
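A minimal liveness check suitable for a cron job might look like the following sketch; logger simply writes the message to syslog, where your alerting can pick it up:

keydb-cli ping | grep -q PONG || logger -p daemon.err 'keydb unreachable: bot verification is failing open'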
Security Best Practices
While NGINX bot verification provides strong protection, follow these additional best practices:
Combine with Other Bot Protection
For comprehensive bot protection, use the verifier module alongside other techniques. Consider the testcookie module for JavaScript-based bot challenges. This combination catches both fake crawlers and automated scripts.
Combine with Rate Limiting
Bot verification works best alongside rate limiting. Even verified bots should respect reasonable limits:
# limit_req_zone must be defined in the http context
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

location /api/ {
    bot_verifier on;
    bot_verifier_redis_host 127.0.0.1;
    limit_req zone=api burst=20 nodelay;
}
Use with ModSecurity
For comprehensive protection, combine bot verification with a Web Application Firewall. The ModSecurity module provides additional security layers:
location / {
    bot_verifier on;
    bot_verifier_redis_host 127.0.0.1;
    modsecurity on;
    modsecurity_rules_file /etc/nginx/modsec/main.conf;
}
Monitor Blocked Requests
Track blocked bot requests in your access log for security analysis. Note that $is_bot_blocked in this example is not a built-in NGINX variable, so you must define it yourself, as shown after the snippet:
log_format botlog '$remote_addr - $status - "$http_user_agent"';
access_log /var/log/nginx/bots.log botlog if=$is_bot_blocked;
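A minimal way to define it is a map on the response status in the http context. This is a sketch: it flags every 403 on the protected locations, not only those produced by the bot verifier:

map $status $is_bot_blocked {
    403     1;
    default 0;
}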
IP-Based Access Control
For additional security, combine bot verification with IP whitelisting and blacklisting. This allows you to explicitly allow or deny known IP ranges.
Regular Configuration Audits
Use Gixy to analyze your NGINX configuration for security issues:
gixy /etc/nginx/nginx.conf
This tool detects common misconfigurations that could undermine your security measures.
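If Gixy is not already installed, it is available from PyPI:

pip install gixy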
Troubleshooting Common Issues
Here are solutions to problems you might encounter with NGINX bot verification:
Module Not Blocking Fake Bots
If fake bots are not being blocked, check these items:
- Verify the module is loaded: Run nginx -T 2>/dev/null | grep bot_verifier and confirm the load_module line appears (nginx -V only shows compile-time options, not dynamically loaded modules)
- Check the directive is enabled: Ensure bot_verifier on; is in the correct location block
- Test Redis connectivity: Run keydb-cli ping and verify it returns PONG
- Check SELinux: Run getsebool httpd_can_network_connect and verify it is on
- Review the error log: Look for connection errors in /var/log/nginx/error.log
All Requests Being Blocked
If legitimate traffic is being blocked, verify:
- Cache is not corrupted: Clear the cache with keydb-cli FLUSHALL
- Verification is only enabled where intended: Ensure bot_verifier on does not appear in unexpected location blocks
- Check the error log: The log will show why requests are being blocked
High Latency on Bot Requests
If requests from bots are slow:
- Verify DNS resolution speed: Test with time host 66.249.66.1
- Check the cache hit rate: High miss rates indicate the cache expiry is too short
- Monitor Redis performance: Use keydb-cli INFO to check memory and connections
Conclusion
NGINX bot verification provides essential protection against fake search engine crawlers. By implementing reverse DNS verification, you can confidently allow real bots while blocking impostors. This protects your content from scraping, your server from attacks, and your analytics from pollution.
The installation process is straightforward on Rocky Linux, AlmaLinux, and other RHEL-based distributions. The module integrates seamlessly with existing NGINX configurations. Furthermore, Redis caching ensures minimal performance impact even under heavy traffic.
Remember to test your configuration thoroughly after installation. Monitor the error logs to understand the module’s behavior. Combine NGINX bot verification with other security measures for comprehensive protection.
For additional information, visit the module’s GitHub repository. You can find the complete source code and report any issues there.
