Stop Google Analytics SPAM bots and reduce server load

I’m addicted to investigating server log files. While going through them, I stumbled upon some very unusual requests: POSTs to the homepage. Sure enough, there is no form on the homepage that could have produced them.

The requests look like this:

5.75.71.6 - - [17/Sep/2017:12:27:28 +0000] "POST / HTTP/1.1" 200 5953 "-" "-" "5.75.71.6" "some-funny-hostname" sn="www.example.com" rt=0.241 ua="unix:/var/run/php-fpm/php-fpm-example.com.sock" us="200" ut="0.237" ul="20755" cs=-

Every one of these requests originated from an IP address in Iran or China.
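The default "combined" log format does not record the Host header, so a line like the one above needs a custom log_format. The following is only a guess at what such a format might look like, reconstructed from the fields visible in the sample line. The format name "extended", the log path, and the choice of $http_x_forwarded_for for the second IP field are assumptions, not the actual configuration of the server in question.

```nginx
# http context. Hypothetical format reconstructed from the sample log line.
log_format extended '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" "$http_user_agent" '
                    # Assumption: the repeated IP is the X-Forwarded-For header
                    '"$http_x_forwarded_for" '
                    # The Host header exactly as the client sent it
                    '"$http_host" '
                    # Name of the server block that handled the request
                    'sn="$server_name" rt=$request_time '
                    'ua="$upstream_addr" us="$upstream_status" '
                    'ut="$upstream_response_time" ul="$upstream_response_length" '
                    'cs=$upstream_cache_status';

# Hypothetical log path
access_log /var/log/nginx/access.log extended;
```

With the Host header and the handling server name logged side by side, a mismatch between the two is easy to spot.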

OK, let’s investigate what data they are actually posting to us. I adjusted the main script of the website in question to log the relevant data to a separate log file. At the top of index.php, I added:

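// Temporary debugging aid: log every POST to the homepage. Remove when done.
if ($_SERVER['REQUEST_METHOD'] === 'POST' && $_SERVER['REQUEST_URI'] === '/') {
    // Dump the server/header variables first, then the submitted parameters
    $dump = print_r($_SERVER, true) . print_r($_REQUEST, true);
    // Append to the log so that earlier requests are kept
    file_put_contents(__DIR__ . '/request.log', $dump, FILE_APPEND | LOCK_EX);
}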

Investigating the resulting request.log file, I found that they are in fact not posting any data. However, the HTTP Host header (as was already obvious from the nginx log) is always some funny, spammy website name.

Now it became obvious to me what is going on:

  • These are spam bots
  • They submit the advertised website’s name in the Host header so that it appears in your Google Analytics reports
  • This only happens to the website that is set as the default on your server (nginx: the default_server parameter of the listen directive, or the first server block listed)
  • They send the requests as POST in order to bypass any caching. POST requests are typically not cached, so besides spamming your Google Analytics they also create unnecessary load on the server!

So even if you don’t use Google Analytics in the first place, those spam bots still put extra strain on your server, which is quite concerning. Let’s put those bots to rest with a simple nginx configuration.

The ultimate fix here would be simply to make nginx drop requests to hostnames (websites) that you know are not yours.

One implementation of that fix would involve creating a “catch-all” default server in nginx like this:

server {
    listen 80  default_server;
    listen 443 ssl default_server;
    ssl_certificate dummy.crt; ...
    server_name  _; # some invalid name that won't match anything
    return 444;
} 

Any time someone tries to access a website whose name is not defined anywhere in your nginx configuration, this server block will be used, and nginx will silently drop the request. The special status code 444 does exactly that: it is a non-standard code, unique to nginx, that closes the connection without sending any response.

While this would definitely work, I’m not a fan of breaking things while fixing them. What would break, you might ask?

The fix above would break SSL for browsers that do not support SNI. Such browsers send no hostname during the TLS handshake, so nginx answers them with the certificate of the default_server. With the catch-all block above, that is the dummy certificate, and those visitors would get a failed connection instead of your site.

So what we need here is different: the main site must stay the default server, and requests whose host name does not belong to it must be dropped only after the SSL connection has been established.

Easy proper fix:

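server {
    listen 80 default_server;
    listen 443 ssl default_server;
    ssl_certificate example.com.crt; ...
    server_name example.com;

    # Drop requests whose Host header is not example.com or one of its
    # subdomains (an optional :port suffix is allowed)
    if ($http_host !~* "(^|\.)example\.com(:[0-9]+)?$") {
        return 444;
    }

    # ... rest of the site configuration
}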

As easy as that:

  • Browsers without SNI support still reach your site
  • Irrelevant requests are simply discarded by nginx
  • Your server is happy and has less load
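If the default site answers to several hostnames, the same check can also be written with a map, which keeps the list of accepted names in one place. This is only a sketch under assumptions, not the configuration used on the server described here: the variable name $reject_host and the second domain example.net are made up for illustration.

```nginx
# http context: decide once per request whether the Host header is ours.
# $reject_host is a hypothetical variable name.
map $http_host $reject_host {
    default                              1;  # unknown or missing Host: reject
    "~*(^|\.)example\.com(:[0-9]+)?$"    0;  # main site and its subdomains
    "~*(^|\.)example\.net(:[0-9]+)?$"    0;  # hypothetical second domain
}

server {
    listen 80 default_server;
    listen 443 ssl default_server;
    ssl_certificate example.com.crt; ...
    server_name example.com example.net;

    # "if" treats an empty string or "0" as false, anything else as true
    if ($reject_host) {
        return 444;
    }

    # ... rest of the site configuration
}
```

Adding another accepted hostname then means adding one line to the map rather than extending the regular expression inside the server block.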

An improvement of the config would be:
