GetPageSpeed

Stop Google Analytics SPAM bots and reduce server load

nginx

I’m addicted to investigating server log files, and recently I stumbled upon some very unusual requests: POSTs to the homepage. Sure enough, there is no form on the homepage that could have produced them.

The requests look like this:

5.75.71.6 - - [17/Sep/2017:12:27:28 +0000] "POST / HTTP/1.1" 200 5953 "-" "-" "5.75.71.6" "some-funny-hostname" sn="www.example.com" rt=0.241 ua="unix:/var/run/php-fpm/php-fpm-example.com.sock" us="200" ut="0.237" ul="20755" cs=-

Every one of these requests originated from an IP address in Iran or China.

OK, let’s investigate what data they are actually posting to us. I adjusted the main script of the website in question to log the relevant data to a separate log file. At the top of index.php, I added:

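// Temporary debugging: dump server variables and request data for POSTs to the homepage.
if ($_SERVER['REQUEST_METHOD'] === 'POST' && $_SERVER['REQUEST_URI'] === '/') {
    $dump = print_r($_SERVER, true) . print_r($_REQUEST, true);
    // Append with an exclusive lock so concurrent requests do not interleave.
    file_put_contents(__DIR__ . '/request.log', $dump, FILE_APPEND | LOCK_EX);
}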

Investigating the resulting request.log file, I found that they are in fact not posting any data. However, the HTTP Host header (as was already obvious from the nginx log) is always some funny, spammy website name.
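If patching index.php is not an option, the Host header can also be captured by nginx itself. The following is a minimal sketch, assuming a dedicated log format is acceptable; the format name `host_debug` and the log path are hypothetical, while `$http_host` and `$server_name` are standard nginx variables.

```nginx
# In the http {} context: a hypothetical log format that records the Host header
# next to the name of the server block that handled the request.
log_format host_debug '$remote_addr [$time_local] "$request" $status '
                      'host="$http_host" sn="$server_name"';

server {
    # ... existing site configuration ...

    # Hypothetical log path; adjust to your own layout.
    access_log /var/log/nginx/host_debug.log host_debug;
}
```

A request whose `host` value differs from `sn` was sent with a Host header that does not belong to the site.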

Now it became obvious to me that those are Google Analytics spam bots.

So even if you don’t use Google Analytics in the first place, those spam bots still put extra strain on your server, which is quite concerning. Let’s put those bots to rest with a simple nginx configuration.

The ultimate fix here is simply to make nginx drop requests for hostnames (websites) that you know are not yours.

One implementation of that fix would involve creating a “catch-all” default server in nginx like this:

server {
    listen 80 default_server;
    listen 443 ssl default_server;
    ssl_certificate dummy.crt; ...
    server_name _; # some invalid name that won't match anything
    return 444;    # close the connection without sending a response
}

Any time someone tries to access a website whose name is not defined anywhere in your nginx configuration, this server block will be used, and nginx will silently drop the request. That is exactly what the special status code 444 does: it is a non-standard code, unique to nginx, that closes the connection without sending a response.

While this would definitely work, I’m not a fan of breaking things while fixing them. What would break, you might ask?

The fix above would break SSL for browsers that do not support SNI. Such browsers do not tell nginx which site they want during the handshake, so nginx can only answer with the server we have defined as default_server, which here is the catch-all with its dummy certificate. Their connections to the real site would simply fail.

So what we need here is different: we need to drop requests whose host name is irrelevant to our main site’s configuration, after the SSL connection has already been established.

An easy and proper fix:

server {
    listen 80 default_server;
    listen 443 ssl default_server;
    ssl_certificate example.com.crt; ...
    server_name example.com;

    # Drop any request whose Host header does not mention our domain.
    if ($http_host !~ "example\.com") {
        return 444;
    }
}

As easy as that.

An improvement of the config would be:
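One possible tightening, offered as a sketch and not as the definitive version: the pattern `example\.com` is unanchored, so it also accepts any Host value that merely contains that string, such as `example.com.spam.test`. Anchoring the pattern and matching against `$host` (the host name without a port, lowercased by nginx) closes that hole. The `www.` alternative is an assumption based on the `sn="www.example.com"` value in the log line shown earlier.

```nginx
server {
    listen 80 default_server;
    listen 443 ssl default_server;
    # SSL certificate directives go here, unchanged from the block above.
    server_name example.com www.example.com;

    # Accept only the exact site names; any other host is dropped
    # without a response.
    if ($host !~ "^(www\.)?example\.com$") {
        return 444;
    }
}
```

Because this server stays the default_server with the real certificate, clients without SNI still complete the handshake with the real site, and only requests carrying a foreign Host header are dropped afterwards.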
