Varnish GeoIP

Building a website with GeoIP features is useful for many reasons. You can pre-select a user’s currency and language, enforce access restrictions, and so on. Most importantly, you can considerably improve the conversion rate of your e-commerce website.

In this post, I’m going to cover the Varnish GeoIP 2 module, which allows you to extend your Varnish with GeoIP functions. And quite a bit more…

Varnish and GeoIP

How and where you implement GeoIP in your Varnish-powered web stack mostly Vary-es, like all things Varnish 🙂

To serve different content for different geo-locations while keeping URLs the same, you vary your cache by some geo-parameter, such as the country code. Let’s call this geo-variations. You almost always want to normalize the geo-parameter’s value to keep the cache efficient.

To present users from different locations with different URLs, you do just that: redirect them to different URLs depending on their location. Let’s refer to this as geo-redirects from here on.

If you have an nginx sandwich with Varnish

This kind of setup means using nginx for TLS termination in front of Varnish, with another nginx instance as your backend. In that case, you can simply load nginx-module-geoip2 in your “TLS nginx” and configure it as follows:

  • For geo-redirects, you would have a map of country codes to redirect URLs
  • For geo-variations, you would, for example, use proxy_set_header X-Country-Code ... to pass GeoIP country code to the backend, and then use it there (in the backend’s code). You will also need to make sure that your backend sends Vary by that header (or have Varnish do this for you).

This assumes that the TLS termination is configured within nginx’s http {...} context (HTTP proxy).

You can also do TLS termination in nginx using stream {...}, but that costs you HTTP/2 support, because with the stream module nginx is unable to negotiate ALPN protocols (as of yet).

If you use Hitch with Varnish

As just mentioned, nginx TLS termination with the stream module gives you only HTTP/1.1, because it cannot negotiate the ALPN protocol. That is a pity, since a plain TCP stream would be more efficient: it does not have to look inside the HTTP data stream and inspect HTTP headers unnecessarily.

Meet Hitch. It does not have this downside:

  • It can terminate TLS
  • AND it can handle ALPN protos. Magic! 🙂

Since Hitch is good at TLS termination and nothing more, this is when you extend Varnish with GeoIP features!

In this setup, you would load GeoIP 2 VMOD and:

  • For geo-redirects, simply apply redirects in VCL
  • For geo-variations, set geo-parameter via header in VCL so it’s visible to the backend. Then all the same stuff as before: make use of it in your backend code, and make sure that your backend sends Vary by that header (or implement this in VCL as well).

To be fair, you can use the Varnish GeoIP 2 VMOD with the nginx sandwich setup as well, simply because you can code more sophisticated logic in VCL than in nginx configuration.

So when you use TLS termination software like Hitch (which cannot, and should not, handle any HTTP semantics) and you want to leverage geolocation data in your app, you absolutely want to empower your Varnish with GeoIP capabilities.

Install Varnish GeoIP 2 VMOD in CentOS / RedHat 7

There are 2 GeoIP VMODs available at present: one by Varnish Software, which uses the now legacy .dat file format, and one by Federico G. Schwindt, which supports the newer .mmdb files.

The .dat format is no longer receiving free data updates, so naturally, we want the VMOD with support for the newer format 🙂

So let’s get things rolling by installing Varnish 6.0 LTS with everything we need.

The first step is to set up our repository:

yum -y install https://extras.getpagespeed.com/release-el7-latest.rpm

This repository will give access to a more recent GeoIP data update program, geoipupdate. It is capable of updating .mmdb files from MaxMind servers.

Next, enable the repository with Varnish 6.0 LTS and its modules:

yum-config-manager --enable getpagespeed-extras-varnish60

A note about Varnish 6.0 LTS repository by GetPageSpeed

So I’ve built a little YUM repository … 🙂

With Varnish 6.0.2 becoming the de facto LTS version of Varnish, and with the vast array of VMODs I wanted to try and use in production, building this repo was something I was itching to do.

Big thanks go to Ingvar Hagelund, as his COPR repository and packaging efforts are the basis of our own Varnish 6 LTS repository.

If you want a repository that includes many VMODs and is powered by CDN – by all means, use my repo! 😀

Done promoting my repository. Let’s proceed to install GeoIP stuff:

yum install varnish vmod-geoip2 geoipupdate-cron 

This will install:

  • Varnish 6.0 LTS (won’t install it if you already have one from official repositories)
  • The vmod-geoip2 package
  • The geoipupdate program and a weekly cron job that updates the GeoIP databases.

GeoIP.conf

Make sure the configuration file used for updating GeoIP databases, /etc/GeoIP.conf, is present with the following contents:

# The following AccountID and LicenseKey are required placeholders.
# For geoipupdate versions earlier than 2.5.0, use UserId here instead of AccountID.
AccountID 0
LicenseKey 000000000000

# Include one or more of the following edition IDs:
# * GeoLite2-City - GeoLite 2 City
# * GeoLite2-Country - GeoLite2 Country
# For geoipupdate versions earlier than 2.5.0, use ProductIds here instead of EditionIDs.
EditionIDs GeoLite2-City GeoLite2-Country

Note: MaxMind now requires a free account to download GeoLite2 databases, so the placeholder AccountID and LicenseKey values above need to be replaced with your own.

Now you can run geoipupdate once to do the initial download of the GeoIP databases. And voilà, a few seconds later you’ll have the files GeoLite2-City.mmdb and GeoLite2-Country.mmdb in your /usr/share/GeoIP/ directory.

The cron job that we installed earlier will make sure that the database files are updated weekly.

Getting started with GeoIP 2 VMOD

What you do with geolocation depends on your application requirements. But let’s check the basics of how to initialize the GeoIP 2 VMOD. In your VCL file you need to initialize it with:

import geoip2;
sub vcl_init {
  new country = geoip2.geoip2("/usr/share/GeoIP/GeoLite2-Country.mmdb");
}

Then you will be able to make use of GeoIP data further in your VCL logic.
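
For geo-redirects, a minimal sketch could look like the following. It is an illustration under assumptions, not a drop-in config: the /fr/ target URL and the X-Redirect-To helper header are hypothetical names made up for this example, and the backend address is a placeholder.

```vcl
vcl 4.1;

import geoip2;

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_init {
    new country = geoip2.geoip2("/usr/share/GeoIP/GeoLite2-Country.mmdb");
}

sub vcl_recv {
    # Never trust a client-supplied copy of our helper header.
    unset req.http.X-Redirect-To;

    # Redirect only the home page, so that /fr/ itself does not loop.
    # "/fr/" is a hypothetical localized URL.
    if (req.url == "/" &&
        country.lookup("country/iso_code", client.ip) == "FR") {
        set req.http.X-Redirect-To = "/fr/";
        return (synth(302, "Found"));
    }
}

sub vcl_synth {
    # Turn the synthetic response into a real redirect.
    if (resp.status == 302 && req.http.X-Redirect-To) {
        set resp.http.Location = req.http.X-Redirect-To;
        return (deliver);
    }
}
```

The redirect is answered by Varnish itself, so the backend is never contacted for it. For more than a handful of countries you would probably want a lookup table rather than a chain of if statements.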

Geo-variations

sub vcl_recv {
    set req.http.X-Country-Code = country.lookup("country/iso_code", client.ip);
    ...
}

This would make Varnish send the X-Country-Code HTTP header to your backend.

For instance, in PHP you would be able to read the visitor’s country code from $_SERVER["HTTP_X_COUNTRY_CODE"].
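
To check that the lookup works before touching the backend, one option is to echo the value back in the response. This is only a debugging sketch: the X-Debug-Country response header is a name made up for this example, and the backend address is a placeholder.

```vcl
vcl 4.1;

import geoip2;

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_init {
    new country = geoip2.geoip2("/usr/share/GeoIP/GeoLite2-Country.mmdb");
}

sub vcl_recv {
    set req.http.X-Country-Code = country.lookup("country/iso_code", client.ip);
}

sub vcl_deliver {
    # Debugging only: expose the looked-up country code to the client.
    # X-Debug-Country is a hypothetical header name; remove it in production.
    set resp.http.X-Debug-Country = req.http.X-Country-Code;
}
```

Requesting any page through Varnish and inspecting the response headers should then show the code. An empty or missing value suggests that the database has no entry for client.ip, which is typical for private addresses or when a proxy in front of Varnish hides the real client address.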

Efficient geo-variations. Normalization

Creating highly customized page content for every country in the world is probably not feasible. If you blindly vary the cache on every value of X-Country-Code, you will unnecessarily create duplicate cached data and reduce your cache hit-rate.

Suppose that we actually handcraft our pages to display differently for only 3 “target” countries: the United States, Russia and France (country codes US, RU, FR). For any other country, we want to present the US version. Knowing which countries we really vary cache for will allow us to partition cache efficiently, thus increasing cache hit-rate:

...
# additionally import std for `tolower` function
import std;
...
sub vcl_recv {
    set req.http.X-Country-Code = country.lookup("country/iso_code", client.ip);
    # Normalize country code to lower case
    set req.http.X-Country-Code = std.tolower(req.http.X-Country-Code);
    # Anything other than our three target countries gets the US version
    if (req.http.X-Country-Code !~ "^(us|ru|fr)$") {
      set req.http.X-Country-Code = "us";
    }
}

Vary wisely!

With geo-variations, we want different page content for different countries on the same URL. So of course, we have to teach our Varnish to partition cache by the country code.

There are 2 approaches here, and they depend on your needs.

Option 1. Hashing

First, there is hashing, which creates a separate cache object for each country code you vary page contents for. This makes it easy to purge pages for specific countries. E.g. you have updated the French variant of your page and want to clear only that variant. Then you can send X-Country-Code: fr while PURGE-ing it, and only that variant will be cleared.

So, to be able to clear individual geo-variations easily, you may want to use hashing to partition your Varnish cache:

sub vcl_hash {
  ...
  hash_data(req.http.X-Country-Code);
  ...
}
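
Put together, the hashing approach with targeted purging might be sketched as below. The purgers ACL and the backend address are placeholders, and the sketch assumes the purging app sends an already normalized X-Country-Code header (for example fr) along with the PURGE request.

```vcl
vcl 4.1;

import std;
import geoip2;

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

acl purgers { "127.0.0.1"; }

sub vcl_init {
    new country = geoip2.geoip2("/usr/share/GeoIP/GeoLite2-Country.mmdb");
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (client.ip !~ purgers) {
            return (synth(405, "Purging not allowed for " + client.ip));
        }
        # Keep the X-Country-Code sent by the purger: it is hashed below,
        # so only the object for that country is removed.
        return (purge);
    }
    set req.http.X-Country-Code =
        std.tolower(country.lookup("country/iso_code", client.ip));
    if (req.http.X-Country-Code !~ "^(us|ru|fr)$") {
        set req.http.X-Country-Code = "us";
    }
}

sub vcl_hash {
    # No return here, so the built-in vcl_hash still adds the URL and Host.
    hash_data(req.http.X-Country-Code);
}
```

Because the country code is part of the hash, a PURGE without the header matches none of the country objects. Clearing every variant of a page takes one PURGE per supported country code.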

Option 2. Vary header

A different approach to having multiple page variants on the same URL is to use the Vary header. The big upside here is one cached object per page, but with multiple variations in Varnish. It is easy to purge such an object in its entirety, that is, with all its variants. Just PURGE it 🙂

So, to recap: you don’t get to use both approaches at the same time, so “choose your destiny”. Each approach has its specifics:

  • To clear all geo-variants of a page that was hash-ed, you have to send as many PURGE requests as there are geo-locations you support.
  • To clear just one specific geo-variant of a page that is Vary-ed, you have to use req.hash_always_miss.

OK, if I did not make myself clear yet: you really should always use Vary! 🙂 It provides for a flawless victory (“MK”): you can easily purge all variants (which may be cumbersome with hashing) or a specific variant.

The Vary approach boils down to this VCL:

# The backend creates content based on the normalized X-Country-Code:
sub vcl_backend_response {
    if (bereq.http.X-Country-Code) {
        if (!beresp.http.Vary) { # no Vary at all
            set beresp.http.Vary = "X-Country-Code";
        } elsif (beresp.http.Vary !~ "X-Country-Code") { # add to existing Vary
            set beresp.http.Vary = beresp.http.Vary + ", X-Country-Code";
        }
    }
}

And our complete VCL file may look like:

vcl 4.1;

import std;
import geoip2;

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

acl purgers { "127.0.0.1"; }

sub vcl_init {
  new country = geoip2.geoip2("/usr/share/GeoIP/GeoLite2-Country.mmdb");
}

sub vcl_recv {
       if (req.method == "PURGE") {
          if (!client.ip ~ purgers) {
             return (synth(405, "Purging not allowed for " + client.ip));
          }
          # Our app supplied X-Country-Code to indicate clearing of specific geo-variation
          if (req.http.X-Country-Code) {
             set req.method = "GET";
             set req.hash_always_miss = true;
          } else {
             # clear all geo-variants of this page
             return (purge);
          }
       } else {
          set req.http.X-Country-Code = country.lookup("country/iso_code", client.ip);
          # Normalize country code to lower case
          set req.http.X-Country-Code = std.tolower(req.http.X-Country-Code);
          if (req.http.X-Country-Code !~ "^(us|ru|fr)$") {
              set req.http.X-Country-Code = "us";
          }
       }
}
# The backend creates content based on the normalized X-Country-Code:
sub vcl_backend_response {
    if (bereq.http.X-Country-Code) {
        if (!beresp.http.Vary) { # no Vary at all
            set beresp.http.Vary = "X-Country-Code";
        } elsif (beresp.http.Vary !~ "X-Country-Code") { # add to existing Vary
            set beresp.http.Vary = beresp.http.Vary + ", X-Country-Code";
        }
    }
}

So when we receive a PURGE request, we first check whether our app supplied the country code. If it did, we refresh the cache for just that country using req.hash_always_miss, which forces a fresh fetch that replaces the cached variant. Otherwise, return (purge) clears all country variations.

And if it’s not a PURGE request, we use the GeoIP 2 VMOD to set the country code for use in our backend.

P.S. Another geo-related use case for geo-variants is a CDN of Varnish servers. Provided you know the geo-location of each of your Varnish CDN edge servers, you can configure each of them to send the proper X-Country-Code. Naturally, you would not need any GeoIP VMOD for that, because by the time traffic reaches a particular Varnish server instance, it has already been GeoIP-directed (by something like Route53 DNS).
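
On such an edge server the VCL gets very small. A sketch for a node that is assumed to receive only French traffic (the country code and the backend address are placeholders for this example):

```vcl
vcl 4.1;

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # This edge node only receives traffic that DNS has already directed
    # here by location, so the country code is a constant.
    set req.http.X-Country-Code = "fr";
}
```

Since every request on that node carries the same value, each edge effectively caches a single variant per page.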


Also published on Medium.

  1. ALan Jay (@alanjay)

    Hi Danila, great post, thanks. One question: I have had a play with this (using Varnish 6.2) and I don’t get errors when I load my VCL, but I don’t get a country lookup either. Are there any tips you might have for checking and testing the configuration?

    • Danila Vershinin

      Hi ALan,

      I can’t be sure about Varnish 6.2. What makes you want to try that version instead of the rather more stable 6.0.x (recommended LTS)?

      I would start by checking how you have compiled the module (I assume you did not make use of my packages).

      And check whether your .mmdb GeoIP databases are there.

      You can run a test lookup on the command line by installing the libmaxminddb-devel package and then running:

      mmdblookup --file /usr/share/GeoIP/GeoLite2-Country.mmdb --ip 8.8.8.8 country names en
      

      Make sure this works. That is, at least to rule out bad/absent databases.

      • ALan Jay (@alanjay)

        Thanks Danila – you are correct I didn’t use your modules but your article was a great resource to rolling my own :).

        I have moved to 6.1 and then 6.2 as I had some issues with 6.0.x and have found it to be stable.

        As you can see I found the answer being that the client.ip was being masked by the Amazon Elastic Load Balancer (or something else in the mix).

        Checking the forwarded for header seems to have fixed it for me.

        Thanks for all your articles.

  2. ALan Jay (@alanjay)

    Worked out the issue was that using Amazon ELBs you need to use the FWD IP address:

    std.ip(regsub(req.http.X-Forwarded-For, "[\s,].*", ""), client.ip)

    rather than the client.ip
