Strip query parameters with Varnish


You often need Varnish to serve the same cached page whether or not the URL carries certain query parameters.
The most common example is Google (AdWords, Analytics, etc.) adding tracking parameters to your website URLs.
Namely, gclid and utm_* parameters are appended to the final URL.

This causes Varnish to hold multiple cache entries for a single page, because each distinct URL is cached separately.

The solution is quite simple. Varnish VCL can do wonders and we can actually rewrite the final URL that will reach our backend (Nginx). Simplicity is beauty: we strip the specific parameters. As a result, Varnish will cache those pages properly.

How to change your VCL to strip ?gclid and ?utm parameters

Add the following to your vcl_recv subroutine (between sub vcl_recv { and its closing brace }):


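if (req.url ~ "(\?|&)(gclid|utm_[a-z]+)=") {
    # Remove each tracking parameter together with its value and a trailing "&"
    set req.url = regsuball(req.url, "(gclid|utm_[a-z]+)=[-_A-Za-z0-9+()%.]+&?", "");
    # Remove any "?" or "&" left dangling at the end of the URL
    set req.url = regsub(req.url, "[?&]+$", "");
}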

You can test the main regex in question by visiting this link. I made sure that it works in the common cases, including the case when the parameter’s value has round brackets.

The code will strip out Google Analytics campaign variables properly. Those variables are only needed by the JavaScript running on the page. They include utm_source, utm_medium, utm_campaign, gclid, etc.
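
One case the rule above does not guard against is a parameter whose name merely ends in gclid or utm_something (a hypothetical my_gclid, say), because the substitution itself is not tied to a ? or & separator. Here is a minimal sketch of a stricter variant. It is not the rule from this post, and it assumes that parameter values never contain a raw & character. Merge the body into your existing vcl_recv rather than treating it as a complete VCL file.

```vcl
sub vcl_recv {
    if (req.url ~ "[?&](gclid|utm_[a-z]+)=") {
        # Drop each tracking pair but keep the separator in front of it,
        # so only whole parameter names are matched.
        set req.url = regsuball(req.url, "([?&])(gclid|utm_[a-z]+)=[^&]*", "\1");
        # Tidy up the separators left behind: runs of "&", a "?&" pair,
        # and anything dangling at the end of the URL.
        set req.url = regsuball(req.url, "&&+", "&");
        set req.url = regsub(req.url, "\?&", "?");
        set req.url = regsub(req.url, "[?&]+$", "");
    }
}
```

With this variant, a URL such as /page?my_gclid=1&utm_source=google&id=5 should come out as /page?my_gclid=1&id=5. The price is three cleanup substitutions instead of one, so the original rule remains the simpler choice if such look-alike parameter names never occur on your site.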

Updated on Feb 21, 2017: cleaned up the regex for removing question mark and ampersand from URL. More efficient and works better!

vmod-querystring

You may want to look into using vmod-querystring for the same purpose. It has the advantage of a smaller memory footprint, especially if you have long URLs. However, this module is not available via RPM for Varnish 4.x and has to be compiled manually. If you’re using Varnish 5.2.x, you can install it via a COPR package.
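
As a rough sketch of what that could look like, assuming a 1.x release of the module that exposes a regfilter function taking a URL and a regex matched against parameter names (later 2.x releases use filter objects instead, so check the documentation for the version you install):

```vcl
import querystring;

sub vcl_recv {
    # Assumption: regfilter() returns the URL without the parameters
    # whose names match the regex (vmod-querystring 1.x style API).
    set req.url = querystring.regfilter(req.url, "^(gclid|utm_)");
}
```

This would replace the regsuball-based rule rather than run alongside it.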

  1. Tim

    Thank you very much for this VCL adaptation to strip out utm tags from urls. I added \%\. to strip out also % and . as these characters were in many of my utms

    • Danila Vershinin

      Hi Tim.

      Thanks for your input. I’ve added the missing bits to regex.

      Actually, I think escaping is not needed in “character class” part of regex, so it has been removed now.
