

Varnish Virtual Hosts. The Right Way




Ever seen the snippet below for Varnish virtual hosts and wondered how you’re going to manage a dozen websites with a dozen matching if statements in your VCL file?

if (! req.http.Host) {
   error 404 "Need a host header";
}
set req.http.Host = regsub(req.http.Host, "^www\.", "");
set req.http.Host = regsub(req.http.Host, ":80$", "");

if (req.http.Host == "something.com") {
    include "/etc/varnish/site-something.com.vcl";
} elsif (req.http.Host == "somethingelse.com") {
   include "/etc/varnish/site-somethingelse.com.vcl";
}

Varnish is great, but it lacks documentation and tutorials on setting up virtual hosts the right way.

Varnish Virtual Hosts

Why do we need virtual hosts in Varnish at all? It’s a caching server, and by default it does not treat one domain differently from another. It simply passes a request along to the backend server or, if the response is already in the Varnish cache, serves it directly without talking to Nginx or Apache.

But we do need virtual hosts in Varnish, because different sites use different technologies, different login pages and, most importantly, different cookie names. Cookies are the primary reason Varnish virtual hosts are needed: each site needs its own cookie filtering.

In general, we need Varnish to distinguish between sites so that it can adjust its caching policy for each specific website.

There is no built-in way to do this, and there likely never will be. However, once you understand how VCL works, you can define your virtual hosts much the way you would in Nginx: through sites-available and sites-enabled directories. So let’s go.

How Varnish VCL works

Before we proceed to implementing Varnish virtual hosts, let’s review the most important thing about VCL – how include files work.

When you start with a new Varnish installation, you begin coding in default.vcl. However, there is one thing to realize: Varnish also has an internal file with very basic default VCL rules; let’s call it builtin.vcl. Varnish appends the routines from builtin.vcl to the ones in our default.vcl, so they run after the routines in our VCL file.

The two files may define the same routines, e.g. vcl_recv in both files, and both definitions would run on every request, in this order:

  • first, default.vcl
  • last, builtin.vcl

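So when the same routine is defined in several files, the definitions stack up in the order they are loaded, and the one from the last included file is called last.
If we include another file, say my.vcl, at the bottom of default.vcl and define vcl_recv in there, Varnish will run them in this order: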

  1. vcl_recv from default.vcl
  2. vcl_recv from my.vcl
  3. vcl_recv from builtin.vcl
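
The stacking can be made visible with a minimal sketch. The backend address and the X-Trace header are hypothetical, chosen only to show the order; the second vcl_recv stands in for the contents of my.vcl, as if it were included at the bottom of default.vcl.

```vcl
vcl 4.0;

# Assumed backend, for illustration only.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

# 1. Defined first in default.vcl, so it runs first.
sub vcl_recv {
    set req.http.X-Trace = "default.vcl";
}

# 2. Stands in for my.vcl, loaded after the routine above.
sub vcl_recv {
    set req.http.X-Trace = req.http.X-Trace + ", my.vcl";
}

# 3. The built-in vcl_recv runs last, because nothing above calls return(...).
```

Under these assumptions, a request that reaches the backend should carry both names in X-Trace, in that order.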

How is including multiple files useful?

To make things flexible, Varnish does not call the routines that come later (from included files or from builtin.vcl) once a return(...) statement has been executed in the same routine of the current file.

It means that we can prevent Varnish’s default behavior (found in the built-in VCL) by running specific logic in the same routine, and we can extend things further using include files.

So if vcl_recv in default.vcl executed return(...), then Varnish would only run:

  1. vcl_recv from default.vcl
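
Note that this applies per request: only requests that actually reach a return(...) skip the later routines. A small fragment for default.vcl, using a hypothetical /status URL as the example, could look like this:

```vcl
sub vcl_recv {
    # Hypothetical health-check URL, used only for illustration.
    if (req.url ~ "^/status") {
        # Processing of vcl_recv ends here for these requests:
        # vcl_recv from included files and from builtin.vcl is skipped.
        return (pass);
    }
    # Every other request falls through to the routines that follow.
}
```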

Varnish Virtual Hosts strategy

So here’s the strategy to start with when we write VCL for multiple hosts. Let’s look at that same routine, vcl_recv, which is the most important one, since it commonly holds the rules for filtering cookies and setting backend hints.

We assume you’re using CentOS/RHEL-based paths; adjust accordingly for Debian-derived systems.

First, create a directory holding your virtual hosts:

mkdir /etc/varnish/sites-enabled

Suppose we have a site, a.example.com, which is a WordPress blog with comments disabled. We want it to ignore all cookies except in the admin area (/wp-admin and the login page). Let’s create the virtual host file:

nano /etc/varnish/sites-enabled/a.example.com.vcl

And paste in:

sub vcl_recv {
    if (req.http.host == "a.example.com") {
        # ignore all cookies on a WP site without comments (except for admin areas)
        if (req.url !~ "^/wp-(login|admin)") {
            unset req.http.cookie;
        }
    }
}

Now, another website of ours, b.example.com, is very different. It’s a Trac ticketing website that runs as a standalone Python app on a different port!

nano /etc/varnish/sites-enabled/b.example.com.vcl

And paste in:

backend trac {
    .host = "127.0.0.1";
    .port = "3050";
}

sub vcl_recv {
    if (req.http.host == "b.example.com") {
        set req.backend_hint = trac;
    }
}

Another website of ours, c.example.com, runs WordPress with the WooCommerce plugin. We don’t want to cache WooCommerce pages there. So we run:

nano /etc/varnish/sites-enabled/c.example.com.vcl

And paste in:


sub vcl_recv {
    if (req.http.host == "c.example.com") {
        # never cache cart, account and checkout pages, or add-to-cart requests
        if (req.url ~ "/(cart|my-account|checkout|addons)" || req.url ~ "[?&]add-to-cart=") {
            return (pass);
        }
    }
}

We use Google Analytics tracking on every website. So let’s create handling for all the hosts in the file /etc/varnish/catch-all.vcl, with the following:

sub vcl_recv {
    set req.http.Cookie = regsuball(req.http.Cookie, "_ga=[^;]+(; )?", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "_gat=[^;]+(; )?", "");
}
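
One caveat, stated as an assumption about the built-in VCL rather than something the setup above demonstrates: the built-in vcl_recv passes any request that still carries a Cookie header. If stripping the analytics cookies leaves an empty header behind, such requests may still bypass the cache. A commonly used guard, sketched here as an optional addition to catch-all.vcl placed after the routine above, removes the header when nothing meaningful is left:

```vcl
sub vcl_recv {
    # If only whitespace remains after the cookies were stripped,
    # drop the header entirely so the request can be served from cache.
    if (req.http.Cookie ~ "^\s*$") {
        unset req.http.Cookie;
    }
}
```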

Next, we want to put everything together.

Update default.vcl in the following way:


vcl 4.0;
...
sub vcl_recv {
        ....
        # Normalize the header, remove the www and port 
        set req.http.host = regsub(req.http.host, "^www\.", "");
        set req.http.host = regsub(req.http.host, ":[0-9]+", "");

}
...
# at the very bottom:
include "all-vhosts.vcl";
include "catch-all.vcl";

Create the all-vhosts.vcl file. It should contain:

include "sites-enabled/a.example.com.vcl";
include "sites-enabled/b.example.com.vcl";
include "sites-enabled/c.example.com.vcl";

Now we can reload Varnish by running service varnish reload. Varnish will handle each website in its own specific way. Our main VCL file is not cluttered with dozens of if statements, and we can always disable the special handling for a site by commenting out its include in all-vhosts.vcl and reloading again.

The basic rules for placing VCL logic this way are the following:

  • vcl_recv in default.vcl should contain things like normalising headers. It is crucial that this routine does not call return(...), because that would skip the virtual host files
  • vcl_recv in virtual host files like sites-enabled/a.example.com.vcl should contain filtering that is specific to that domain and may optionally call return(...) to halt further processing or filtering. It may also contain backend hints or rules to skip the cache for specific URLs
  • vcl_recv in catch-all.vcl should contain only very common filtering, e.g. Google Analytics cookies or anything else that is common to all the sites
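
As an illustration of the second rule, here is a sketch of a virtual host file that halts further processing. The host static.example.com and its file are hypothetical and not part of the setup above; the file would also need its own include line in all-vhosts.vcl.

```vcl
# Hypothetical sites-enabled/static.example.com.vcl
sub vcl_recv {
    if (req.http.host == "static.example.com") {
        # Only GET and HEAD requests should be looked up in the cache.
        if (req.method != "GET" && req.method != "HEAD") {
            return (pass);
        }
        # Assumption: this host serves static assets that ignore cookies.
        unset req.http.Cookie;
        # Stop here: catch-all.vcl and the built-in vcl_recv are skipped
        # for this host.
        return (hash);
    }
}
```

Because return(...) also skips the built-in vcl_recv, any check that routine would have performed has to be repeated in the virtual host file when it matters, which is why the sketch tests the request method itself.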

You can start with the following sample configuration. Feel free to fork or send pull requests.

  1. Photofolio

    When creating different .conf files – eg. nano /etc/varnish/sites-enabled/example3.com.conf you do not create .vcl files but .conf files.

    Then at all-hosts.vcl, you refer to .vcl files – I guess you were meant to create the .conf files as .vcl files ?

  2. TheWriter

    Thanks for this! A lot of conflicting stuff when Googling and this seems the most helpful. I have a few questions, if I may.

    Update default.vcl in the following way:

    Am I to replace everything in my default .vcl with what you’ve provided? What about my backend default { stuff?

    I use W3TC (WordPress) to control the purging of Varnish and it requires some additional configuration for the default.vcl — is this compatible? If you’re willing to look it’s in «/wp-content/plugins/w3-total-cache/ini/varnish-sample-config.vcl»

    I’ve added everything you’ve provided along with ^^ these two and so far I think it’s working… the site hasn’t crashed at least Lol.

    So I’m using ServerPilot.io for my stack and the way they recommend, because I use SSL, it becomes nginx >> varnish >> apache >> php-fpm

    But in their notes, perhaps, doing multiple vhost may be built in? https://serverpilot.io/community/articles/how-to-install-and-configure-varnish.html

    Sorry for ALL these questions, just looking for some insight. I’ve not been happy with my Redis configuration, my fastcgi config, etc… when my site gets pounded I see my server load and memory usage still climb to levels that I feel, with either or both of those configured, it shouldn’t reach.

    I think Varnish is the answer.

    Thank you!

    • Danila Vershinin

      You would overwrite everything in your default.vcl but of course, keep the lines relevant to your configuration (backend definitions).

      I do not think ServerPilot is worth it. I have a post about them here.

      For invalidation of cache: you would be better off integrating the purge logic from the DreamHost VCL collection here. Just include it from your default.vcl file. Its logic is more CPU-friendly than the one in the W3TC sample.

  3. MisUszatek

    For some reason default.vcl shows an error on reload. Is it possible that a Let’s Encrypt certificate can cause it? Any hint on how to set up this script for Google Cloud / Debian / SSL / multiple domains on one instance via virtual hosts? Thank you

  4. syscall0

    Hello Danila, do you think your approach is still valid? Newer Varnish versions introduced support for multiple VCLs using labels: https://varnish-cache.org/docs/5.1/users-guide/vcl-separate.html

    What do you think? Thanks

    • Danila Vershinin

      Yes, it’s still valid. Moreover, I find it is cleaner than their suggested VCL.

      The vcl.load command allows you to load and label VCL files. I believe this has been there for a while.
      Then, instead of including domain-specific logic in the main VCL, the labeled VCLs are invoked from vcl_recv, via return (vcl(domain_label)); (this is new).

      It may be nice because you can reload the configuration for just a single piece of Varnish logic (e.g. one website), as opposed to a complete VCL reload.

      But it looks like you’ll have to write some custom startup scripts to make sure that all the files are loaded when you start Varnish.
      And the if statements are not going anywhere: with many domains, you’ll end up with a huge vcl_recv in the main file.
