

Minimalistic Cache Warmer




Here’s a simple cron job to warm up your website cache using wget:

@daily /usr/bin/wget --directory-prefix=/tmp --spider --recursive --no-directories --quiet https://www.getpagespeed.com/
  • --spider prevents wget from actually saving anything. However, wget still briefly creates and deletes temporary files, which is why the next option is useful:
  • --directory-prefix=/tmp ensures those temporary files end up where they belong.
  • --recursive makes wget crawl your website recursively.
  • --no-directories ensures no empty directories are left behind after running our minimalistic crawler.
  • --quiet silences all output, which avoids a cron email being sent on every run.
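
If the site is large or the server is busy, you may want to throttle the crawl so the warm-up itself does not create load. A minimal sketch using wget's standard --wait and --level options (the one-second delay and depth of 5 are example values, not part of the original recipe):

@daily /usr/bin/wget --directory-prefix=/tmp --spider --recursive --level=5 --wait=1 --no-directories --quiet https://www.example.com/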

If your website publishes a sitemap.xml that includes all URLs (or a sitemap index pointing to further sitemaps), you can use curl to crawl it instead.
The benefit is that you can warm up multiple encodings (e.g. Brotli and gzip), which is useful if you cache them separately.

To crawl through gzip and Brotli versions:

curl --no-buffer --silent https://www.example.com/robots.txt \
  | sed -n 's/^Sitemap: \(.*\)$/\1/p' | sed 's/\r$//g' | xargs -n1 curl --no-buffer --silent | grep -oP '<loc>\K[^<]*' \
  | xargs -n1 curl --no-buffer --silent -H 'Accept-Encoding: br' 

curl --no-buffer --silent https://www.example.com/robots.txt \
  | sed -n 's/^Sitemap: \(.*\)$/\1/p' | sed 's/\r$//g' | xargs -n1 curl --no-buffer --silent | grep -oP '<loc>\K[^<]*' \
  | xargs -n1 curl --no-buffer --silent -H 'Accept-Encoding: gzip' 
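
To check that the warm-up actually populated the cache, inspect the response headers after a run. Varnish returns an Age header that grows for cached objects, and many setups also expose an X-Cache header; whether yours does depends on your VCL, so treat this check as a sketch:

curl -sI -H 'Accept-Encoding: br' https://www.example.com/ | grep -iE '^(age|x-cache)'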

To crawl through multiple CDN edge servers (hosted on different IPs), add --resolve www.example.com:443:x.x.x.x to the commands above and run them once per edge server, substituting each edge's IP address for x.x.x.x.
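
For example, a single warm-up request pinned to a hypothetical edge at 203.0.113.10 (a placeholder IP) would look like:

curl --no-buffer --silent -H 'Accept-Encoding: br' --resolve www.example.com:443:203.0.113.10 https://www.example.com/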

The best approach is to create a script, crawlup.sh, and call it from cron like so:

@daily /usr/local/bin/crawlup.sh >/dev/null 2>&1

Note that >/dev/null 2>&1 must come in this order to silence both stdout and stderr. The script itself:
#!/bin/bash

# crawl main server first:

curl --no-buffer --silent https://www.example.com/robots.txt \
  | sed -n 's/^Sitemap: \(.*\)$/\1/p' | sed 's/\r$//g' | xargs -n1 curl --no-buffer --silent | grep -oP '<loc>\K[^<]*' \
  | xargs -n1 curl --no-buffer --silent -H 'Accept-Encoding: br'

curl --no-buffer --silent https://www.example.com/robots.txt \
  | sed -n 's/^Sitemap: \(.*\)$/\1/p' | sed 's/\r$//g' | xargs -n1 curl --no-buffer --silent | grep -oP '<loc>\K[^<]*' \
  | xargs -n1 curl --no-buffer --silent -H 'Accept-Encoding: gzip'


# crawl edge servers, 2 in this case:

edges=( x.x.x.x y.y.y.y )

for ip in "${edges[@]}"
do
  curl --no-buffer --silent https://www.example.com/robots.txt \
    | sed -n 's/^Sitemap: \(.*\)$/\1/p' | sed 's/\r$//g' | xargs -n1 curl --no-buffer --silent | grep -oP '<loc>\K[^<]*' \
    | xargs -n1 curl --no-buffer --silent -H 'Accept-Encoding: br' --resolve "www.example.com:443:$ip"

  curl --no-buffer --silent https://www.example.com/robots.txt \
    | sed -n 's/^Sitemap: \(.*\)$/\1/p' | sed 's/\r$//g' | xargs -n1 curl --no-buffer --silent | grep -oP '<loc>\K[^<]*' \
    | xargs -n1 curl --no-buffer --silent -H 'Accept-Encoding: gzip' --resolve "www.example.com:443:$ip"
done
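
Don't forget to make the script executable before scheduling it:

chmod +x /usr/local/bin/crawlup.sh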

