
Fixing NGINX Amplify’s failure after reboot



The NGINX Amplify service is great for monitoring the overall health of your Linux web server, and NGINX metrics in particular.

The horrible issue

However, what if your monitoring fails to resume after a reboot? That is a serious bug, and it is still present at the time of writing.

To be honest, Amplify has some outstanding issues that are easy to solve but have not been addressed in many months. Let’s hope they get fixed soon. In the meantime, let’s resolve the reboot failure bug on CentOS / RedHat 7.

Symptoms

Every time you reboot your server, the NGINX Amplify agent fails to start automatically, and the Amplify cloud server repeatedly alerts you that the agent is not active.

Reason

The Amplify agent’s log file will contain:

2019-02-10 06:43:37,741 [2533] MainThread failed POST "https://receiver.amplify.nginx.com:443/1.3/..../agent/", exception: "("bad handshake: Error([(None, 'osrandom_rand_bytes', 'getrandom() initialization failed.')],)",)"

So the agent’s failure at boot (in this case, at least) is caused by the random number generator failing to gather enough entropy in time. Virtual Private Servers are more prone to this, since virtual machines typically have fewer hardware events to draw entropy from.
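To confirm that you are hitting the same cause, you can search the agent log for this error. The following is a minimal sketch: the helper name has_getrandom_failure is made up for illustration, and the default log path /var/log/amplify-agent/agent.log is an assumption, so adjust it if your agent logs elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given log contains the getrandom() failure.
# $1: path to the Amplify agent log (the default path is an assumption).
has_getrandom_failure() {
    # -F treats the pattern as a fixed string, so the parentheses are literal.
    grep -qF 'getrandom() initialization failed' "${1:-/var/log/amplify-agent/agent.log}"
}

# Usage:
#   has_getrandom_failure && echo "agent failed because of missing entropy"
#
# The kernel's current entropy estimate can be read with:
#   cat /proc/sys/kernel/random/entropy_avail
```

A low entropy_avail value shortly after boot, together with a match in the log, points at the entropy shortage described above.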

Solution

Step 1. Install rng-tools

Now that we know we need more entropy at boot time, we know what to do 🙂 We will use the rngd daemon to mix in additional hardware-originated entropy early in the boot process.

yum -y install rng-tools

Before you proceed, verify that your machine (VPS) actually has a hardware random number generator available, whether physical or virtual:

rngd -f --list

Sample output:

Entropy sources that are available but disabled
1: TPM RNG Device
4: NIST Network Entropy Beacon
Available and enabled entropy sources:
2: Intel RDRAND Instruction RNG
5: JITTER Entropy generator

If your virtual machine has at least some available sources of entropy for rng-tools, you may proceed to the next step.
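If you set up many servers, this check can be scripted. The sketch below counts the entries listed under the "Available and enabled" heading. It assumes the listing has the same layout as the sample output above, and count_enabled_sources is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read the text printed by `rngd -f --list` on stdin and
# print how many sources appear under "Available and enabled entropy sources:".
count_enabled_sources() {
    awk '
        /available but disabled/          { enabled = 0; next }  # disabled section starts
        /^Available and enabled/          { enabled = 1; next }  # enabled section starts
        enabled && /^[[:space:]]*[0-9]+:/ { n++ }                # "2: Intel RDRAND ..." style entry
        END                               { print n + 0 }        # print 0 when nothing matched
    '
}

# Usage (2>&1 because the listing may be written to stderr, which is an assumption):
#   rngd -f --list 2>&1 | count_enabled_sources
```

For the sample output above this prints 2. A result of 0 means rngd has nothing to work with on this machine.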

Step 2. Enable rngd

Now all that is needed to complete our fix for NGINX Amplify’s agent failure is:

systemctl enable rngd

You may want to start the service right away with systemctl start rngd and verify that it runs without failure, or reboot the server to verify the fix: after the reboot, systemctl status amplify-agent should report the agent as running.
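The post-reboot check can also be done in one go. This is a rough sketch built on systemctl is-active; the helper name all_active is made up for illustration, and the unit names rngd and amplify-agent are the ones used in this article.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if every unit given as an argument is active.
all_active() {
    for unit in "$@"; do
        # --quiet suppresses the state output; only the exit code is used.
        if ! systemctl is-active --quiet "$unit"; then
            echo "$unit is not active" >&2
            return 1
        fi
    done
}

# Usage, after a reboot:
#   all_active rngd amplify-agent && echo "both services are running"
```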

A note for perfectionists

You may notice that systemctl status rngd includes:

Failed to init entropy source 0: Hardware RNG Device

Whether this message appears depends on how your host has configured virtualization. If other hardware sources are available to rngd, you can tell it to skip the entropy source with index 0 by running systemctl edit rngd and entering:

[Service]
ExecStart=
ExecStart=/sbin/rngd -f --exclude=0
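If you provision servers with a script rather than interactively, the same override can be written without an editor. The sketch below relies on systemctl edit storing its drop-in as rngd.service.d/override.conf under /etc/systemd/system. The helper name write_rngd_override and its optional directory argument are hypothetical and exist only to make the snippet easy to try in a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: write the drop-in that `systemctl edit rngd` would create.
# $1: systemd configuration directory (defaults to /etc/systemd/system).
write_rngd_override() {
    dropin_dir="${1:-/etc/systemd/system}/rngd.service.d"
    mkdir -p "$dropin_dir" || return 1
    # The empty ExecStart= clears the packaged command before setting the new one.
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
ExecStart=
ExecStart=/sbin/rngd -f --exclude=0
EOF
}

# Usage (as root):
#   write_rngd_override && systemctl daemon-reload && systemctl restart rngd
```

Unlike systemctl edit, writing the file by hand does not reload systemd, hence the explicit daemon-reload in the usage line.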

Closing words

The above solution relies on a hardware random number generator on your system. This is usually enough. However, if you don’t trust those generators to be random enough, or simply don’t have any, you may want to look into Haveged.
