
VPS Is Still Shared Hosting: Diagnose Noisy Neighbors the Right Way

TL;DR: A VPS is a slice of a bigger machine. You share CPU time, storage queues, and network. Symptoms like high load average with low CPU usage typically mean contention—CPU steal time or blocked I/O (iowait). Measure it, prove it, and either move hosts or pick plans designed to avoid contention.


The misconception

People buy a VPS expecting “dedicated resources.” You might indeed get dedicated RAM and a quota of vCPUs, but CPU cycles, storage queues, and sometimes even NIC bandwidth are still shared. Hypervisors time-slice physical CPUs across many VMs; storage is a shared device or pool with finite queue depth; and noisy neighbors exist.

That’s literally what “multi-tenant cloud” means. The industry even has a name for it: the noisy neighbor effect—another tenant on the same host hogs CPU, disk, or network and your instance suffers.


The real-world symptom: “Load average 5.0+, but nothing’s busy?”

You run top and iotop and see little CPU or disk activity, yet uptime shows load averages like 5.22, 5.10, 5.19. Looks paradoxical—until you remember how Linux computes load average.

Linux load average counts:
– runnable tasks (R), and
– tasks in D state (uninterruptible sleep), commonly waiting on I/O.

So a high load with low %us/%sy isn’t magic—it usually means threads are blocked on I/O (or locks), or you’re starved by the hypervisor. Both are common on shared hosts.
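
A quick way to see what is feeding the load average at any moment (a minimal check, using the stock procps ps):

# Count processes by state; R = runnable, D = uninterruptible sleep (usually I/O)
ps -eo state= | sort | uniq -c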


Two usual suspects

1) CPU contention → steal time

When the hypervisor prioritizes other VMs, your vCPU sits ready but cannot run. Tools expose this as steal time (%st). In effect, you asked for CPU but didn’t get it.

Definition. Steal time is the percentage of time your vCPU involuntarily waits because the hypervisor is running someone else. Measurable in top, vmstat, mpstat, iostat.

A rough rule of thumb: if %st stays above ~10% for 20 minutes or more, contention is degrading your instance; ask for a migration or change plans.
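
If you prefer a rolling view, vmstat also reports steal (a minimal check, using the stock procps vmstat):

# Under the "cpu" group, the "st" column is steal time; sample once per second
vmstat 1 10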

2) Storage contention → iowait

iowait is time the CPU is idle but waiting for I/O to complete—typical when the storage backend or queue is saturated (again: shared). This pushes threads into D state and does raise load average even if iotop looks quiet at that instant.
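
To see which tasks are stuck in that state right now (a quick sketch using standard ps):

# List tasks in uninterruptible sleep (D); the wchan field hints at what they wait on
ps -eo pid,state,wchan:32,comm | awk '$2=="D"'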


Minimal, repeatable troubleshooting (copy/paste)

Install the basics (RHEL/Alma/Rocky):

sudo dnf install --assumeyes sysstat
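
If you run a Debian-family system instead (an assumption; this article targets RHEL derivatives), the package name is the same:

sudo apt-get install --assume-yes sysstat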

Snapshot CPU/steal every second:

# iostat: CPU-only view; the %steal column shows hypervisor steal time
iostat -c 1 30

# Or per-CPU breakdown (from sysstat)
mpstat -P ALL 1 30

# Quick top batch with load and st in one shot
{ uptime; echo; top -bn1 | head -n 20; } 2>&1

Why this works: iostat/mpstat reveal %st (steal). If it’s persistently high, you’re CPU-starved by neighbors. top confirms load and shows %st too.

Check if load is I/O-driven:

# Extended device stats; look for high await, a growing queue (aqu-sz), and high %util
iostat -x 1 10

High %iowait with sluggish app behavior + threads in D state = storage contention. Remember: load can skyrocket purely from tasks blocked on I/O.
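
For per-process attribution over time, sysstat's pidstat helps (it ships in the same sysstat package installed above):

# Per-process disk I/O each second; recent sysstat versions also show an iodelay column
pidstat -d 1 10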

Quick “is it us or the host?” sanity list
– %st > ~10% for sustained periods → host CPU contention.
– Many D-state threads, rising load, slow responses → storage queue contention.
– top shows low %us/%sy, but uptime load is high → not CPU-bound, likely I/O or steal.
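
To turn this checklist into a quick, repeatable check, here is a minimal sketch (assuming sysstat's iostat; the 30-second window and output format are illustrative):

#!/bin/bash
# Sample the CPU breakdown for 30 seconds and report average %iowait and %steal.
# Data lines from "iostat -c" contain six numeric fields:
#   %user %nice %system %iowait %steal %idle
iostat -c 1 30 | awk '
  NF == 6 && $1 ~ /^[0-9.]+$/ { iowait += $4; steal += $5; n++ }
  END {
    if (n) printf "avg %%iowait = %.1f   avg %%steal = %.1f   (%d samples)\n",
                  iowait / n, steal / n, n
  }'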


Case study (summarized)

A VPS showed load ~5 with “little” CPU/IO observed. iostat revealed very high %iowait and some %steal. Migrating the instance to a different physical host immediately normalized load and responsiveness. That’s classic noisy-neighbor relief: same VM, different host, contention gone. (Providers often offer live or cold migrations depending on circumstances.)


What to ask your provider (and what to change yourself)

Ask the provider
– Is the physical host my instance runs on experiencing CPU or storage contention?
– Can you migrate my instance to a less loaded host (live or cold, whichever applies)?
– Do you offer Dedicated CPU / pinned-vCPU plans, or a faster storage tier?

Things you can change
– Keep monitoring that records %st and %iowait over time, so you have evidence when it recurs.
– Gather the data above (iostat, mpstat, D-state counts) before opening a ticket.
– If contention is chronic, move to a Dedicated CPU plan, a different storage tier, or another provider.


FAQ: common misreads

“iotop says nothing is happening, so it can’t be I/O.”
iotop shows current per-process I/O, not the whole storage fabric’s queueing. Your threads can be blocked waiting for I/O completion even while momentary utilization looks low. Check iostat -x and D-state counts.

“High load means CPU is pegged.”
On Linux, load also counts D state (I/O wait). So, high load + low %us/%sy is perfectly consistent.

“My VPS has 4 vCPUs, so it’s dedicated CPU.”
No. vCPUs are time-slices. Unless you’re on a Dedicated CPU/pinned SKU, steal time can and will happen.


A bite-size playbook you can paste into a ticket

When opening a provider ticket, include evidence:

Symptoms:
- High load averages (paste from `uptime`), sluggish app.

CPU:
- iostat -c 1 30  → paste showing sustained %steal if present
- mpstat -P ALL 1 30 → per-CPU %st/%idle context

I/O:
- iostat -x 1 10 → paste await/util queues
- Count of D-state threads: `ps -eo state= | awk '$1=="D"{c++} END{print c+0}'`

Ask:
- Please check host CPU/storage contention and migrate to a less loaded host.
- If contention persists, advise on switching to Dedicated CPU / different storage tier.

Providers understand %steal, %iowait, and D-state language. You’ll get action faster.


Key takeaways

– A VPS shares CPU time, storage queues, and often network bandwidth with other tenants; “dedicated” usually covers only RAM and a vCPU quota.
– High load average with low %us/%sy is not a paradox: Linux load also counts D-state (I/O-blocked) tasks, and steal time means you asked for CPU and didn’t get it.
– Measure before you blame your app: iostat -c / mpstat for %steal, iostat -x plus D-state counts for storage contention.
– Sustained %st above ~10% or persistent iowait is a host problem: open a ticket with evidence and ask for a migration, a Dedicated CPU plan, or a different storage tier.