Performance and Observability: What does the load average actually mean, and why is it so often misleading?

Dr Chris Paton

Frequently Asked Question

What does the load average actually mean, and why is it so often misleading?

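The three numbers from uptime are exponentially damped moving averages, with time constants of 1, 5, and 15 minutes, of the number of tasks that are either runnable (running or waiting in the run queue) or, on Linux, in uninterruptible sleep (state D). They are not simple averages over the last 1, 5, and 15 minutes; older samples fade out gradually rather than dropping off at a window edge. On a single-CPU machine, a CPU-only load of 1.0 means the CPU was busy with one runnable task on average. On an eight-core machine, 8.0 corresponds to full utilisation and anything below it means spare capacity; only sustained values above the core count indicate contention. The 15-minute number lags the others, so comparing it with the 1-minute number tells you whether load is trending up or recovering.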
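The damping described above can be sketched in a few lines of Python. This is a minimal floating-point illustration, not the kernel's fixed-point code: the 5-second sampling interval is an assumption based on how Linux is commonly described, and the names ewma_step, simulate, and per_core are hypothetical helpers invented for this example.

```python
import math

# Assumption: the kernel samples the active-task count roughly every 5 seconds.
SAMPLE_INTERVAL_S = 5.0


def ewma_step(load, active_tasks, window_s, interval_s=SAMPLE_INTERVAL_S):
    """Apply one damping step: old load decays, the new sample is blended in."""
    decay = math.exp(-interval_s / window_s)
    return load * decay + active_tasks * (1.0 - decay)


def simulate(active_tasks, duration_s, window_s, start=0.0):
    """Hold a constant number of active tasks for duration_s seconds."""
    load = start
    for _ in range(int(duration_s // SAMPLE_INTERVAL_S)):
        load = ewma_step(load, active_tasks, window_s)
    return load


def per_core(load, cores):
    """Normalise a load figure by core count; about 1.0 means fully used."""
    return load / cores


if __name__ == "__main__":
    # One busy task on an idle machine: the "1-minute" figure after 1 minute.
    print(round(simulate(1, 60, 60), 3))
    # A load of 8.0 on eight cores is full utilisation, not overload.
    print(per_core(8.0, 8))
```

Under these assumptions, one constantly busy task lifts the 1-minute figure to only about 0.63 after a full minute, which is why the numbers should not be read as plain averages over their nominal windows.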

It is misleading because, on Linux, it conflates CPU demand with I/O demand. A server stuck waiting on a frozen NFS mount can show a huge load while the CPUs sit idle, and a single-threaded scientific job pinning one core at full tilt shows a perfectly modest 1.0. Brendan Gregg's 2017 investigation traced the uninterruptible-sleep behaviour back to a 1993 patch from Matthias Urlichs. Treat the load average as a thirty-second triage signal, then move quickly to top, vmstat, and iostat to find out what the queueing actually consists of.
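One rough way to see what the queueing consists of is to count tasks by state directly from /proc. The following is a sketch that assumes a Linux system with the usual /proc/<pid>/task/<tid>/stat layout; task_state and count_states are hypothetical helper names, and the result is a single snapshot rather than an average.

```python
import glob
from collections import Counter


def task_state(stat_line):
    """Return the one-letter state from the contents of a /proc stat file.

    The command name sits in parentheses and may itself contain spaces
    or ')', so the line is split on the last ')'.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_states(proc_root="/proc"):
    """Count threads by scheduler state (R = runnable, D = uninterruptible)."""
    counts = Counter()
    for path in glob.glob(f"{proc_root}/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as fh:
                counts[task_state(fh.read())] += 1
        except (OSError, IndexError):
            continue  # the task exited while we were reading it
    return counts


if __name__ == "__main__":
    c = count_states()
    print(f"runnable (R): {c['R']}  uninterruptible (D): {c['D']}")
```

A high D count alongside a low R count points at I/O waits, such as the frozen NFS mount case, rather than CPU contention. The R count includes the script itself, so a reading of 1 on an otherwise idle machine is expected.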

