Frequently Asked Question

What does the load average actually mean, and why is it so often misleading?

The three numbers from uptime are the exponentially-weighted moving averages of the number of tasks in the run queue or in uninterruptible sleep (state D) over the past 1, 5, and 15 minutes. On a single-CPU machine, 1.0 means the CPU was constantly busy with one runnable task on average. On an eight-core machine, anything up to 8.0 just means full utilisation; only sustained values above the core count indicate contention. The 15-minute number, lagging behind the others, tells you whether you are trending up or recovering.

It is misleading because it conflates CPU demand with I/O demand. A server stuck waiting on a frozen NFS mount can show a huge load while the CPUs sit idle, and a CPU-pinned scientific job at full tilt can show a perfectly modest 1.0. Brendan Gregg's investigation of this in 2017 traced the uninterruptible-sleep behaviour back to a 1993 patch from Matthias Urlichs. Treat the load average as a thirty-second triage signal, then move quickly to top, vmstat, and iostat to find out what the queueing actually consists of.

Further reading and video