I'm sure every systems engineer knows about the load average. Let's talk about why it might be misguided to rely on it as the sole performance metric, and how to find better ones.

Stop obsessing over load averages

There is a widespread assumption in the industry that a high load average automatically means the server is overloaded. Often, this assumption is incorrect.

From the Linux Kernel source code:

/*
 * kernel/sched/loadavg.c
 *
 * This file contains the magic bits required to compute the global loadavg
 * figure. Its a silly number but people think its important. We go through
 * great pains to make it work on big machines and tickless kernels.
 */

https://github.com/torvalds/linux/blob/master/kernel/sched/loadavg.c

What the load average isn't

Let's start off by figuring out what the load average isn't: it is not a metric that displays CPU utilization, it is not a metric that shows how many of your processor's cores are busy, and it is not a metric that clearly indicates that your CPU is overloaded.

What the load average is

From the Linux Kernel source code:

 * The global load average is an exponentially decaying average of nr_running +
 * nr_uninterruptible.

https://github.com/torvalds/linux/blob/master/kernel/sched/loadavg.c

Or, in case you'd find this explanation simpler: The load average is an exponentially decaying average of the number of jobs that are running (R state) and in uninterruptible sleep (D state).
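If you want to see the raw numbers the kernel exposes, they live in /proc/loadavg (the output below is just illustrative, your values will differ):

# cat /proc/loadavg
0.42 0.61 0.58 2/1124 48133

The first three fields are the 1-, 5- and 15-minute load averages, the fourth is the number of currently runnable scheduling entities over the total number of scheduling entities, and the last is the PID of the most recently created process.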

R & D (states)

The R (running) state is a flag that indicates the process is currently running or is runnable (sitting in the run queue). This is a pretty self-explanatory state: if a process is executing code, or is ready to execute as soon as it gets CPU time, it will show up with the R flag.

D, on the other hand, or "uninterruptible sleep", is a bit less simple. Usually you'll see a process in this state when it is waiting on disk or network disk I/O. Processes in this state cannot be immediately terminated through signals, as they do not process signals until the wait is complete.
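As a quick illustration (a minimal sketch using the standard procps ps), you can list the processes currently sitting in uninterruptible sleep like this:

# ps -eo pid,state,wchan:32,comm | awk 'NR == 1 || $2 == "D"'

The wchan column shows the kernel function the process is sleeping in, which is often a good hint about what kind of I/O it is waiting for.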

Before we decide how useful this metric is, let's go over CPU utilization.

Measuring CPU utilization

There are many tools available out of the box that help you measure CPU usage. One example is vmstat.

# vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0   1536 10508580 1369596 16251624    0    0     0     2    0    0  1  0 99  0  0
 0  0   1536 10508580 1369596 16251628    0    0     0     0 1523 2457  0  0 100  0  0
 0  0   1536 10508572 1369596 16251640    0    0     0    23 2456 4252  0  0 100  0  0
 0  0   1536 10508696 1369596 16251648    0    0     0   459 2130 3418  0  0 100  0  0
 0  0   1536 10508572 1369596 16251660    0    0     0     0 2568 4113  0  0 99  0  0

vmstat 1 5 - sample every 1 second, 5 reports in total

In this post we're going to go over the last five columns, which represent a percentage of time spent in each of these modes:

column  mode    summary
us      user    Code executed in user space
sy      system  Kernel space: the Linux kernel, drivers, system calls, context switching
id      idle    "Twiddling thumbs", or not doing anything
wa      iowait  Time spent waiting on I/O (disk, network disk, ...)
st      steal   Time the system wanted to spend doing things, but was prevented from by the hypervisor (typical in virtualization)

The above values actually come from /proc/stat
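You can look at the raw counters yourself: the aggregate cpu line in /proc/stat holds cumulative time, in USER_HZ ticks (usually 1/100 of a second), spent in each mode since boot (the sample output is illustrative):

# grep '^cpu ' /proc/stat
cpu  270084 1830 91522 54554487 10039 0 1733 0 0 0

The fields are user, nice, system, idle, iowait, irq, softirq, steal, guest and guest_nice; tools like vmstat and top sample this file twice and turn the deltas into the percentages shown above.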

There are other tools such as top and htop, but they more or less do the same thing. vmstat was chosen here for simplicity (it's also fast, as it does not need to look up individual processes). Note that the first report of vmstat is not an accurate representation of current CPU usage, as it shows averages since boot:

The first report produced gives averages since the last reboot. Additional reports give information on a sampling period of length delay. The process and memory reports are instantaneous in either case

vmstat manual

Calculating CPU utilization

The calculation is rather simple: add idle and iowait together, then subtract the sum from 100. The result is the CPU utilization as a percentage. The reason we add iowait to idle is that during iowait the CPU is free to do other work, so iowait should not be counted as utilization.
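As a rough sketch, assuming the 17-column vmstat layout shown above (id is column 15, wa is column 16), you can turn this into a one-liner. The tail -n 1 skips the first report, which only shows averages since boot:

# vmstat 1 2 | tail -n 1 | awk '{ printf "cpu utilization: %d%%\n", 100 - ($15 + $16) }'
cpu utilization: 1%

The output shown is illustrative; on a loaded machine you'd see a much higher number.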

Real-world example and the road to better monitoring

Here's an example from a busy MySQL server: 12 physical CPU cores (24 logical), lots of RAM, lots of traffic, and a 500+ GB MySQL database.

In the above graphs you can see the CPU usage, iowait, load average and number of running jobs. You will notice that the load average can get rather high (larger than the number of CPU cores), but CPU utilization does not get critical. This is fine: the server is not overloaded, even though the load average gets large.

The load average will sometimes go hand in hand with increased CPU utilization, but on its own it does not measure CPU utilization.
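A quick way to put a "high" load average into perspective on a given machine is to normalize it by the number of logical cores. This is only a rough sketch (the figure below is illustrative), but a load-per-core number is usually easier to reason about than the raw value:

# awk -v cores="$(nproc)" '{ printf "1-minute load per core: %.2f\n", $1 / cores }' /proc/loadavg
1-minute load per core: 0.14

Keep in mind this still counts jobs in the D state, so a value above 1.0 does not necessarily mean the CPUs are saturated.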

The metrics above are displayed in Grafana, from Prometheus through the Monitoring Exporter.

Monitoring

I strongly believe it is worth gathering both the load average and CPU utilization, but for different reasons. The load average will tell you roughly how many jobs are running or waiting on I/O, while the CPU utilization metrics will tell you how stressed the processor is. When alerting on processor usage, try not to use the load average.

There are many monitoring options available, and practically all of them can provide both of these metrics. I personally love Prometheus and the small Monitoring Exporter, but the choice is up to you.