unexpected CPU pressure measurements when applying cpu.max control

Hi there,

We've observed some unexpected CPU pressure measurements via the /proc/pressure/cpu interface when applying the cpu.max control within a cgroup.

In short, processes executing within a CPU-limited cgroup contribute to the system-wide CPU pressure measurement. This produces misleading data that suggests system-wide CPU contention when none exists.


For example: we create a cgroup limited to a single CPU (cpu.max = '100000 100000') and within that cgroup we launch 10 processes contending for that one CPU.

I'm using systemd-run in the commands below for convenience; its CPUQuota property sets the underlying cpu.max cgroup control.

The command that launches the 10 processes is `stress --cpu 10`.
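For reference, here is the mapping I'm assuming between CPUQuota and cpu.max (the 100ms period is systemd's default; I haven't verified this against the systemd source):

```shell
# Sketch (assumption): CPUQuota=N% is written to cpu.max as
# "<quota_us> <period_us>", where the quota is N% of the period.
# With a 100ms period, CPUQuota=100% yields "100000 100000",
# i.e. one CPU's worth of time per period.
period_us=100000
quota_pct=100
quota_us=$(( period_us * quota_pct / 100 ))
printf 'cpu.max: %s %s\n' "$quota_us" "$period_us"
```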

[fitzy@~]$ uname -r
6.8.9-300.fc40.x86_64

Execute the process:

[fitzy@~]$ sudo systemd-run --property CPUQuota=100% --slice example stress --cpu 10
Running as unit: run-rf1c808a9ce1d4e7c82cc57ab90e728e3.service; invocation ID: 67b0808e72364325940cfa898231e83e

Observe the cgroup-specific CPU pressure measurement:

[fitzy@~]$ cat /sys/fs/cgroup/example.slice/run-rf1c808a9ce1d4e7c82cc57ab90e728e3.service/cpu.pressure
some avg10=87.32 avg60=86.44 avg300=56.96 total=272053462
full avg10=87.32 avg60=86.44 avg300=56.96 total=272053075

Compare to the system.slice CPU pressure measurement:

[fitzy@~]$ cat /sys/fs/cgroup/system.slice/cpu.pressure
some avg10=0.00 avg60=0.00 avg300=1.89 total=333141519
full avg10=0.00 avg60=0.00 avg300=1.89 total=332415623

Compare to the system-wide CPU pressure measurement:

[fitzy@~]$ cat /proc/pressure/cpu
some avg10=85.37 avg60=84.94 avg300=65.05 total=1655875251
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
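Our monitoring consumes these readings roughly like the sketch below (the sample line is hardcoded from the output above, and the threshold of 80 is made up for illustration; in practice the line would be read from /proc/pressure/cpu):

```shell
# Extract avg10 from the 'some' line of a PSI reading and compare
# it against an alert threshold.
psi_some='some avg10=85.37 avg60=84.94 avg300=65.05 total=1655875251'
avg10=$(printf '%s\n' "$psi_some" | awk '$1 == "some" { sub(/^avg10=/, "", $2); print $2 }')
threshold=80
alert=$(awk -v v="$avg10" -v t="$threshold" 'BEGIN { print (v+0 > t+0 ? "yes" : "no") }')
echo "avg10=$avg10 alert=$alert"
```

With the cpu.max behaviour described above, this fires even though the contention is confined to one throttled cgroup.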


I've run these tests on a 5.10.0 system as well as on 6.8.9 (above).

There are two differences I can see:

- On 5.10 the 'full' line is not present in either the cgroup cpu.pressure interface or the system-wide /proc/pressure/cpu interface. I'm assuming it was added in a newer kernel at some point.

- On 6.8.9 the 'full' line in the cgroup cpu.pressure interface appears to provide accurate data based on this simple test.

As we know, the system-wide 'full' measurement for CPU is undefined; /proc/pressure/cpu reports it as all zeros.


In either case, the kernel PSI interface is the canonical source we want to read to warn us of CPU contention across our fleet of machines. Due to this unexpected accounting, its values can be misleading.

Frankly, I'm not sure what the behaviour should be. I can see the argument that the current value is correct, given that 'some' is defined as some tasks waiting on CPU.

However, we have no data to fall back on: we cannot use the 'full' measurement from the kernel for CPU pressure, and unless we segregate all CPU-limited processes into their own cgroup slice and read distinct measurements from there, we also cannot rely on reading the per-cgroup cpu.pressure interface.

For now, we are preferring CPU weight controls - which only take effect at saturation - as a compromise. This isn't always the preferred control, because we sometimes want to place a hard cap on CPU-hungry but low-priority processes (e.g. log transformation services).
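To illustrate the compromise with hypothetical numbers (weight 20 for the low-priority group is an example value, not our actual configuration): under cpu.weight, a group's share at saturation is its weight divided by the sum of its runnable siblings' weights, and when the machine is idle the group is not throttled at all, unlike with cpu.max.

```shell
# Hypothetical example: default weight 100 vs a low-prio weight of 20.
# At saturation the low-prio group gets weight / sum(weights) of the CPU.
w_default=100
w_lowprio=20
share=$(awk -v a="$w_lowprio" -v b="$w_default" 'BEGIN { printf "%.1f", 100 * a / (a + b) }')
echo "low-prio share under contention: ${share}%"
```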

Does anyone have advice, or can anyone comment on what the expected behaviour is under these circumstances? Perhaps this is simply working as intended, and we need to make concessions higher up in the stack.

fitzy

---

Michael Fitz-Payne
System Administrator
Civilized Discourse Construction Kit, Inc.




