Hello Yafang.

On Fri, Nov 08, 2024 at 09:29:00PM GMT, Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> After enabling CONFIG_IRQ_TIME_ACCOUNTING to track IRQ pressure in our
> container environment, we encountered several user-visible behavioral
> changes:
>
> - Interrupted IRQ/softirq time is excluded in the cpuacct cgroup
>
>   This breaks userspace applications that rely on CPU usage data from
>   cgroups to monitor CPU pressure. This patchset resolves the issue by
>   ensuring that IRQ/softirq time is included in the cgroup of the
>   interrupted tasks.
>
> - getrusage(2) does not include time interrupted by IRQ/softirq
>
>   Some services use getrusage(2) to check if workloads are experiencing
>   CPU pressure. Since IRQ/softirq time is no longer included in task
>   runtime, getrusage(2) can no longer reflect the CPU pressure caused by
>   heavy interrupts.

I understand that IRQ/softirq time is difficult to attribute to an
"accountable" entity and it's technically simplest to attribute it to
everyone/no one, i.e. to the root cgroup (or through a global stat
without cgroups).

> This patchset addresses the first issue, which is relatively
> straightforward. Once this solution is accepted, I will address the
> second issue in a follow-up patchset.

Is the first issue about cpuacct data or irq.pressure? It sounds like
both, and I noticed the documentation for irq.pressure is lacking in
Documentation/accounting/psi.rst. While you're touching this, could you
please add a paragraph or sentence explaining what this value represents?

(Also, there is the same change both for cpuacct and for
cgroup_base_stat_cputime_show(), right?)

>            ----------------
>            | Load Balancer|
>            ----------------
>            /   |     |   \
>           /    |     |    \
>     Server1 Server2 Server3 ... ServerN
>
> Although the load balancer's algorithm is complex, it follows some core
> principles:
>
> - When server CPU utilization increases, it adds more servers and
>   deploys additional instances to meet SLA requirements.
>
> - When server CPU utilization decreases, it scales down by
>   decommissioning servers and reducing the number of instances to save
>   on costs.

Does a "server" here refer to a whole node (a whole kernel) or to a
cgroup (i.e. multiple servers on top of one kernel)?

> The load balancer is malfunctioning due to the exclusion of IRQ time
> from CPU utilization calculations.

Could this be fixed by subtracting (global) IRQ time from the (presumed
total) system capacity that the balancer uses for its decisions? (i.e.
without an exact per-cgroup breakdown of IRQ time)

Thanks,
Michal
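P.S. To make that last suggestion concrete, here is a rough sketch (in
Python, with a hypothetical helper name and made-up jiffie counts) of
computing utilization against an IRQ-adjusted capacity from the
aggregate "cpu" line of /proc/stat, i.e. without any per-cgroup IRQ
breakdown. The field order follows proc(5): user, nice, system, idle,
iowait, irq, softirq.

```python
def usable_busy_fraction(stat_line):
    """Busy fraction against capacity that excludes IRQ/softirq time.

    `stat_line` is the aggregate "cpu ..." line from /proc/stat; the
    first seven numeric fields are user, nice, system, idle, iowait,
    irq, softirq (in USER_HZ jiffies, per proc(5)).
    """
    fields = [int(x) for x in stat_line.split()[1:8]]
    user, nice, system, idle, iowait, irq, softirq = fields
    total = sum(fields)
    # Treat IRQ/softirq jiffies as capacity lost to interrupts rather
    # than as load attributable to any particular workload.
    capacity = total - irq - softirq
    busy = user + nice + system
    return busy / capacity

# Made-up example counts, for illustration only:
sample = "cpu  5000 100 2000 10000 300 600 400"
print(round(usable_busy_fraction(sample), 3))  # 0.408
```

In other words, the balancer would divide workload busy time by
(total - irq - softirq) instead of by total, which may already correct
its scaling decisions without the exact per-cgroup attribution.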