Hello.

On Mon, Feb 19, 2024 at 04:41:34PM +0800, Cruz Zhao <CruzZhao@xxxxxxxxxxxxxxxxx> wrote:
> As core sched uses rq_clock() as clock source to account forceidle
> time, irq time will be accounted into forceidle time. However, in
> some scenarios, forceidle sum will be much larger than exec runtime,
> e.g., we observed that forceidle time of task calling futex_wake()
> is 50% larger than exec runtime, which is confusing.

And did that 50% turn out to be attributable entirely to irq time (that's
what your diagram suggests)?

(Could you make the case for that time with data from /proc/stat alone?)

> Interfaces:
>  - task level: /proc/$pid/sched, row core_forceidle_task_sum.
>  - cgroup level: /sys/fs/cgroup/$cg/cpu.stat, row
>      core_sched.force_idle_task_usec.

Hm, when you touch this, could you please also add a section to
Documentation/admin-guide/cgroup-v2.rst about these entries?

(Alternatively, explain in the commit message why those aren't supposed
to be documented. Or, as yet another alternative, would merely documenting
core_sched.force_idle_usec help to prevent the confusion that you
called out above?)

Also, I wonder whether the rstat counting code shouldn't be hidden behind
CONFIG_SCHED_DEBUG too? (IIUC, that's the same option required to see the
analogous stats in /proc/$pid/sched.)

Regards,
Michal