On Thu, Jan 6, 2022 at 4:10 AM cruzzhao <cruzzhao@xxxxxxxxxxxxxxxxx> wrote:
>
> > That motivation makes more sense to me. Have you considered
> > accumulating this at the cgroup level (i.e. attributing it as another
> > type of usage)?
>
> I've already read the patch "sched: CGroup tagging interface for core
> scheduling", but it hasn't been merged into linux-next. IMO it's better
> to do this at the cgroup level after the cgroup tagging interface is
> introduced.
>
> Best,
> Cruz Zhao

There are no plans to introduce cgroup-level tagging for core sched. But
the accounting is a separate issue. Just as tasks account their usage both
to themselves and to their cgroup hierarchy, we could account forced idle
time the same way and add another field to cpu_extra_stat_show. That still
gives you the total system forced idle time (by looking at the root cgroup)
and lets you slice the accounting by job group. It also makes the
accounting a single value per cgroup rather than a per-CPU value (I still
don't see the value of attributing forced idle to specific CPUs, as I
described in my prior reply).