Hello, Peter.

On Tue, Aug 01, 2017 at 11:40:38PM +0200, Peter Zijlstra wrote:
> > * On cgroup2, there is only one hierarchy.  It'd be great to have
> >   basic resource accounting enabled by default on all cgroups.  Note
> >   that we couldn't do that on v1 because there could be any number of
> >   hierarchies and the cost would increase with the number of
> >   hierarchies.
>
> Yes, the whole single hierarchy thing makes doing away with the double
> accounting possible.

Yeah, we can either do that or make it cheaper so that we can have
basic stats by default.

> > * It is bothersome that we're walking up the tree each time for
> >   cpuacct although being percpu && just walking up the tree makes it
> >   relatively cheap.
>
> So even if its only CPU local accounting, you still have all the pointer
> chasing and misses, not to mention that a faster O(depth) is still
> O(depth).
>
> >   Anyways, I'm thinking about shifting the
> >   aggregation to the reader side so that the hot path always only
> >   updates local counters in a way which can scale even when there are
> >   a lot of (idle) cgroups.  Will follow up on this later.
>
> Not entirely sure I follow, we currently only update the current cgroup
> and its immediate parents, no? Or are you looking to only account into
> the current cgroup and propagate into the parents on reading?

Yeah, shifting the cost to the readers and being smart with
propagation so that reading isn't O(nr_descendants) but
O(nr_descendants_which_have_run_since_last_read).  That way, we can
show the basic stats without taxing the hot paths with reasonable
scalability.

I have a couple questions about cpuacct tho.

* The stat file is sampling based and the usage files are calculated
  from actual scheduling events.  Is this because the latter is more
  accurate?

* Why do we have user/sys breakdown in usage numbers?  It tries to
  distinguish user or sys by looking at task_pt_regs().  I can't see
  how this would work (e.g.
  interrupt handlers never schedule) and w/o kernel preemption, the sys
  part is always zero.  What is this number supposed to mean?

Thanks.

--
tejun
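
For illustration only, the reader-side aggregation idea above could be
sketched roughly like this.  It is a single-threaded userspace model,
not kernel code and not an actual patch: every name in it (struct cg,
cg_account(), cg_collect(), cg_read(), nr_visited) is made up for the
example, and a real version would presumably need per-cpu counters and
locking.  The hot path only bumps a local counter and, on the first
update since the last read, links the cgroup into its ancestors'
"updated" lists.  A read then visits only the cgroups on those lists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one cgroup's usage counter. */
struct cg {
	struct cg *parent;
	uint64_t local;			/* own usage not yet folded into total */
	uint64_t total;			/* self + descendants as of the last read */
	uint64_t unprop;		/* folded into total, not yet pushed to parent */
	int dirty;			/* marked as updated since the last collect */
	struct cg *updated_children;	/* children with pending data */
	struct cg *updated_next;	/* link on parent's updated list */
};

static unsigned long nr_visited;	/* cgroups touched by reads, for demonstration */

/* Hot path: update the local counter, link up only on the first update. */
static void cg_account(struct cg *cg, uint64_t delta)
{
	cg->local += delta;

	/* Stop at the first ancestor which is already marked. */
	for (; cg && !cg->dirty; cg = cg->parent) {
		cg->dirty = 1;
		if (cg->parent) {
			cg->updated_next = cg->parent->updated_children;
			cg->parent->updated_children = cg;
		}
	}
}

/* Fold pending usage of @cg and its updated descendants into cg->total. */
static uint64_t cg_collect(struct cg *cg)
{
	uint64_t delta = cg->local;
	struct cg *child = cg->updated_children;

	nr_visited++;
	cg->local = 0;
	cg->updated_children = NULL;

	while (child) {
		struct cg *next = child->updated_next;

		assert(child->dirty);	/* only marked cgroups are ever linked */
		child->updated_next = NULL;
		child->dirty = 0;
		/* Newly collected usage plus what an earlier read left behind. */
		delta += cg_collect(child) + child->unprop;
		child->unprop = 0;
		child = next;
	}

	cg->total += delta;
	return delta;
}

/* Reader side: cost depends on the number of updated descendants only. */
static uint64_t cg_read(struct cg *cg)
{
	uint64_t delta = cg_collect(cg);

	/*
	 * @cg stays on its parent's updated list, if it is on one.
	 * Remember what the parent hasn't seen yet.
	 */
	if (cg->parent)
		cg->unprop += delta;
	return cg->total;
}
```

With a root, two children and one grandchild, reading the root after
only the grandchild has run visits three cgroups (root, child,
grandchild) and never touches the idle sibling.  Reading the root again
with nothing having run in between visits just the root.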