Hello, Peter.

On Tue, Aug 29, 2017 at 04:32:52PM +0200, Peter Zijlstra wrote:
> So I mostly like. On accounting it only adds to the immediate cgroup (if
> it has a parent, aka !root).
>
> On update it does a DFS of all sub-groups and propagates the deltas up
> to the requested group.
...
> What I don't get is why you need cgroup_cpu_stat_updated(). That is, I
> see you use it to keep the keep the DFS 'stack' up-to-date, but what I
> don't see is why you'd need that.

That is to make reading stats O(number of descendants which have been
active since last read) instead of O(number of all descendants), as there
can be a lot of not-too-active cgroups in a system. Stat reading can be
frequent, so the combination can get really bad. By keeping the updated
list separate, increasing read frequency decreases the cost of each read,
since each read only has to visit the cgroups updated since the previous
one.

Also, please note that a system may end up with a lot of cgroups without
the user intending to. memcg drains removed cgroups lazily and the number
of draining cgroups can reach very high numbers if the system isn't under
memory pressure.

The plan is to add basic stats for other resources too, and keeping it
scalable w.r.t. idle cgroups allows using the same mechanism for all
resources.

> Have a look at walk_tg_tree_from(), I think we can do something like
> that on struct cgroup_subsys_state, it has that children list and the
> parent pointer.
>
> And yes, walk_tg_tree_from() is tricky, it always takes a fair while to
> remember how it works.

We can propagate an "updated" flag up the tree (we need to, otherwise we
can't tell which subtree to descend into) and prune the iteration on
subtrees which haven't been updated; however, this can still become very
costly depending on the topology, as it can't jump over the siblings which
haven't been updated.

Thanks.

-- 
tejun
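The trade-off between a separate updated list and a flag-pruned walk can be illustrated with a toy model in C. This is a minimal sketch under stated assumptions, not the actual cgroup code: `struct node`, `new_node()`, `account()`, `flush()` and `pruned_visits()` are hypothetical names, reads are flushed from the root only, and `pruned_visits()` merely counts the nodes a walk over the full children list (pruned on an "updated" flag) would have to inspect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy model of a cgroup-like hierarchy (hypothetical; not struct cgroup). */
struct node {
	struct node *parent;
	struct node *first_child, *next_sibling;	/* full children list */
	long pending;			/* locally accounted, not yet propagated */
	long total;			/* subtree total as of the last flush */
	bool updated;			/* linked on parent's updated list */
	struct node *updated_head;	/* children with pending updates */
	struct node *updated_next;	/* link within parent's updated list */
};

static struct node *new_node(struct node *parent)
{
	struct node *n = calloc(1, sizeof(*n));

	assert(n != NULL);
	n->parent = parent;
	if (parent) {
		n->next_sibling = parent->first_child;
		parent->first_child = n;
	}
	return n;
}

/* Hot path: charge only the local counter, then link the node and its
 * ancestors onto their parents' updated lists. The walk stops at the first
 * node that is already linked, so repeated updates cost O(1). */
static void account(struct node *n, long delta)
{
	n->pending += delta;
	for (; n->parent && !n->updated; n = n->parent) {
		n->updated = true;
		n->updated_next = n->parent->updated_head;
		n->parent->updated_head = n;
	}
}

/* Read path (call on the root only in this sketch): visit only nodes on
 * updated lists, fold pending deltas into subtree totals bottom-up and
 * unlink them. Returns the delta folded into n; *visits counts nodes. */
static long flush(struct node *n, int *visits)
{
	long delta = n->pending;
	struct node *c = n->updated_head;

	n->pending = 0;
	n->updated_head = NULL;
	while (c) {
		struct node *next = c->updated_next;

		c->updated = false;
		c->updated_next = NULL;
		delta += flush(c, visits);
		c = next;
	}
	n->total += delta;
	(*visits)++;
	return delta;
}

/* Alternative: DFS over the full children list, descending only into
 * flagged subtrees. Every child of a visited node is still inspected,
 * including idle siblings with nothing to report. */
static int pruned_visits(const struct node *n)
{
	int visits = 1;

	for (const struct node *c = n->first_child; c; c = c->next_sibling)
		visits += c->updated ? pruned_visits(c) : 1;
	return visits;
}
```

With 1000 children under the root, one grandchild charged 3 and one child charged 5 and then 1, the updated-list flush touches 4 nodes (the root, the two updated children and the grandchild), while the flag-pruned walk inspects 1002, because it still has to look at every idle sibling.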