So I mostly like. On accounting it only adds to the immediate cgroup (if it has a parent, aka !root). On update it does a DFS of all sub-groups and propagates the deltas up to the requested group. On Fri, Aug 11, 2017 at 09:37:54AM -0700, Tejun Heo wrote: > +/** > + * cgroup_cpu_stat_updated - keep track of updated cpu_stat > + * @cgrp: target cgroup > + * @cpu: cpu on which cpu_stat was updated > + * > + * @cgrp's cpu_stat on @cpu was updated. Put it on the parent's matching > + * cpu_stat->updated_children list. > + */ > +static void cgroup_cpu_stat_updated(struct cgroup *cgrp, int cpu) > +{ > + raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_cpu_stat_lock, cpu); > + struct cgroup *parent; > + unsigned long flags; > + > + /* > + * Speculative already-on-list test. This may race leading to > + * temporary inaccuracies, which is fine. > + * > + * Because @parent's updated_children is terminated with @parent > + * instead of NULL, we can tell whether @cgrp is on the list by > + * testing the next pointer for NULL. > + */ > + if (cgroup_cpu_stat(cgrp, cpu)->updated_next) > + return; > + > + raw_spin_lock_irqsave(cpu_lock, flags); > + > + /* put @cgrp and all ancestors on the corresponding updated lists */ > + for (parent = cgroup_parent(cgrp); parent; > + cgrp = parent, parent = cgroup_parent(cgrp)) { > + struct cgroup_cpu_stat *cstat = cgroup_cpu_stat(cgrp, cpu); > + struct cgroup_cpu_stat *pcstat = cgroup_cpu_stat(parent, cpu); > + > + /* > + * Both additions and removals are bottom-up. If a cgroup > + * is already in the tree, all ancestors are. > + */ > + if (cstat->updated_next) > + break; > + > + cstat->updated_next = pcstat->updated_children; > + pcstat->updated_children = cgrp; > + } > + > + raw_spin_unlock_irqrestore(cpu_lock, flags); > +} > +static struct cgroup_cpu_stat *cgroup_cpu_stat_account_begin(struct cgroup *cgrp) > +{ > + struct cgroup_cpu_stat *cstat; > + > + cstat = get_cpu_ptr(cgrp->cpu_stat); > + u64_stats_update_begin(&cstat->sync); > + return cstat; > +} > + > +static void cgroup_cpu_stat_account_end(struct cgroup *cgrp, > + struct cgroup_cpu_stat *cstat) > +{ > + u64_stats_update_end(&cstat->sync); > + cgroup_cpu_stat_updated(cgrp, smp_processor_id()); > + put_cpu_ptr(cstat); > +} > + > +void __cgroup_account_cputime(struct cgroup *cgrp, u64 delta_exec) > +{ > + struct cgroup_cpu_stat *cstat; > + > + cstat = cgroup_cpu_stat_account_begin(cgrp); > + cstat->cputime.sum_exec_runtime += delta_exec; > + cgroup_cpu_stat_account_end(cgrp, cstat); > +} What I don't get is why you need cgroup_cpu_stat_updated(). That is, I see you use it to keep the keep the DFS 'stack' up-to-date, but what I don't see is why you'd need that. Have a look at walk_tg_tree_from(), I think we can do something like that on struct cgroup_subsys_state, it has that children list and the parent pointer. And yes, walk_tg_tree_from() is tricky, it always takes a fair while to remember how it works. -- To unsubscribe from this list: send the line "unsubscribe cgroups" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html