On Fri, Feb 28, 2025 at 05:06:23PM -0800, JP Kobryn wrote:
[..]
> >
> > >  		cgroup_idr_replace(&ss->css_idr, NULL, css->id);
> > >  	if (ss->css_released)
> >
[..]
> > > @@ -6188,6 +6186,9 @@ int __init cgroup_init(void)
> > >  			css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2,
> > >  						   GFP_KERNEL);
> > >  			BUG_ON(css->id < 0);
> > > +
> > > +			if (css->ss && css->ss->css_rstat_flush)
> > > +				BUG_ON(cgroup_rstat_init(css));
> >
> > Why do we need this call here? We already call cgroup_rstat_init() in
> > cgroup_init_subsys(). IIUC for subsystems with ss->early_init, we will
> > have already called cgroup_init_subsys() in cgroup_init_early().
> >
> > Did I miss something?
>
> Hmmm it's a good question. cgroup_rstat_init() is deferred in the same
> way that cgroup_idr_alloc() is. So for ss with early_init == true,
> cgroup_rstat_init() is not called during cgroup_init_early().

Oh I didn't realize that the call here is only when early_init == false.
I think we need a comment to clarify that cgroup_idr_alloc() and
cgroup_rstat_init() are not called in cgroup_init_subsys() when
early_init == true, and hence need to be called in cgroup_init(). Or
maybe organize the code in a way that makes this more obvious (put them
in a helper with a descriptive name? idk).

>
> Is it safe to call alloc_percpu() during early boot? If so, the
> deferral can be removed and cgroup_rstat_init() can be called in one
> place.

I don't think so. cgroup_init_early() is called before
setup_per_cpu_areas().

> >
> > >  		} else {
> > >  			cgroup_init_subsys(ss, false);
> > >  		}
> >
[..]
> > > @@ -300,27 +306,25 @@ static inline void __cgroup_rstat_unlock(struct cgroup *cgrp, int cpu_in_loop)
> > >  }
> > >
> > >  /* see cgroup_rstat_flush() */
> > > -static void cgroup_rstat_flush_locked(struct cgroup *cgrp)
> > > +static void cgroup_rstat_flush_locked(struct cgroup_subsys_state *css)
> > >  	__releases(&cgroup_rstat_lock) __acquires(&cgroup_rstat_lock)
> > >  {
> > > +	struct cgroup *cgrp = css->cgroup;
> > >  	int cpu;
> > >
> > >  	lockdep_assert_held(&cgroup_rstat_lock);
> > >
> > >  	for_each_possible_cpu(cpu) {
> > > -		struct cgroup *pos = cgroup_rstat_updated_list(cgrp, cpu);
> > > +		struct cgroup_subsys_state *pos;
> > >
> > > +		pos = cgroup_rstat_updated_list(css, cpu);
> > >  		for (; pos; pos = pos->rstat_flush_next) {
> > > -			struct cgroup_subsys_state *css;
> > > +			if (!pos->ss)
> > > +				cgroup_base_stat_flush(pos->cgroup, cpu);
> > > +			else
> > > +				pos->ss->css_rstat_flush(pos, cpu);
> > >
> > > -			cgroup_base_stat_flush(pos, cpu);
> > > -			bpf_rstat_flush(pos, cgroup_parent(pos), cpu);
> > > -
> > > -			rcu_read_lock();
> > > -			list_for_each_entry_rcu(css, &pos->rstat_css_list,
> > > -						rstat_css_node)
> > > -				css->ss->css_rstat_flush(css, cpu);
> > > -			rcu_read_unlock();
> > > +			bpf_rstat_flush(pos->cgroup, cgroup_parent(pos->cgroup), cpu);
> >
> > We should call bpf_rstat_flush() only if (!pos->ss) as well, right?
> > Otherwise we will call BPF rstat flush whenever any subsystem is
> > flushed.
> >
> > I guess it's because BPF can now pass any subsystem to
> > cgroup_rstat_flush(), and we don't keep track. I think it would be
> > better if we do not allow BPF programs to select a css and always make
> > them flush the self css.
> >
> > We can perhaps introduce a bpf_cgroup_rstat_flush() wrapper that takes
> > in a cgroup and passes cgroup->self internally to cgroup_rstat_flush().
>
> I'm fine with this if others are in agreement. A similar concept was
> done in v1.
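If we go with the wrapper, here is roughly what I have in mind, just so
we are all looking at the same thing (completely untested sketch, kfunc
registration elided):

	/*
	 * Untested: let BPF flush a cgroup, not an arbitrary css, by
	 * always passing the self css internally.
	 */
	__bpf_kfunc void bpf_cgroup_rstat_flush(struct cgroup *cgrp)
	{
		cgroup_rstat_flush(&cgrp->self);
	}

With that in place, the bpf_rstat_flush() call in
cgroup_rstat_flush_locked() can be guarded by !pos->ss, so flushing a
subsystem css no longer triggers BPF flushes.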
Let's wait for Shakeel to chime in here since he suggested removing this
hook, but I am not sure if he intended to actually do it or not. Better
not to waste effort if this will be gone soon anyway.

>
> >
> > But if the plan is to remove the bpf_rstat_flush() call here soon then
> > it's probably not worth the hassle.
> >
> > Shakeel (and others), WDYT?
>
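FWIW, the "helper with a descriptive name" I suggested above could be
something as simple as this (untested, and the name is just a strawman):

	/*
	 * Untested: finish the setup that cgroup_init_subsys() defers for
	 * early_init subsystems, since idr and percpu allocations are not
	 * available during cgroup_init_early().
	 */
	static void __init css_complete_early_init(struct cgroup_subsys_state *css)
	{
		struct cgroup_subsys *ss = css->ss;

		css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL);
		BUG_ON(css->id < 0);

		if (ss->css_rstat_flush)
			BUG_ON(cgroup_rstat_init(css));
	}

called from cgroup_init() in the early_init == true branch.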