On Fri, Feb 28, 2025 at 05:06:23PM -0800, JP Kobryn wrote:
[..]
> >
> > >  		cgroup_idr_replace(&ss->css_idr, NULL, css->id);
> > >  	if (ss->css_released)
> >
[..]
> > > @@ -6188,6 +6186,9 @@ int __init cgroup_init(void)
> > >  			css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2,
> > >  						   GFP_KERNEL);
> > >  			BUG_ON(css->id < 0);
> > > +
> > > +			if (css->ss && css->ss->css_rstat_flush)
> > > +				BUG_ON(cgroup_rstat_init(css));
> >
> > Why do we need this call here? We already call cgroup_rstat_init() in
> > cgroup_init_subsys(). IIUC for subsystems with ss->early_init, we will
> > have already called cgroup_init_subsys() in cgroup_init_early().
> >
> > Did I miss something?
>
> Hmmm it's a good question. cgroup_rstat_init() is deferred in the same
> way that cgroup_idr_alloc() is. So for ss with early_init == true,
> cgroup_rstat_init() is not called during cgroup_init_early().

Oh I didn't realize that the call here is only when early_init == false.
I think we need a comment to clarify that cgroup_idr_alloc() and
cgroup_rstat_init() are not called in cgroup_init_subsys() when
early_init == true, and hence need to be called in cgroup_init(). Or
maybe organize the code in a way that makes this more obvious (put them
in a helper with a descriptive name? idk).

>
> Is it safe to call alloc_percpu() during early boot? If so, the
> deferral can be removed and cgroup_rstat_init() can be called in one
> place.

I don't think so. cgroup_init_early() is called before
setup_per_cpu_areas().

> >
> > >  		} else {
> > >  			cgroup_init_subsys(ss, false);
> > >  		}
> >
[..]
> > > @@ -300,27 +306,25 @@ static inline void __cgroup_rstat_unlock(struct cgroup *cgrp, int cpu_in_loop)
> > >  }
> > >
> > >  /* see cgroup_rstat_flush() */
> > > -static void cgroup_rstat_flush_locked(struct cgroup *cgrp)
> > > +static void cgroup_rstat_flush_locked(struct cgroup_subsys_state *css)
> > >  	__releases(&cgroup_rstat_lock) __acquires(&cgroup_rstat_lock)
> > >  {
> > > +	struct cgroup *cgrp = css->cgroup;
> > >  	int cpu;
> > >
> > >  	lockdep_assert_held(&cgroup_rstat_lock);
> > >
> > >  	for_each_possible_cpu(cpu) {
> > > -		struct cgroup *pos = cgroup_rstat_updated_list(cgrp, cpu);
> > > +		struct cgroup_subsys_state *pos;
> > >
> > > +		pos = cgroup_rstat_updated_list(css, cpu);
> > >  		for (; pos; pos = pos->rstat_flush_next) {
> > > -			struct cgroup_subsys_state *css;
> > > +			if (!pos->ss)
> > > +				cgroup_base_stat_flush(pos->cgroup, cpu);
> > > +			else
> > > +				pos->ss->css_rstat_flush(pos, cpu);
> > >
> > > -			cgroup_base_stat_flush(pos, cpu);
> > > -			bpf_rstat_flush(pos, cgroup_parent(pos), cpu);
> > > -
> > > -			rcu_read_lock();
> > > -			list_for_each_entry_rcu(css, &pos->rstat_css_list,
> > > -						rstat_css_node)
> > > -				css->ss->css_rstat_flush(css, cpu);
> > > -			rcu_read_unlock();
> > > +			bpf_rstat_flush(pos->cgroup, cgroup_parent(pos->cgroup), cpu);
> >
> > We should call bpf_rstat_flush() only if (!pos->ss) as well, right?
> > Otherwise we will call BPF rstat flush whenever any subsystem is
> > flushed.
> >
> > I guess it's because BPF can now pass any subsystem to
> > cgroup_rstat_flush(), and we don't keep track. I think it would be
> > better if we do not allow BPF programs to select a css and always make
> > them flush the self css.
> >
> > We can perhaps introduce a bpf_cgroup_rstat_flush() wrapper that takes
> > in a cgroup and passes cgroup->self internally to cgroup_rstat_flush().
>
> I'm fine with this if others are in agreement. A similar concept was
> done in v1.
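If we go with the wrapper, here is roughly what I have in mind, just so
we are all looking at the same thing (completely untested sketch, kfunc
registration elided):

	/*
	 * Untested: let BPF flush a cgroup, not an arbitrary css, by
	 * always passing the self css internally.
	 */
	__bpf_kfunc void bpf_cgroup_rstat_flush(struct cgroup *cgrp)
	{
		cgroup_rstat_flush(&cgrp->self);
	}

With that in place, the bpf_rstat_flush() call in
cgroup_rstat_flush_locked() can be guarded by !pos->ss, so flushing a
subsystem css no longer triggers BPF flushes.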
Let's wait for Shakeel to chime in here since he suggested removing this
hook, but I am not sure if he intended to actually do it or not. Better
not to waste effort if this will be gone soon anyway.

>
> >
> > But if the plan is to remove the bpf_rstat_flush() call here soon then
> > it's probably not worth the hassle.
> >
> > Shakeel (and others), WDYT?
>
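FWIW, the "helper with a descriptive name" I suggested above could be
something as simple as this (untested, and the name is just a strawman):

	/*
	 * Untested: finish the setup that cgroup_init_subsys() defers for
	 * early_init subsystems, since idr and percpu allocations are not
	 * available during cgroup_init_early().
	 */
	static void __init css_complete_early_init(struct cgroup_subsys_state *css)
	{
		struct cgroup_subsys *ss = css->ss;

		css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL);
		BUG_ON(css->id < 0);

		if (ss->css_rstat_flush)
			BUG_ON(cgroup_rstat_init(css));
	}

called from cgroup_init() in the early_init == true branch.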