On Mon, Mar 03, 2025 at 07:50:20PM +0000, Yosry Ahmed wrote:
> On Mon, Mar 03, 2025 at 10:40:45AM -0800, Shakeel Butt wrote:
> > On Mon, Mar 03, 2025 at 06:29:53PM +0000, Yosry Ahmed wrote:
> > > On Mon, Mar 03, 2025 at 04:22:42PM +0100, Michal Koutný wrote:
> > > > On Thu, Feb 27, 2025 at 01:55:42PM -0800, inwardvessel <inwardvessel@xxxxxxxxx> wrote:
> > > > > From: JP Kobryn <inwardvessel@xxxxxxxxx>
> > > > ...
> > > > > +static inline bool is_base_css(struct cgroup_subsys_state *css)
> > > > > +{
> > > > > +	return css->ss == NULL;
> > > > > +}
> > > >
> > > > A similar predicate is also used in cgroup.c (various cgroup vs subsys
> > > > lifecycle functions, e.g. css_free_rwork_fn()). I think it'd be better
> > > > unified, i.e. either open code the predicate here or use the helper in
> > > > both cases (css_is_cgroup() or similar).
> > > >
> > > > >  void __init cgroup_rstat_boot(void)
> > > > >  {
> > > > > -	int cpu;
> > > > > +	struct cgroup_subsys *ss;
> > > > > +	int cpu, ssid;
> > > > >
> > > > > -	for_each_possible_cpu(cpu)
> > > > > -		raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu));
> > > > > +	for_each_subsys(ss, ssid) {
> > > > > +		spin_lock_init(&cgroup_rstat_subsys_lock[ssid]);
> > > > > +	}
> > > >
> > > > Hm, with this loop I realize it may be worth putting this lock into
> > > > struct cgroup_subsys_state and initializing them in
> > > > cgroup_init_subsys() to keep all per-subsys data in one pack.
> > >
> > > I thought about this, but it would have unnecessary memory overhead, as
> > > we only need one lock per subsystem. So having a lock in every single
> > > css is wasteful.
> > >
> > > Maybe we can put the lock in struct cgroup_subsys? Then we can still
> > > initialize them in cgroup_init_subsys().
> > >
> >
> > Actually, one of the things I was thinking about is whether we can just
> > not have a per-subsystem lock at all. At the moment, it is protecting the
> > rstat_flush_next field (today in cgroup; in JP's series it is in css).
> > What if we make that per-cpu? Then we don't need the per-subsystem lock
> > at all. Let me know if I missed something else that is being protected
> > by this lock.
>
> I think it protects more than that. I remember looking into this before,
> and the thing I remember is the stats themselves. Looking at
> mem_cgroup_stat_aggregate(), we aggregate the stats into the per-cgroup
> counters non-atomically. This is only protected by the rstat lock
> (currently global, per-subsystem with the series) AFAICT.
>
> Not sure if only the memory subsystem has this dependency or if others
> do as well. I remember looking into switching these counters to atomics
> to remove the global lock, but it performed worse IIRC.
>
> I also remember looking into partitioning the lock into a per-cgroup (or
> per-css now?) lock, and only holding the locks of the parent and child
> cgroups as we flush each cgroup. I don't remember if I actually
> implemented this, but it introduces complexity.
>
> Perhaps we can defer the locking to the subsystem, if only the memory
> controller requires it. In that case mem_cgroup_css_rstat_flush() (or
> mem_cgroup_stat_aggregate()) could hold a memcg-specific spinlock only
> while aggregating the stats.
>
> There could be other things protected by the lock, but that's what I
> remember.

Thanks for reminding me. It does not seem straightforward or simple.
Lower priority then.
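
For reference, the unification Michal suggests could look roughly like the
sketch below. css_is_cgroup() is just the name he floated; it is not a helper
that exists today, and this is illustrative only, not part of the series.

/*
 * Hypothetical shared helper for the "is this the cgroup's own (base) css"
 * check, usable both here and in the cgroup.c lifecycle paths such as
 * css_free_rwork_fn().
 */
static inline bool css_is_cgroup(struct cgroup_subsys_state *css)
{
	/* the self css of a cgroup has no subsystem attached */
	return css->ss == NULL;
}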
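
Likewise, a rough sketch of Yosry's alternative: keep one lock per subsystem,
but embed it in struct cgroup_subsys and initialize it from
cgroup_init_subsys(). The field name is made up here and none of this is taken
from the series.

/* include/linux/cgroup-defs.h (sketch) */
struct cgroup_subsys {
	/* ... existing fields ... */
	spinlock_t rstat_ss_lock;	/* serializes rstat flushing for this subsystem */
};

/* kernel/cgroup/cgroup.c (sketch) */
static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
{
	/* ... existing initialization ... */
	spin_lock_init(&ss->rstat_ss_lock);
}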
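
And for the "defer the locking to the subsystem" idea, assuming the rstat lock
really only needs to serialize memcg's non-atomic aggregation, something along
these lines might work. The lock name is invented for the example, the
aggregation internals are elided, and irq/context details are glossed over.

/* mm/memcontrol.c (sketch) */
static DEFINE_SPINLOCK(memcg_stats_aggregation_lock);	/* hypothetical */

static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
{
	struct mem_cgroup *memcg = mem_cgroup_from_css(css);

	/*
	 * Serialize only the non-atomic aggregation into the per-cgroup
	 * counters here, instead of relying on the (global or
	 * per-subsystem) rstat lock to do it.
	 */
	spin_lock(&memcg_stats_aggregation_lock);
	/* ... existing aggregation of the per-cpu deltas for @memcg on @cpu ... */
	spin_unlock(&memcg_stats_aggregation_lock);
}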