On Wed, Mar 22, 2023 at 9:29 PM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote: > > On Wed, Mar 22, 2023 at 9:00 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote: > > > > Currently, when sleeping is not allowed during rstat flushing, we hold > > the global rstat lock with interrupts disabled throughout the entire > > flush operation. Flushing in an O(# cgroups * # cpus) operation, and > > having interrupts disabled throughout is dangerous. > > > > For some contexts, we may not want to sleep, but can be interrupted > > (e.g. while holding a spinlock or RCU read lock). As such, do not > > disable interrupts throughout rstat flushing, only when holding the > > percpu lock. This breaks down the O(# cgroups * # cpus) duration with > > interrupts disabled to a series of O(# cgroups) durations. > > > > Furthermore, if a cpu spinning waiting for the global rstat lock, it > > doesn't need to spin with interrupts disabled anymore. > > > > If the caller of rstat flushing needs interrupts to be disabled, it's up > > to them to decide that, and it should be fine to hold the global rstat > > lock with interrupts disabled. There is currently a single context that > > may invoke rstat flushing with interrupts disabled, the > > mem_cgroup_flush_stats() call in mem_cgroup_usage(), if called from > > mem_cgroup_threshold(). > > > > To make it safe to hold the global rstat lock with interrupts enabled, > > make sure we only flush from in_task() contexts. The side effect of that > > we read stale stats in interrupt context, but this should be okay, as > > flushing in interrupt context is dangerous anyway as it is an expensive > > operation, so reading stale stats is safer. > > > > Signed-off-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx> > > Couple of questions: > > 1. What exactly is cgroup_rstat_lock protecting? Can we just remove it > altogether? I believe it protects the global state variables that we flush into. For example, for memcg, it protects mem_cgroup->vmstats. I tried removing the lock and allowing concurrent flushing on different cpus, by changing mem_cgroup->vmstats to use atomics instead, but that turned out to be a little expensive. Also, cgroup_rstat_lock is already contended by different flushers (mitigated by stats_flush_lock on the memcg side). If we remove it, concurrent flushers contend on every single percpu lock instead, which also seems to be expensive. > 2. Are we really calling rstat flush in irq context? I think it is possible through the charge/uncharge path: memcg_check_events()->mem_cgroup_threshold()->mem_cgroup_usage(). I added the protection against flushing in an interrupt context for future callers as well, as it may cause a deadlock if we don't disable interrupts when acquiring cgroup_rstat_lock. > 3. The mem_cgroup_flush_stats() call in mem_cgroup_usage() is only > done for root memcg. Why is mem_cgroup_threshold() interested in root > memcg usage? Why not ignore root memcg in mem_cgroup_threshold() ? I am not sure, but the code looks like event notifications may be set up on root memcg, which is why we need to check thresholds. Even if mem_cgroup_threshold() does not flush memcg stats, the purpose of this patch is to make sure the rstat flushing code itself is not disabling interrupts; which it currently does for any unsleepable context, even if it is interruptible.