On Tue, Apr 4, 2023 at 9:53 AM Michal Koutný <mkoutny@xxxxxxxx> wrote:
>
> Hello.
>
> On Thu, Mar 30, 2023 at 07:17:57PM +0000, Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> >  static void __mem_cgroup_flush_stats(void)
> >  {
> > -	unsigned long flag;
> > -
> > -	if (!spin_trylock_irqsave(&stats_flush_lock, flag))
> > +	/*
> > +	 * We always flush the entire tree, so concurrent flushers can just
> > +	 * skip. This avoids a thundering herd problem on the rstat global lock
> > +	 * from memcg flushers (e.g. reclaim, refault, etc).
> > +	 */
> > +	if (atomic_read(&stats_flush_ongoing) ||
> > +	    atomic_xchg(&stats_flush_ongoing, 1))
> >  		return;
>
> I'm curious about why this instead of
>
> 	if (atomic_xchg(&stats_flush_ongoing, 1))
> 		return;
>
> Is that some microarchitectural cleverness?
>

Yes, indeed it is. Basically we want to avoid unconditionally dirtying
the cache line: atomic_xchg() always performs a write, so with a bare
xchg every concurrent flusher would pull the line into exclusive state
and bounce it between CPUs even when the flag is already set. The
atomic_read() fast path lets losers back off with only a shared read.
This pattern is also used in other places in the kernel, like
qspinlock.
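
For illustration, here is a minimal standalone sketch of the same
test-then-test-and-set idea using C11 atomics. The names
(flush_ongoing, try_begin_flush, end_flush) are made up for the
example and are not from the patch:

	#include <stdatomic.h>
	#include <stdbool.h>

	static atomic_int flush_ongoing;	/* stand-in for stats_flush_ongoing */

	/*
	 * The plain load keeps the cache line in shared state, so CPUs
	 * that see the flag already set never write to it. Only a CPU
	 * that observes 0 attempts the exchange, which is the operation
	 * that takes the line exclusive and dirties it.
	 */
	static bool try_begin_flush(void)
	{
		if (atomic_load(&flush_ongoing) ||
		    atomic_exchange(&flush_ongoing, 1))
			return false;	/* someone else is flushing; skip */
		return true;		/* we won; caller clears the flag when done */
	}

	static void end_flush(void)
	{
		atomic_store(&flush_ongoing, 0);
	}

In the uncontended case the extra load reads a line we are about to
write anyway, so it costs essentially nothing; under contention it
replaces a storm of exclusive-state transfers with shared reads.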