On Tue, Apr 4, 2023 at 10:13 AM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
>
> On Tue, Apr 4, 2023 at 9:53 AM Michal Koutný <mkoutny@xxxxxxxx> wrote:
> >
> > Hello.
> >
> > On Thu, Mar 30, 2023 at 07:17:57PM +0000, Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> > >  static void __mem_cgroup_flush_stats(void)
> > >  {
> > > -	unsigned long flag;
> > > -
> > > -	if (!spin_trylock_irqsave(&stats_flush_lock, flag))
> > > +	/*
> > > +	 * We always flush the entire tree, so concurrent flushers can just
> > > +	 * skip. This avoids a thundering herd problem on the rstat global lock
> > > +	 * from memcg flushers (e.g. reclaim, refault, etc).
> > > +	 */
> > > +	if (atomic_read(&stats_flush_ongoing) ||
> > > +	    atomic_xchg(&stats_flush_ongoing, 1))
> > >  		return;
> >
> > I'm curious about why this instead of
> >
> > 	if (atomic_xchg(&stats_flush_ongoing, 1))
> > 		return;
> >
> > Is that some microarchitectural cleverness?
> >
>
> Yes indeed it is. Basically we want to avoid unconditional cache
> dirtying. This pattern is also used at other places in the kernel like
> qspinlock.

Oh also take a look at
https://lore.kernel.org/all/20230404052228.15788-1-feng.tang@xxxxxxxxx/