<snip>
> > > [...]
> > > > @@ -639,17 +639,24 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
> > > >  	}
> > > >  }
> > > >  
> > > > -static void do_flush_stats(void)
> > > > +static void do_flush_stats(bool full)
> > > >  {
> > > > +	if (!atomic_read(&stats_flush_ongoing) &&
> > > > +	    !atomic_xchg(&stats_flush_ongoing, 1))
> > > > +		goto flush;
> > > > +
> > > >  	/*
> > > > -	 * We always flush the entire tree, so concurrent flushers can just
> > > > -	 * skip. This avoids a thundering herd problem on the rstat global lock
> > > > -	 * from memcg flushers (e.g. reclaim, refault, etc).
> > > > +	 * We always flush the entire tree, so concurrent flushers can choose to
> > > > +	 * skip if accuracy is not critical. Otherwise, wait for the ongoing
> > > > +	 * flush to complete. This avoids a thundering herd problem on the rstat
> > > > +	 * global lock from memcg flushers (e.g. reclaim, refault, etc).
> > > >  	 */
> > > > -	if (atomic_read(&stats_flush_ongoing) ||
> > > > -	    atomic_xchg(&stats_flush_ongoing, 1))
> > > > -		return;
> > > > -
> > > > +	while (full && atomic_read(&stats_flush_ongoing) == 1) {
> > > > +		if (!cond_resched())
> > > > +			cpu_relax();
> > >
> > > You are reinventing a mutex with a spinning waiter. Why don't you
> > > simply make stats_flush_ongoing a real mutex, and use try_lock for
> > > the !full flush and a normal lock otherwise?
> >
> > So that was actually a spinlock at one point, when we used to skip if
> > try_lock failed.
>
> AFAICS cgroup_rstat_flush is allowed to sleep so spinlocks are not
> really possible.

Sorry, I hit the send button too early and didn't get to this part. We
were able to use a spinlock because we used to disable sleeping when
flushing the stats back then, which opened another can of worms :)

> > We opted for an atomic because the lock was only used
> > in a try_lock fashion. The problem here is that the atomic is used to
> > ensure that only one thread actually attempts to flush at a time (and
> > others skip/wait), to avoid a thundering herd problem on
> > cgroup_rstat_lock.
> >
> > Here, what I am trying to do is essentially equivalent to "wait until
> > the lock is available but don't grab it". If we make
> > stats_flush_ongoing a mutex, I am afraid the thundering herd problem
> > will be reintroduced, for stats_flush_ongoing this time.
>
> You will have potentially many spinners for something that might take
> quite a lot of time (sleep) if there is nothing else to schedule. I do
> not think this is proper behavior. Really, you shouldn't be busy
> waiting for a sleeper.
>
> > I am not sure if there's a cleaner way of doing this, but I am
> > certainly open to suggestions. I also don't like how the spinning
> > loop looks as of now.
>
> mutex_trylock for non-critical flushers and mutex_lock for syncing
> ones. We can talk about a custom locking scheme if that proves
> insufficient or problematic.

I have no problem with this. I can send a v2 following this scheme, once
we agree on the importance of this patch :)

> --
> Michal Hocko
> SUSE Labs