[..]
> > > I personally don't like mem_cgroup_flush_stats_ratelimited() very
> > > much, because it is time-based (unlike memcg_vmstats_needs_flush()),
> > > and a lot of changes can happen in a very short amount of time.
> > > However, it seems like for some workloads it's a necessary evil :/
> > >
>
> Other than obj_cgroup_may_zswap(), there is no other place which really
> need very very accurate stats. IMO we should actually make ratelimited
> version the default one for all the places. Stats will always be out of
> sync for some time window even with non-ratelimited flush and I don't
> see any place where 2 second old stat would be any issue.

We disagreed about this before, and I am not trying to get you to
debate this with me again :) I just prefer that we avoid this if
possible.

We have seen cases where the 2 sec window caused issues. Not because 2
sec is a long time, but because userspace reads the stats after an
event occurs (e.g. proactive reclaim), but gets stats from before the
event.

[..]
> > >
> >
> > With a mutex lock contention will be less obvious, as converting this to
> > a mutex avoids multiple CPUs spinning while waiting for the lock, but
> > it doesn't remove the lock contention.
> >
>
> I don't like global sleepable locks as those are source of priority
> inversion issues on highly utilized multi-tenant systems but I still
> need to see how you are handling that.

For context, this was discussed before as well in [1].

[1] https://lore.kernel.org/lkml/CALvZod441xBoXzhqLWTZ+xnqDOFkHmvrzspr9NAr+nybqXgS-A@xxxxxxxxxxxxxx/