On Mon, Jun 24, 2024 at 1:18 PM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
>
> On Mon, Jun 24, 2024 at 12:37:30PM GMT, Yosry Ahmed wrote:
> > On Mon, Jun 24, 2024 at 12:29 PM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
> > >
> > > On Mon, Jun 24, 2024 at 10:40:48AM GMT, Yosry Ahmed wrote:
> > > > On Mon, Jun 24, 2024 at 10:32 AM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
> > > > >
> > > > > On Mon, Jun 24, 2024 at 05:46:05AM GMT, Yosry Ahmed wrote:
> > > > > > On Mon, Jun 24, 2024 at 4:55 AM Jesper Dangaard Brouer <hawk@xxxxxxxxxx> wrote:
> > > > > >
> > > > > [...]
> > > > > > I am assuming this supersedes your other patch titled "[PATCH RFC]
> > > > > > cgroup/rstat: avoid thundering herd problem on root cgrp", so I will
> > > > > > only respond here.
> > > > > >
> > > > > > I have two comments:
> > > > > > - There is no reason why this should be limited to the root cgroup. We
> > > > > > can keep track of the cgroup being flushed, and use
> > > > > > cgroup_is_descendant() to find out if the cgroup we want to flush is a
> > > > > > descendant of it. We can use a pointer and cmpxchg primitives instead
> > > > > > of the atomic here IIUC.
> > > > > >
> > > > > > - More importantly, I am not a fan of skipping the flush if there is
> > > > > > an ongoing one. For all we know, the ongoing flush could have just
> > > > > > started and the stats have not been flushed yet. This is another
> > > > > > example of non deterministic behavior that could be difficult to
> > > > > > debug.
> > > > >
> > > > > Even with the flush, there will almost always per-cpu updates which will
> > > > > be missed. This can not be fixed unless we block the stats updaters as
> > > > > well (which is not going to happen). So, we are already ok with this
> > > > > level of non-determinism. Why skipping flushing would be worse? One may
> > > > > argue 'time window is smaller' but this still does not cap the amount of
> > > > > updates. So, unless there is concrete data that this skipping flushing
> > > > > is detrimental to the users of stats, I don't see an issue in the
> > > > > presense of periodic flusher.
> > > >
> > > > As you mentioned, the updates that happen during the flush are
> > > > unavoidable anyway, and the window is small. On the other hand, we
> > > > should be able to maintain the current behavior that at least all the
> > > > stat updates that happened *before* the call to cgroup_rstat_flush()
> > > > are flushed after the call.
> > > >
> > > > The main concern here is that the stats read *after* an event occurs
> > > > should reflect the system state at that time. For example, a proactive
> > > > reclaimer reading the stats after writing to memory.reclaim should
> > > > observe the system state after the reclaim operation happened.
> > >
> > > What about the in-kernel users like kswapd? I don't see any before or
> > > after events for the in-kernel users.
> >
> > The example I can think of off the top of my head is the cache trim
> > mode scenario I mentioned when discussing your patch (i.e. not
> > realizing that file memory had already been reclaimed).
>
> Kswapd has some kind of cache trim failure mode where it decides to skip
> cache trim heuristic. Also for global reclaim there are couple more
> condition in play as well.

I was mostly concerned about entering cache trim mode when we
shouldn't, not vice versa, as I explained in the other thread. Anyway,
I think the problem of missing stat updates of events is more
pronounced with userspace reads.

>
> > There is also
> > a heuristic in zswap that may writeback more (or less) pages that it
> > should to the swap device if the stats are significantly stale.
> >
>
> Is this the ratio of MEMCG_ZSWAP_B and MEMCG_ZSWAPPED in
> zswap_shrinker_count()? There is already a target memcg flush in that
> function and I don't expect root memcg flush from there.

I was thinking of the generic approach I suggested, where we can avoid contending on the lock if the cgroup is a descendant of the cgroup being flushed, regardless of whether or not it's the root memcg. I think this would be more beneficial than just focusing on root flushes.
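
For illustration only, the generic approach could be sketched in userspace C
roughly as below. This is not kernel code and not a proposed patch: struct cg,
cg_is_descendant(), begin_flush(), end_flush() and the flush_action values are
all hypothetical stand-ins, and C11 atomics replace the kernel's cmpxchg
primitives. The only property borrowed from the kernel is that
cgroup_is_descendant() also returns true when both cgroups are the same.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup: only the parent link matters here. */
struct cg {
	struct cg *parent;	/* NULL for the root */
};

/* Root of the subtree currently being flushed, or NULL if none. */
static _Atomic(struct cg *) ongoing_flusher = NULL;

/* True if cg is anc itself or sits somewhere below it. */
static bool cg_is_descendant(const struct cg *cg, const struct cg *anc)
{
	for (; cg; cg = cg->parent)
		if (cg == anc)
			return true;
	return false;
}

enum flush_action {
	FLUSH_OWNER,		/* we claimed the slot; flush, then end_flush() */
	FLUSH_COVERED,		/* ongoing flush of an ancestor covers our subtree */
	FLUSH_UNRELATED,	/* a different subtree is being flushed */
};

static enum flush_action begin_flush(struct cg *cg)
{
	struct cg *cur = NULL;

	/* Publish ourselves as the flusher only if nobody else is flushing. */
	if (atomic_compare_exchange_strong(&ongoing_flusher, &cur, cg))
		return FLUSH_OWNER;

	/* The exchange failed, so cur now holds the cgroup being flushed. */
	return cg_is_descendant(cg, cur) ? FLUSH_COVERED : FLUSH_UNRELATED;
}

static void end_flush(struct cg *cg)
{
	struct cg *expected = cg;
	bool ok;

	/* Only the owner may clear the slot. */
	ok = atomic_compare_exchange_strong(&ongoing_flusher, &expected, NULL);
	assert(ok);
	(void)ok;
}
```

What a caller does on FLUSH_COVERED (wait for the ongoing flush to complete,
or return right away) is exactly the point being debated above, so the sketch
deliberately leaves that choice open. On FLUSH_UNRELATED the caller would
presumably fall back to the normal locked flush path.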