On Tue, Jun 25, 2024 at 01:45:00PM GMT, Yosry Ahmed wrote:
> On Tue, Jun 25, 2024 at 9:21 AM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
> >
> > On Tue, Jun 25, 2024 at 09:00:03AM GMT, Yosry Ahmed wrote:
> > [...]
> > >
> > > My point is not about accuracy, although I think it's a reasonable
> > > argument on its own (a lot of things could change in a short amount of
> > > time, which is why I prefer magnitude-based ratelimiting).
> > >
> > > My point is about logical ordering. If a userspace program reads the
> > > stats *after* an event occurs, it expects to get a snapshot of the
> > > system state after that event. Two examples are:
> > >
> > > - A proactive reclaimer reading the stats after a reclaim attempt to
> > >   check if it needs to reclaim more memory or fallback.
> > > - A userspace OOM killer reading the stats after a usage spike to
> > >   decide which workload to kill.
> > >
> > > I listed such examples with more detail in [1], when I removed
> > > stats_flush_ongoing from the memcg code.
> > >
> > > [1]https://lore.kernel.org/lkml/20231129032154.3710765-6-yosryahmed@xxxxxxxxxx/
> >
> > You are kind of arbitrarily adding restrictions and rules here. Why not
> > follow the rules of a well established and battle tested stats infra
> > used by everyone i.e. vmstats? There is no sync flush and there are
> > frequent async flushes. I think that is what Jesper wants as well.
>
> That's how the memcg stats worked previously since before rstat and
> until the introduction of stats_flush_ongoing AFAICT. We saw an actual
> behavioral change when we were moving from a pre-rstat kernel to a
> kernel with stats_flush_ongoing. This was the rationale when I removed
> stats_flush_ongoing in [1]. It's not a new argument, I am just
> reiterating what we discussed back then.

In my reply above, I am not arguing to go back to the older
stats_flush_ongoing situation. Rather I am discussing what should be the
best eventual solution.
From the vmstats infra, we can learn that with frequent async flushes
and no sync flush, users are fine with the 'non-determinism'. Of course
cgroup stats are different from vmstats, i.e. they are hierarchical, but
I think we can try out this approach and see whether it works.

BTW it seems like this topic should be discussed face-to-face over vc or
at LPC. What do you folks think?

Shakeel