On Wed, Oct 25, 2023 at 10:06 AM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
>
> On Tue, Oct 24, 2023 at 11:23 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> >
> [...]
> >
> > Thanks Oliver for running the numbers. If I understand correctly, the
> > will-it-scale.fallocate1 microbenchmark is the only one showing a
> > significant regression here, is that correct?
> >
> > In my runs, other more representative microbenchmarks like netperf
> > and will-it-scale.page_fault* show minimal regression. I would
> > expect practical workloads to have high concurrency of page faults
> > or networking, but maybe not fallocate/ftruncate.
> >
> > Oliver, in your experience, how often does a regression in such a
> > microbenchmark translate to a real regression that people care
> > about? (Or how often do people dismiss it?)
> >
> > I tried optimizing this further for the fallocate/ftruncate case,
> > but without luck. I even tried moving stats_updates into the cgroup
> > core (struct cgroup_rstat_cpu) to reuse the existing loop in
> > cgroup_rstat_updated() -- but it somehow made things worse.
> >
> > On the other hand, we do have some machines in production running
> > this series together with a previous optimization for
> > non-hierarchical stats [1] on an older kernel, and we see a
> > significant reduction in CPU time spent reading the stats. Domenico
> > ran a similar experiment with only this series and reported similar
> > results [2].
> >
> > Shakeel, Johannes (and other memcg folks), I personally think the
> > benefits here outweigh a regression in this particular benchmark,
> > but I am obviously biased. What do you think?
> >
> > [1] https://lore.kernel.org/lkml/20230726153223.821757-2-yosryahmed@xxxxxxxxxx/
> > [2] https://lore.kernel.org/lkml/CAFYChMv_kv_KXOMRkrmTN-7MrfgBHMcK3YXv0dPYEL7nK77e2A@xxxxxxxxxxxxxx/
>
> I am still not convinced that the benefits outweigh the regression,
> but I will not block this. So let's do the following: skip this open
> window, get the patch series reviewed, and hopefully we can work
> together on fixing that regression and make an informed decision
> about accepting it for this series in the next cycle.

Skipping this open window sounds okay to me.

FWIW, I think with this patch series we can keep (roughly) the old
behavior and hide the changes behind a tunable (a config option or a
sysfs file). I think the only changes needed to approximate the
previous behavior are:

- Use root when updating the pending stats in memcg_rstat_updated(),
  instead of the passed memcg.
- Use root in mem_cgroup_flush_stats(), instead of the passed memcg.
- Use mutex_trylock() instead of mutex_lock() in
  mem_cgroup_flush_stats().

So I think it should be doable to hide most of the changes behind a
tunable (rough sketch below), but let's not do this unless necessary.
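
To make that concrete, here is a minimal, untested sketch of what I
have in mind. The knob name (memcg_unified_flushing), the mutex name
(stats_flush_mutex), and the surrounding plumbing are invented for
illustration and only loosely follow the real memcg_rstat_updated()
and mem_cgroup_flush_stats() code:

/*
 * Hypothetical tunable (could be a Kconfig option or a sysfs file):
 * when true, approximate the old unified-flushing behavior.
 */
static bool memcg_unified_flushing __read_mostly;

/* Name invented for the sketch. */
static DEFINE_MUTEX(stats_flush_mutex);

static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
{
	/* Old behavior: track pending updates against the root only. */
	if (memcg_unified_flushing)
		memcg = root_mem_cgroup;

	cgroup_rstat_updated(memcg->css.cgroup, smp_processor_id());
	/* ... accumulate val into the pending stats_updates as before ... */
}

static void mem_cgroup_flush_stats(struct mem_cgroup *memcg)
{
	if (memcg_unified_flushing) {
		/* Old behavior: always flush the entire tree ... */
		memcg = root_mem_cgroup;
		/* ... and skip if another flusher is already running. */
		if (!mutex_trylock(&stats_flush_mutex))
			return;
	} else {
		mutex_lock(&stats_flush_mutex);
	}

	cgroup_rstat_flush(memcg->css.cgroup);
	mutex_unlock(&stats_flush_mutex);
}

In that shape the per-subtree behavior from this series stays the
default, and the knob only restores the old root-level, trylock-based
semantics for anyone who actually hits the regression. But again, I
would rather not carry the extra code unless it proves necessary.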