On Fri, Feb 25, 2022 at 2:23 AM Daniel Dao <dqminh@xxxxxxxxxxxxxx> wrote:
> I think this looks good so far. I compared a flamegraph before to a flamegraph after (10s @ 99Hz on 96-core CPU evenly loaded to ~75% in both cases). Before: 1.4% spent in workingset_refault. After: 0.5% spent in flush_memcg_stats_dwork. The latter is all in kworkers (as expected), while the former is spread across IO active tasks. This seems like a great first step that should be merged on its own. It would be good to also do something to improve the CPU time spent in delayed work, if possible, as 0.5% of on-CPU time is not a negligible amount.