cr is the combined rate of all updates (it corresponds to stats_updates in
memcg_rstat_updated(); max_cr is the change rate per counter):

  cr = Σ cr_i <= nr_counters * max_cr

By combining these two we get the shortest time between flushes:

  cr * Δt <= nr_counters * max_cr * Δt
  nr_cpus * MEMCG_CHARGE_BATCH <= nr_counters * max_cr * Δt
  Δt >= (nr_cpus * MEMCG_CHARGE_BATCH) / (nr_counters * max_cr)

We are interested in R_amort = flush_work / Δt, which is

  R_amort <= flush_work * nr_counters * max_cr / (nr_cpus * MEMCG_CHARGE_BATCH)

and with flush_work being O(nr_cpus * nr_cgroups(subtree) * nr_counters):

  R_amort: O( nr_cpus * nr_cgroups(subtree) * nr_counters * (nr_counters * max_cr) / (nr_cpus * MEMCG_CHARGE_BATCH) )
  R_amort: O( nr_cgroups(subtree) * nr_counters^2 * max_cr / MEMCG_CHARGE_BATCH )

The square looks interesting given there are already tens of counters.
(As the data from Ivan have shown, we can hardly restore the pre-rstat
performance on the read side even with a mere mod_delayed_work().)

This is what you partially solved with the introduction of NR_MEMCG_EVENTS,
but stats_updates was still the sum of all events, so the flush might still
have been triggered too frequently.

Maybe a better long-term approach would be to split the counters into
accurate and approximate ones and reflect that in the error estimator
stats_updates. Or some other optimization of mem_cgroup_css_rstat_flush().
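
To illustrate what I mean by the splitting, here is a minimal userspace
sketch (not the actual memcg_rstat_updated() logic; the NR_ACCURATE split,
the threshold value and all names below are made up for illustration):
only updates to the "accurate" counters feed the error estimator that
triggers a flush, while the "approximate" ones merely accumulate.

/*
 * Toy model: accurate counters drive stats_updates and hence flushes,
 * approximate counters are folded in only when a flush happens anyway.
 */
#include <stdio.h>

#define NR_COUNTERS	64
#define NR_ACCURATE	8	/* counters whose error we bound tightly */
#define FLUSH_THRESHOLD	64	/* stands in for nr_cpus * MEMCG_CHARGE_BATCH */

static long counters[NR_COUNTERS];	/* pending (unflushed) deltas */
static long stats_updates;		/* error estimate driving flushes */

static void flush(void)
{
	/* model of mem_cgroup_css_rstat_flush(): fold all pending deltas */
	for (int i = 0; i < NR_COUNTERS; i++)
		counters[i] = 0;
	stats_updates = 0;
	puts("flush");
}

static void counter_updated(int idx, long val)
{
	counters[idx] += val;

	/* only accurate counters contribute to the flush trigger */
	if (idx < NR_ACCURATE) {
		stats_updates += (val > 0) ? val : -val;
		if (stats_updates > FLUSH_THRESHOLD)
			flush();
	}
}

int main(void)
{
	/* heavy churn on an approximate counter never forces a flush ... */
	for (int i = 0; i < 1000; i++)
		counter_updated(NR_ACCURATE, 1);
	/* ... while the same churn on an accurate counter does */
	for (int i = 0; i < 1000; i++)
		counter_updated(0, 1);
	return 0;
}

In the estimate above this effectively replaces nr_counters with
nr_accurate in the cr bound, at the price of a larger (but bounded by the
flush period) error on the approximate counters.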