On Sat, Oct 21, 2023 at 01:42:58AM +0800, Yosry Ahmed wrote:
> On Fri, Oct 20, 2023 at 10:23 AM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
> >
> > On Fri, Oct 20, 2023 at 9:18 AM kernel test robot <oliver.sang@xxxxxxxxx> wrote:
> > >
> > >
> > >
> > > Hello,
> > >
> > > kernel test robot noticed a -25.8% regression of will-it-scale.per_thread_ops on:
> > >
> > >
> > > commit: 51d74c18a9c61e7ee33bc90b522dd7f6e5b80bb5 ("[PATCH v2 3/5] mm: memcg: make stats flushing threshold per-memcg")
> > > url: https://github.com/intel-lab-lkp/linux/commits/Yosry-Ahmed/mm-memcg-change-flush_next_time-to-flush_last_time/20231010-112257
> > > base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
> > > patch link: https://lore.kernel.org/all/20231010032117.1577496-4-yosryahmed@xxxxxxxxxx/
> > > patch subject: [PATCH v2 3/5] mm: memcg: make stats flushing threshold per-memcg
> > >
> > > testcase: will-it-scale
> > > test machine: 104 threads 2 sockets (Skylake) with 192G memory
> > > parameters:
> > >
> > >         nr_task: 100%
> > >         mode: thread
> > >         test: fallocate1
> > >         cpufreq_governor: performance
> > >
> > >
> > > In addition to that, the commit also has significant impact on the following tests:
> > >
> > > +------------------+---------------------------------------------------------------+
> > > | testcase: change | will-it-scale: will-it-scale.per_thread_ops -30.0% regression |
> > > | test machine     | 104 threads 2 sockets (Skylake) with 192G memory              |
> > > | test parameters  | cpufreq_governor=performance                                  |
> > > |                  | mode=thread                                                   |
> > > |                  | nr_task=50%                                                   |
> > > |                  | test=fallocate1                                               |
> > > +------------------+---------------------------------------------------------------+
> > >
> >
> > Yosry, I don't think 25% to 30% regression can be ignored. Unless
> > there is a quick fix, IMO this series should be skipped for the
> > upcoming kernel open window.
>
> I am currently looking into it.
> It's reasonable to skip the next merge
> window if a quick fix isn't found soon.
>
> I am surprised by the size of the regression given the following:
>       1.12 ±  5%      +1.4        2.50 ±  2%
> perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
>
> IIUC we are only spending 1% more time in __mod_memcg_lruvec_state().

Yes, this is kind of confusing. And we have seen similar cases before,
especially for micro benchmarks like will-it-scale, stress-ng, netperf,
etc., where a change to functions in the hot path was greatly amplified
in the final benchmark score.

In a netperf case,
https://lore.kernel.org/lkml/20220619150456.GB34471@xsang-OptiPlex-9020/
the affected functions have around a 10% change in perf's cpu-cycles,
and trigger a 69% regression. IIRC, micro benchmarks are very sensitive
to those statistics updates, like memcg's and vmstat's.

Thanks,
Feng