On Mon, Jan 22, 2024 at 12:39 AM kernel test robot <oliver.sang@xxxxxxxxx> wrote:
>
>
>
> hi, Yosry Ahmed,
>
> per your suggestion in
> https://lore.kernel.org/all/CAJD7tkameJBrJQxRj+ibKL6-yd-i0wyoyv2cgZdh3ZepA1p7wA@xxxxxxxxxxxxxx/
> "I think it would be useful to know if there are
> regressions/improvements in other microbenchmarks, at least to
> investigate whether they represent real regressions."
>
> we still report below two regressions to you just FYI what we observed in our
> microbenchmark tests.
> (we still captured will-it-scale::fallocate regression but ignore here per
> your commit message)
>
>
> Hello,
>
> kernel test robot noticed a -36.6% regression of vm-scalability.throughput on:
>
>
> commit: 8d59d2214c2362e7a9d185d80b613e632581af7b ("mm: memcg: make stats flushing threshold per-memcg")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> testcase: vm-scalability
> test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
> parameters:
>
>         runtime: 300s
>         size: 1T
>         test: lru-shm
>         cpufreq_governor: performance
>
> test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
> test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
>
> In addition to that, the commit also has significant impact on the following tests:
>
> +------------------+----------------------------------------------------------------+
> | testcase: change | will-it-scale: will-it-scale.per_process_ops -32.3% regression |
> | test machine     | 104 threads 2 sockets (Skylake) with 192G memory               |
> | test parameters  | cpufreq_governor=performance                                   |
> |                  | mode=process                                                   |
> |                  | nr_task=50%                                                    |
> |                  | test=tlb_flush2                                                |
> +------------------+----------------------------------------------------------------+
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> | Closes: https://lore.kernel.org/oe-lkp/202401221624.cb53a8ca-oliver.sang@xxxxxxxxx

Thanks for reporting this. We have had these patches running on O(10K)
machines in our production for a while now, and there haven't been any
complaints (at least not yet). OTOH, we do see significant CPU savings
on reading memcg stats.

That being said, I think we can improve the performance here by
caching pointers to the parent_memcg->vmstats_percpu and
memcg->vmstats in struct memcg_vmstat_percpu. This should
significantly reduce the memory fetches in the loop in
memcg_rstat_updated().

Oliver, would you be able to test if the attached patch helps? It's
based on 8d59d2214c236.

[..]

Attachment:
0001-mm-memcg-optimize-parent-iteration-in-memcg_rstat_up.patch
Description: Binary data