On Fri, Apr 09, 2021 at 09:50:45AM -0700, Shakeel Butt wrote:
> On Fri, Apr 9, 2021 at 9:35 AM Masayoshi Mizuma <msys.mizuma@xxxxxxxxx> wrote:
> >
> [...]
> > > Can you please explain how to read these numbers? Or at least put a %
> > > regression.
> >
> > Let me summarize them here.
> > The total duration ('total' column above) of each system call is as follows
> > if v5.8 is assumed as 100%:
> >
> > - sendto:
> >   - v5.8         100%
> >   - v5.9         128%
> >   - v5.12-rc6    116%
> >
> > - recvfrom:
> >   - v5.8         100%
> >   - v5.9         114%
> >   - v5.12-rc6    108%
> >
>
> Thanks, that is helpful. Most probably the improvement of 5.12 from
> 5.9 is due to 3de7d4f25a7438f ("mm: memcg/slab: optimize objcg stock
> draining").
>
> [...]
> > >
> > > One idea would be to increase MEMCG_CHARGE_BATCH.
> >
> > Thank you for the idea! It's hard-coded as 32 now, so I'm wondering if it may be
> > a good idea to make MEMCG_CHARGE_BATCH tunable from a kernel parameter or something.
> >
>

Hi! Thank you for your comments!

> Can you rerun the benchmark with MEMCG_CHARGE_BATCH equal 64UL?

Yes, I reran the benchmark with MEMCG_CHARGE_BATCH == 64UL, but it does not
seem to reduce the duration of the system calls...

- v5.12-rc6 vanilla

    syscall    total (msec)
    ---------  ------------
    sendto     3049.221
    recvfrom   2421.601

- v5.12-rc6 with MEMCG_CHARGE_BATCH==64

    syscall    total (msec)
    ---------  ------------
    sendto     3071.607
    recvfrom   2436.488

> I think with memcg stats moving to rstat, the stat accuracy is not an
> issue if we increase MEMCG_CHARGE_BATCH to 64UL. Not sure if we want
> this to be tuneable but most probably we do want this to be sync'ed
> with SWAP_CLUSTER_MAX.

Thanks. I understand that.

Waiman posted some patches to reduce the overhead [1]. I'll try the patch.

[1]: https://lore.kernel.org/linux-mm/51ea6b09-b7ee-36e9-a500-b7141bd3a42b@xxxxxxxxxx/T/#me75806a3555e7a42e793f099b98c42e687962d10

Thanks!
Masa
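
For reference, the relative change between the two v5.12-rc6 runs quoted in this
mail can be worked out with a few lines of Python. This is only a sketch for
reading the numbers: the helper name relative_change_pct is made up for
illustration, and the inputs are the 'total (msec)' values reported above.

```python
def relative_change_pct(baseline_ms, candidate_ms):
    """Percent change of candidate relative to baseline (positive = slower)."""
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0

# 'total (msec)' values copied from the two v5.12-rc6 runs in this mail.
vanilla = {"sendto": 3049.221, "recvfrom": 2421.601}
batch64 = {"sendto": 3071.607, "recvfrom": 2436.488}

changes = {
    name: relative_change_pct(vanilla[name], batch64[name])
    for name in vanilla
}

for name, pct in changes.items():
    print(f"{name:9s} {pct:+.2f}%")
```

Both differences come out below 1%, which matches the observation that
MEMCG_CHARGE_BATCH == 64 does not reduce the syscall duration. Whether a
difference this small is within run-to-run noise is not something these two
samples alone can show.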