On Wed, 20 Sep 2023 14:18:46 +0800 Huang Ying <ying.huang@xxxxxxxxx> wrote:

> The page allocation performance requirements of different workloads
> are often different.  So, we need to tune the PCP (Per-CPU Pageset)
> high on each CPU automatically to optimize the page allocation
> performance.

Some of the performance changes here are downright scary.

I've never been very sure that percpu pages was very beneficial (and
hey, I invented the thing back in the Mesozoic era).  But these numbers
make me think it's very important and we should have been paying more
attention.

> The list of patches in series is as follows,
>
>  1 mm, pcp: avoid to drain PCP when process exit
>  2 cacheinfo: calculate per-CPU data cache size
>  3 mm, pcp: reduce lock contention for draining high-order pages
>  4 mm: restrict the pcp batch scale factor to avoid too long latency
>  5 mm, page_alloc: scale the number of pages that are batch allocated
>  6 mm: add framework for PCP high auto-tuning
>  7 mm: tune PCP high automatically
>  8 mm, pcp: decrease PCP high if free pages < high watermark
>  9 mm, pcp: avoid to reduce PCP high unnecessarily
> 10 mm, pcp: reduce detecting time of consecutive high order page freeing
>
> Patch 1/2/3 optimize the PCP draining for consecutive high-order pages
> freeing.
>
> Patch 4/5 optimize batch freeing and allocating.
>
> Patch 6/7/8/9 implement and optimize a PCP high auto-tuning method.
>
> Patch 10 optimize the PCP draining for consecutive high order page
> freeing based on PCP high auto-tuning.
>
> The test results for patches with performance impact are as follows,
>
> kbuild
> ======
>
> On a 2-socket Intel server with 224 logical CPU, we tested kbuild on
> one socket with `make -j 112`.
>
>          build time  zone lock%  free_high  alloc_zone
>          ----------  ----------  ---------  ----------
> base          100.0        43.6      100.0       100.0
> patch1         96.6        40.3       49.2        95.2
> patch3         96.4        40.5       11.3        95.1
> patch5         96.1        37.9       13.3        96.8
> patch7         86.4         9.8        6.2        22.0
> patch9         85.9         9.4        4.8        16.3
> patch10        87.7        12.6       29.0        32.3

You're seriously saying that kbuild got 12% faster?

I see that [07/10] (autotuning) alone sped up kbuild by 10%?

Other thoughts:

- What if any facilities are provided to permit users/developers to
  monitor the operation of the autotuning algorithm?

- I'm not seeing any Documentation/ updates.  Surely there are things
  we can tell users?

- This:

  : It's possible that PCP high auto-tuning doesn't work well for some
  : workloads.  So, when PCP high is tuned by hand via the sysctl knob,
  : the auto-tuning will be disabled.  The PCP high set by hand will be
  : used instead.

  Is it a bit hacky to disable autotuning when the user alters
  pcp-high?  Would it be cleaner to have a separate on/off knob for
  autotuning?

  And how is the user to determine that "PCP high auto-tuning doesn't
  work well" for their workload?