On Mon, Aug 5, 2024 at 11:05 AM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>
> Yafang Shao <laoar.shao@xxxxxxxxx> writes:
>
> > On Mon, Aug 5, 2024 at 9:41 AM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
> >>
> >> Yafang Shao <laoar.shao@xxxxxxxxx> writes:
> >>
> >> [snip]
> >>
> >> >
> >> > Why introduce a sysctl knob?
> >> > ============================
> >> >
> >> > From the above data, it's clear that different CPU types have varying
> >> > allocation latencies concerning zone->lock contention. Typically, people
> >> > don't release individual kernel packages for each type of x86_64 CPU.
> >> >
> >> > Furthermore, for latency-insensitive applications, we can keep the default
> >> > setting for better throughput.
> >>
> >> Do you have any data to prove that the default setting is better for
> >> throughput? If so, that will be a strong support for your patch.
> >
> > No, I don't. The primary reason we can't change the default value from
> > 5 to 0 across our fleet of servers is that you initially set it to 5.
> > The sysadmins believe you had a strong reason for setting it to 5 by
> > default; otherwise, it would be considered careless for the upstream
> > kernel. I also believe you must have had a solid justification for
> > setting the default value to 5; otherwise, why would you have
> > submitted your patches?
>
> In commit 52166607ecc9 ("mm: restrict the pcp batch scale factor to
> avoid too long latency"), I tried my best to run test on the machines
> available with a micro-benchmark (will-it-scale/page_fault1) which
> exercises kernel page allocator heavily. From the data in commit,
> larger CONFIG_PCP_BATCH_SCALE_MAX helps throughput a little, but not
> much. The 99% alloc/free latency can be kept within about 100us with
> CONFIG_PCP_BATCH_SCALE_MAX == 5. So, we chose 5 as default value.
>
> But, we can always improve the default value with more data, on more
> types of machines and with more types of benchmarks, etc.
>
> Your data suggest smaller default value because you have data to show
> that larger default value has the latency spike issue (as large as tens
> ms) for some practical workloads. Which weren't tested previously. In
> contrast, we don't have strong data to show the throughput advantages of
> larger CONFIG_PCP_BATCH_SCALE_MAX value.
>
> So, I suggest to use a smaller default value for
> CONFIG_PCP_BATCH_SCALE_MAX. But, we may need more test to check the
> data for 1, 2, 3, and 4, in addition to 0 and 5 to determine the best
> choice.

Which smaller default value would be better? How can we ensure that
other workloads, which we haven't tested, will work well with this new
default value? If you have a better default value in mind, would you
consider sending a patch for it? I would be happy to test it with my
test case.

--
Regards
Yafang