Hi, Yafang,

Yafang Shao <laoar.shao@xxxxxxxxx> writes:

> Currently, we're encountering latency spikes in our container environment
> when a specific container with multiple Python-based tasks exits.

Can you share some data?  On what kind of machine, and how long is the
latency?

> These
> tasks may hold the zone->lock for an extended period, significantly
> impacting latency for other containers attempting to allocate memory.

So, it is the allocation latency that is affected, not the application
exit latency?

Could you measure the run time of free_pcppages_bulk()?  This can be
done via the ftrace function_graph tracer (a possible command sequence
is sketched at the end of this mail).  We want to check whether this is
a common issue.

In commit 52166607ecc9 ("mm: restrict the pcp batch scale factor to
avoid too long latency"), we measured the allocation/free latency for
different CONFIG_PCP_BATCH_SCALE_MAX values.  The target of that commit
is to keep the latency <= 100us.

> As a workaround, we've found that minimizing the pagelist size, such as
> setting it to 4 times the batch size, can help mitigate these spikes.
> However, managing vm.percpu_pagelist_high_fraction across a large fleet of
> servers poses challenges due to variations in CPU counts, NUMA nodes, and
> physical memory capacities.
>
> To enhance practicality, we propose allowing the setting of -1 for
> vm.percpu_pagelist_high_fraction to designate a minimum pagelist size.

If it is really necessary, can we just use a large enough value for
vm.percpu_pagelist_high_fraction?  For example, (1 << 30)?  (An example
sysctl invocation is also appended at the end of this mail.)

> Furthermore, considering the challenges associated with utilizing
> vm.percpu_pagelist_high_fraction, it would be beneficial to introduce a
> more intuitive parameter, vm.percpu_pagelist_high_size, that would permit
> direct specification of the pagelist size as a multiple of the batch size.
> This methodology would mirror the functionality of vm.dirty_ratio and
> vm.dirty_bytes, providing users with greater flexibility and control.
>
> We have discussed the possibility of introducing multiple small zones to
> mitigate the contention on the zone->lock[0], but this approach is likely
> to require a longer-term implementation effort.
>
> Link: https://lore.kernel.org/linux-mm/ZnTrZ9mcAIRodnjx@xxxxxxxxxxxxxxxxxxxx/ [0]
> Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> Cc: David Rientjes <rientjes@xxxxxxxxxx>
> Cc: "Huang, Ying" <ying.huang@xxxxxxxxx>
> Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>

[snip]

--
Best Regards,
Huang, Ying
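
P.S. A minimal measurement sketch along the lines suggested above,
assuming tracefs is mounted at /sys/kernel/tracing and that
free_pcppages_bulk() has not been inlined away on the running kernel
(the grep below checks that):

  # cd /sys/kernel/tracing
  # grep -w free_pcppages_bulk available_filter_functions  # confirm the symbol is traceable
  # echo free_pcppages_bulk > set_graph_function           # graph only this function
  # echo function_graph > current_tracer
  # echo 100 > tracing_thresh                              # optional: only log calls longer than 100us
  # echo 1 > tracing_on
    ... reproduce the container exit / allocation workload ...
  # echo 0 > tracing_on
  # cat trace

In the function_graph output the duration of each call is reported in
microseconds; calls longer than 100us are flagged with '!' and calls
longer than 1ms with '#', which makes the outliers easy to spot.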
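
And the large-fraction workaround mentioned above would be a one-line
sysctl per host, for example:

  # sysctl -w vm.percpu_pagelist_high_fraction=1073741824  # i.e. (1 << 30)

With a fraction that large the computed per-CPU high limit should be
driven down to roughly its floor, which appears to approximate the
"minimum pagelist size" behavior the patch proposes, without adding a
new ABI.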