On Mon, 1 Jul 2024 22:20:46 +0800 Yafang Shao <laoar.shao@xxxxxxxxx> wrote:

> Currently, we're encountering latency spikes in our container environment
> when a specific container with multiple Python-based tasks exits. These
> tasks may hold the zone->lock for an extended period, significantly
> impacting latency for other containers attempting to allocate memory.

Is this locking issue well understood?  Is anyone working on it?  A
reasonably detailed description of the issue and a description of any
ongoing work would be helpful here.

> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -856,6 +856,10 @@ on per-cpu page lists. This entry only changes the value of hot per-cpu
>  page lists. A user can specify a number like 100 to allocate 1/100th of
>  each zone between per-cpu lists.
>  
> +The minimum number of pages that can be stored in per-CPU page lists is
> +four times the batch value. By writing '-1' to this sysctl, you can set
> +this minimum value.

I suggest we also describe why an operator would want to set this, and
the expected effects of that action.

>  The batch value of each per-cpu page list remains the same regardless of
>  the value of the high fraction so allocation latencies are unaffected.
>  
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2e22ce5675ca..e7313f9d704b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5486,6 +5486,10 @@ static int zone_highsize(struct zone *zone, int batch, int cpu_online,
>  	int nr_split_cpus;
>  	unsigned long total_pages;
>  
> +	/* Setting -1 to set the minimum pagelist size, four times the batch size */

Some old-timers still use 80-column xterms ;)

> +	if (high_fraction == -1)
> +		return batch << 2;
> +
>  	if (!high_fraction) {
>  		/*
>  		 * By default, the high value of the pcp is based on the zone
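
For the changelog/docs discussion above, it may help to spell out what the
`batch << 2` minimum works out to in practice. A quick sketch, assuming an
illustrative batch value of 63 (the real value is computed per-zone by
zone_batchsize(), so 63 is only an example):

```shell
# With the patch applied, writing -1 to the sysctl would pin pcp->high to
# its minimum, four times the per-cpu batch (batch << 2):
#   echo -1 > /proc/sys/vm/percpu_pagelist_high_fraction
# The batch value below is an assumed example, not a fixed kernel constant.
batch=63
echo $(( batch << 2 ))   # prints 252
```

So on a zone with batch 63, the minimum high watermark would be 252 pages
per CPU, which is the kind of number operators probably want documented.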