On Mon, 1 Jul 2024 22:20:46 +0800 Yafang Shao <laoar.shao@xxxxxxxxx> wrote:

> Currently, we're encountering latency spikes in our container environment
> when a specific container with multiple Python-based tasks exits. These
> tasks may hold the zone->lock for an extended period, significantly
> impacting latency for other containers attempting to allocate memory.

Is this locking issue well understood?  Is anyone working on it?  A
reasonably detailed description of the issue and a description of any
ongoing work would be helpful here.

> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -856,6 +856,10 @@ on per-cpu page lists. This entry only changes the value of hot per-cpu
>  page lists. A user can specify a number like 100 to allocate 1/100th of
>  each zone between per-cpu lists.
>  
> +The minimum number of pages that can be stored in per-CPU page lists is
> +four times the batch value. By writing '-1' to this sysctl, you can set
> +this minimum value.

I suggest we also describe why an operator would want to set this, and
the expected effects of that action.

>  The batch value of each per-cpu page list remains the same regardless of
>  the value of the high fraction so allocation latencies are unaffected.
>  
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2e22ce5675ca..e7313f9d704b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5486,6 +5486,10 @@ static int zone_highsize(struct zone *zone, int batch, int cpu_online,
>  	int nr_split_cpus;
>  	unsigned long total_pages;
>  
> +	/* Setting -1 to set the minimum pagelist size, four times the batch size */

Some old-timers still use 80-column xterms ;)

> +	if (high_fraction == -1)
> +		return batch << 2;
> +
>  	if (!high_fraction) {
>  		/*
>  		 * By default, the high value of the pcp is based on the zone
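
For the changelog/docs discussion above, it may help to spell out what the
`batch << 2` minimum works out to in practice. A quick sketch, assuming an
illustrative batch value of 63 (the real value is computed per-zone by
zone_batchsize(), so 63 is only an example):

```shell
# With the patch applied, writing -1 to the sysctl would pin pcp->high to
# its minimum, four times the per-cpu batch (batch << 2):
#   echo -1 > /proc/sys/vm/percpu_pagelist_high_fraction
# The batch value below is an assumed example, not a fixed kernel constant.
batch=63
echo $(( batch << 2 ))   # prints 252
```

So on a zone with batch 63, the minimum high watermark would be 252 pages
per CPU, which is the kind of number operators probably want documented.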