On 5/25/21 10:01 AM, Mel Gorman wrote:
> The pcp high watermark is based on the batch size but there is no
> relationship between them other than it is convenient to use early in
> boot.
>
> This patch takes the first step and bases pcp->high on the zone low
> watermark split across the number of CPUs local to a zone while the batch
> size remains the same to avoid increasing allocation latencies. The intent
> behind the default pcp->high is "set the number of PCP pages such that
> if they are all full that background reclaim is not started prematurely".
>
> Note that in this patch the pcp->high values are adjusted after memory
> hotplug events, min_free_kbytes adjustments and watermark scale factor
> adjustments but not CPU hotplug events which is handled later in the
> series.
>
> On a test KVM instance;
>
> Before grep -E "high:|batch" /proc/zoneinfo | tail -2
>               high:  378
>               batch: 63
>
> After grep -E "high:|batch" /proc/zoneinfo | tail -2
>               high:  649
>               batch: 63
>
> Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>

...

> @@ -6637,6 +6628,34 @@ static int zone_batchsize(struct zone *zone)
>  #endif
>  }
>
> +static int zone_highsize(struct zone *zone, int batch)
> +{
> +#ifdef CONFIG_MMU
> +	int high;
> +	int nr_local_cpus;
> +
> +	/*
> +	 * The high value of the pcp is based on the zone low watermark
> +	 * so that if they are full then background reclaim will not be
> +	 * started prematurely. The value is split across all online CPUs
> +	 * local to the zone. Note that early in boot that CPUs may not be
> +	 * online yet.
> +	 */
> +	nr_local_cpus = max(1U, cpumask_weight(cpumask_of_node(zone_to_nid(zone))));
> +	high = low_wmark_pages(zone) / nr_local_cpus;
> +
> +	/*
> +	 * Ensure high is at least batch*4. The multiple is based on the
> +	 * historical relationship between high and batch.
> +	 */
> +	high = max(high, batch << 2);
> +
> +	return high;
> +#else
> +	return 0;
> +#endif
> +}
> +
>  /*
>   * pcp->high and pcp->batch values are related and generally batch is lower
>   * than high. They are also related to pcp->count such that count is lower
> @@ -6698,11 +6717,10 @@ static void __zone_set_pageset_high_and_batch(struct zone *zone, unsigned long h
>   */
>  static void zone_set_pageset_high_and_batch(struct zone *zone)
>  {
> -	unsigned long new_high, new_batch;
> +	int new_high, new_batch;
>
> -	new_batch = zone_batchsize(zone);
> -	new_high = 6 * new_batch;
> -	new_batch = max(1UL, 1 * new_batch);
> +	new_batch = max(1, zone_batchsize(zone));
> +	new_high = zone_highsize(zone, new_batch);
>
>  	if (zone->pageset_high == new_high &&
>  	    zone->pageset_batch == new_batch)
> @@ -8170,6 +8188,12 @@ static void __setup_per_zone_wmarks(void)
>  		zone->_watermark[WMARK_LOW] = min_wmark_pages(zone) + tmp;
>  		zone->_watermark[WMARK_HIGH] = min_wmark_pages(zone) + tmp * 2;
>
> +		/*
> +		 * The watermark size have changed so update the pcpu batch
> +		 * and high limits or the limits may be inappropriate.
> +		 */
> +		zone_set_pageset_high_and_batch(zone);

Hm, so this puts the call in the path of the various watermark-related
sysctl handlers, but it's not protected by pcp_batch_high_lock. The zone
lock won't help against zone_pcp_update() called from a hotplug handler.

On the other hand, since the hotplug handlers also call
__setup_per_zone_wmarks(), the zone_pcp_update() calls there are now
redundant and could be removed, no?

But later there will be a new sysctl in patch 6/6 that uses
pcp_batch_high_lock, so that one will not be protected against the
watermark-related sysctl handlers that reach here.

To solve all this, it seems like the static lock in
setup_per_zone_wmarks() could become a top-level visible lock, and the pcp
high/batch updates could switch to that one instead of their own
pcp_batch_high_lock. The zone_pcp_update() calls from the hotplug handlers
could then be removed.
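
Roughly the shape I have in mind, as an untested userspace model and not a
kernel patch. The lock name wmark_pcp_lock, struct zone_model and the
*_model() helpers are all made up for illustration, a pthread mutex merely
stands in for whatever lock type would actually be used, and the pcp
calculation is reduced to a placeholder:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical: one lock for both watermark and pcp high/batch updates. */
static pthread_mutex_t wmark_pcp_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for the few struct zone fields that matter here. */
struct zone_model {
	long low_wmark;	/* models low_wmark_pages(zone) */
	long pcp_high;	/* models zone->pageset_high */
};

/* Caller must hold wmark_pcp_lock. Placeholder for the real calculation. */
static void recalc_pcp_locked(struct zone_model *z, int nr_local_cpus)
{
	assert(nr_local_cpus > 0);
	z->pcp_high = z->low_wmark / nr_local_cpus;
}

/*
 * Models the watermark sysctl and hotplug paths: the watermark changes and
 * the pcp limits are recomputed under the same lock, so no separate
 * zone_pcp_update()-style call is needed afterwards.
 */
static void setup_wmarks_model(struct zone_model *z, long new_low,
			       int nr_local_cpus)
{
	pthread_mutex_lock(&wmark_pcp_lock);
	z->low_wmark = new_low;
	recalc_pcp_locked(z, nr_local_cpus);
	pthread_mutex_unlock(&wmark_pcp_lock);
}

/*
 * Models any other path that only touches the pcp limits. It takes the
 * same lock, so it cannot interleave with setup_wmarks_model().
 */
static void pcp_update_model(struct zone_model *z, int nr_local_cpus)
{
	pthread_mutex_lock(&wmark_pcp_lock);
	recalc_pcp_locked(z, nr_local_cpus);
	pthread_mutex_unlock(&wmark_pcp_lock);
}
```

The point is only that every writer of the pcp limits goes through one
lock; which lock type fits the real call sites is a separate question.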
> +
>  		spin_unlock_irqrestore(&zone->lock, flags);
>  	}
>
>
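
For reference, the arithmetic in the quoted zone_highsize() can be modelled
in userspace as below. This is only a sketch of the CONFIG_MMU case:
old_pcp_high() and model_zone_highsize() are hypothetical names, and the
low watermark and local CPU count are passed in as plain numbers instead of
being read from the zone and the node cpumask.

```c
#include <assert.h>

/* Old default from the removed lines: high was a fixed multiple of batch. */
static int old_pcp_high(int batch)
{
	return 6 * batch;
}

/*
 * Hypothetical userspace model of zone_highsize() from the quoted patch.
 * low_wmark models low_wmark_pages(zone); nr_local_cpus models the weight
 * of the node's cpumask.
 */
static int model_zone_highsize(long low_wmark, unsigned int nr_local_cpus,
			       int batch)
{
	long high;

	assert(batch > 0);

	/* Early in boot no CPU may be online yet, so treat 0 as 1. */
	if (nr_local_cpus == 0)
		nr_local_cpus = 1;

	/* Split the zone low watermark across the CPUs local to the zone. */
	high = low_wmark / (long)nr_local_cpus;

	/* Never go below batch * 4, the floor used in the patch. */
	if (high < ((long)batch << 2))
		high = (long)batch << 2;

	return (int)high;
}
```

With batch = 63 the old rule gives 6 * 63 = 378, which matches the "Before"
output in the changelog. The new floor is 63 << 2 = 252, and the "After"
value of 649 is above that, so it has to come from the watermark term.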