On Fri, Oct 20, 2023 at 11:30:53AM +0800, Huang, Ying wrote: > Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> writes: > > > On Mon, Oct 16, 2023 at 01:30:01PM +0800, Huang Ying wrote: > >> One target of PCP is to minimize pages in PCP if the system free pages > >> is too few. To reach that target, when page reclaiming is active for > >> the zone (ZONE_RECLAIM_ACTIVE), we will stop increasing PCP high in > >> allocating path, decrease PCP high and free some pages in freeing > >> path. But this may be too late because the background page reclaiming > >> may introduce latency for some workloads. So, in this patch, during > >> page allocation we will detect whether the number of free pages of the > >> zone is below high watermark. If so, we will stop increasing PCP high > >> in allocating path, decrease PCP high and free some pages in freeing > >> path. With this, we can reduce the possibility of the premature > >> background page reclaiming caused by too large PCP. > >> > >> The high watermark checking is done in allocating path to reduce the > >> overhead in hotter freeing path. > >> > >> Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx> > >> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > >> Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> > >> Cc: Vlastimil Babka <vbabka@xxxxxxx> > >> Cc: David Hildenbrand <david@xxxxxxxxxx> > >> Cc: Johannes Weiner <jweiner@xxxxxxxxxx> > >> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> > >> Cc: Michal Hocko <mhocko@xxxxxxxx> > >> Cc: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx> > >> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx> > >> Cc: Christoph Lameter <cl@xxxxxxxxx> > >> --- > >> include/linux/mmzone.h | 1 + > >> mm/page_alloc.c | 33 +++++++++++++++++++++++++++++++-- > >> 2 files changed, 32 insertions(+), 2 deletions(-) > >> > >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > >> index ec3f7daedcc7..c88770381aaf 100644 > >> --- a/include/linux/mmzone.h > >> +++ b/include/linux/mmzone.h > >> @@ -1018,6 +1018,7 @@ enum zone_flags { > >> * Cleared when kswapd is woken. > >> */ > >> ZONE_RECLAIM_ACTIVE, /* kswapd may be scanning the zone. */ > >> + ZONE_BELOW_HIGH, /* zone is below high watermark. */ > >> }; > >> > >> static inline unsigned long zone_managed_pages(struct zone *zone) > >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >> index 8382ad2cdfd4..253fc7d0498e 100644 > >> --- a/mm/page_alloc.c > >> +++ b/mm/page_alloc.c > >> @@ -2407,7 +2407,13 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone, > >> return min(batch << 2, pcp->high); > >> } > >> > >> - if (pcp->count >= high && high_min != high_max) { > >> + if (high_min == high_max) > >> + return high; > >> + > >> + if (test_bit(ZONE_BELOW_HIGH, &zone->flags)) { > >> + pcp->high = max(high - (batch << pcp->free_factor), high_min); > >> + high = max(pcp->count, high_min); > >> + } else if (pcp->count >= high) { > >> int need_high = (batch << pcp->free_factor) + batch; > >> > >> /* pcp->high should be large enough to hold batch freed pages */ > >> @@ -2457,6 +2463,10 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp, > >> if (pcp->count >= high) { > >> free_pcppages_bulk(zone, nr_pcp_free(pcp, batch, high, free_high), > >> pcp, pindex); > >> + if (test_bit(ZONE_BELOW_HIGH, &zone->flags) && > >> + zone_watermark_ok(zone, 0, high_wmark_pages(zone), > >> + ZONE_MOVABLE, 0)) > >> + clear_bit(ZONE_BELOW_HIGH, &zone->flags); > >> } > >> } > >> > > > > This is a relatively fast path and freeing pages should not need to check > > watermarks. > > Another stuff that mitigate the overhead is that the watermarks checking > only occurs when we free pages from PCP to buddy. That is, in most > cases, every 63 page freeing. > True > > While the overhead is mitigated because it applies only when > > the watermark is below high, that is also potentially an unbounded condition > > if a workload is sized precisely enough. Why not clear this bit when kswapd > > is going to sleep after reclaiming enough pages in a zone? > > IIUC, if the number of free pages is kept larger than the low watermark, > then kswapd will have no opportunity to be waken up even if the number > of free pages was ever smaller than the high watermark. > Also true and I did think of that later. I guess it's ok, the chances are that the series overall offsets any micro-costs like this so I'm happy. If, for some reason, this overhead is noticable (doubtful), then it can be revisted. Thanks. -- Mel Gorman SUSE Labs