Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> writes:

> On Wed, Sep 20, 2023 at 02:18:54PM +0800, Huang Ying wrote:
>> One target of PCP is to minimize the number of pages in PCP if the
>> system's free pages are too few.  To reach that target, when page
>> reclaim is active for the zone (ZONE_RECLAIM_ACTIVE), we stop
>> increasing PCP high in the allocating path, and decrease PCP high and
>> free some pages in the freeing path.  But this may be too late,
>> because the background page reclaim may already introduce latency for
>> some workloads.  So, in this patch, during page allocation we detect
>> whether the number of free pages of the zone is below the high
>> watermark.  If so, we stop increasing PCP high in the allocating
>> path, and decrease PCP high and free some pages in the freeing path.
>> With this, we can reduce the possibility of premature background page
>> reclaim caused by a too-large PCP.
>>
>> The high watermark check is done in the allocating path to reduce the
>> overhead in the hotter freeing path.
>>
>> Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>> Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
>> Cc: Vlastimil Babka <vbabka@xxxxxxx>
>> Cc: David Hildenbrand <david@xxxxxxxxxx>
>> Cc: Johannes Weiner <jweiner@xxxxxxxxxx>
>> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>> Cc: Michal Hocko <mhocko@xxxxxxxx>
>> Cc: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
>> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
>> Cc: Christoph Lameter <cl@xxxxxxxxx>
>> ---
>>  include/linux/mmzone.h |  1 +
>>  mm/page_alloc.c        | 22 ++++++++++++++++++++--
>>  2 files changed, 21 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index d6cfb5023f3e..8a19e2af89df 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -1006,6 +1006,7 @@ enum zone_flags {
>>  	 * Cleared when kswapd is woken.
>>  	 */
>>  	ZONE_RECLAIM_ACTIVE,	/* kswapd may be scanning the zone. */
>> +	ZONE_BELOW_HIGH,	/* zone is below high watermark. */
>>  };
>>
>>  static inline unsigned long zone_managed_pages(struct zone *zone)
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 225abe56752c..3f8c7dfeed23 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -2409,7 +2409,13 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
>>  		return min(batch << 2, pcp->high);
>>  	}
>>
>> -	if (pcp->count >= high && high_min != high_max) {
>> +	if (high_min == high_max)
>> +		return high;
>> +
>> +	if (test_bit(ZONE_BELOW_HIGH, &zone->flags)) {
>> +		pcp->high = max(high - (batch << pcp->free_factor), high_min);
>> +		high = max(pcp->count, high_min);
>> +	} else if (pcp->count >= high) {
>>  		int need_high = (batch << pcp->free_factor) + batch;
>>
>>  		/* pcp->high should be large enough to hold batch freed pages */
>> @@ -2459,6 +2465,10 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
>>  	if (pcp->count >= high) {
>>  		free_pcppages_bulk(zone, nr_pcp_free(pcp, batch, high, free_high),
>>  				   pcp, pindex);
>> +		if (test_bit(ZONE_BELOW_HIGH, &zone->flags) &&
>> +		    zone_watermark_ok(zone, 0, high_wmark_pages(zone),
>> +				      ZONE_MOVABLE, 0))
>> +			clear_bit(ZONE_BELOW_HIGH, &zone->flags);
>>  	}
>>  }
>>
>> @@ -2765,7 +2775,7 @@ static int nr_pcp_alloc(struct per_cpu_pages *pcp, struct zone *zone, int order)
>>  	 * If we had larger pcp->high, we could avoid to allocate from
>>  	 * zone.
>>  	 */
>> -	if (high_min != high_max && !test_bit(ZONE_RECLAIM_ACTIVE, &zone->flags))
>> +	if (high_min != high_max && !test_bit(ZONE_BELOW_HIGH, &zone->flags))
>>  		high = pcp->high = min(high + batch, high_max);
>>
>>  	if (!order) {
>> @@ -3226,6 +3236,14 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
>>  		}
>>  	}
>>
>> +	mark = high_wmark_pages(zone);
>> +	if (zone_watermark_fast(zone, order, mark,
>> +				ac->highest_zoneidx, alloc_flags,
>> +				gfp_mask))
>> +		goto try_this_zone;
>> +	else if (!test_bit(ZONE_BELOW_HIGH, &zone->flags))
>> +		set_bit(ZONE_BELOW_HIGH, &zone->flags);
>> +
>
> This absolutely needs a comment explaining why, because superficially a
> consequence of this is that allocator performance is slightly degraded
> when below the high watermark.  Being below the high watermark is
> completely harmless and can persist indefinitely until something wakes
> kswapd.

Sure.  Will add some comments here.
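Perhaps something like the following rough draft above the new check
(exact wording to be refined):

	/*
	 * Detect whether the number of free pages of the zone is
	 * below the high watermark.  If so, stop growing PCP high in
	 * the allocating path, and shrink PCP high and free some
	 * pages in the freeing path.  This reduces the pages held in
	 * an over-large PCP, and so reduces the possibility of
	 * premature background reclaim.  Being below the high
	 * watermark is harmless in itself; the cost here is only a
	 * small overhead in the allocation fast path.
	 */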
--
Best Regards,
Huang, Ying