On 2/26/25 4:22 AM, Gabriel Krisman Bertazi wrote:
> Commit 96a5c186efff ("mm/page_alloc.c: don't show protection in zone's
> ->lowmem_reserve[] for empty zone") removes the protection of lower
> zones from allocations targeting memory-less high zones.  This had an
> unintended impact on the pattern of reclaims because it makes the
> high-zone-targeted allocation more likely to succeed in lower zones,
> which adds pressure to said zones.  I.e, the following corresponding
> checks in zone_watermark_ok/zone_watermark_fast are less likely to
> trigger:
>
>         if (free_pages <= min + z->lowmem_reserve[highest_zoneidx])
>                 return false;
>
> As a result, we are observing an increase in reclaim and kswapd scans,
> due to the increased pressure.  This was initially observed as increased
> latency in filesystem operations when benchmarking with fio on a machine
> with some memory-less zones, but it has since been associated with
> increased contention in locks related to memory reclaim.  By reverting
> this patch, the original performance was recovered on that machine.
>
> The original commit was introduced as a clarification of the
> /proc/zoneinfo output, so it doesn't seem there are usecases depending
> on it, making the revert a simple solution.
>
> Cc: Michal Hocko <mhocko@xxxxxxxxxx>
> Cc: Mel Gorman <mgorman@xxxxxxx>
> Cc: Vlastimil Babka <vbabka@xxxxxxx>
> Cc: Baoquan He <bhe@xxxxxxxxxx>
> Fixes: 96a5c186efff ("mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone")
> Signed-off-by: Gabriel Krisman Bertazi <krisman@xxxxxxx>

Reviewed-by: Vlastimil Babka <vbabka@xxxxxxx>

> ---
>  mm/page_alloc.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 579789600a3c..fe986e6de7a0 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5849,11 +5849,10 @@ static void setup_per_zone_lowmem_reserve(void)
>
>  		for (j = i + 1; j < MAX_NR_ZONES; j++) {
>  			struct zone *upper_zone = &pgdat->node_zones[j];
> -			bool empty = !zone_managed_pages(upper_zone);
>
>  			managed_pages += zone_managed_pages(upper_zone);
>
> -			if (clear || empty)
> +			if (clear)
>  				zone->lowmem_reserve[j] = 0;
>  			else
>  				zone->lowmem_reserve[j] = managed_pages / ratio;
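
For anyone following the thread, the effect of the hunk above can be sketched in userspace C. This is only a rough model, not the kernel code: NR_ZONES, the zone sizes, the ratios, and the helpers model_lowmem_reserve() and model_watermark_ok() are all hypothetical names and values chosen for illustration. The loop shape and the watermark comparison mirror what is quoted in the patch and its changelog.

```c
#include <stdbool.h>

/* Hypothetical three-zone node: index 0 and 1 populated, index 2 empty. */
#define NR_ZONES 3

/*
 * Simplified model of the loop touched by the patch.
 * skip_empty = true  models the behaviour with 96a5c186efff applied;
 * skip_empty = false models the behaviour after the revert.
 * reserve[i][j] stands in for zone i's lowmem_reserve[j].
 */
static void model_lowmem_reserve(const unsigned long managed[NR_ZONES],
				 const int ratio[NR_ZONES],
				 bool skip_empty,
				 unsigned long reserve[NR_ZONES][NR_ZONES])
{
	for (int i = 0; i < NR_ZONES - 1; i++) {
		/* Assumption: no reserve if the ratio is 0 or the zone is empty. */
		bool clear = !ratio[i] || !managed[i];
		unsigned long managed_pages = 0;

		for (int j = i + 1; j < NR_ZONES; j++) {
			bool empty = !managed[j];

			managed_pages += managed[j];

			if (clear || (skip_empty && empty))
				reserve[i][j] = 0;
			else
				reserve[i][j] = managed_pages /
						(unsigned long)ratio[i];
		}
	}
}

/* Model of the check quoted in the changelog: false means "refuse". */
static bool model_watermark_ok(unsigned long free_pages, unsigned long min,
			       unsigned long reserve)
{
	if (free_pages <= min + reserve)
		return false;
	return true;
}
```

With made-up inputs of managed = {500000, 1500000, 0} and ratio = {256, 32, 0}, the model gives zone 0 a reserve of 0 against allocations targeting the empty zone 2 when skip_empty is set, and 1500000 / 256 = 5859 pages when it is not. A zone 0 with 10000 free pages and a min watermark of 5000 would therefore pass the check in the first case and fail it in the second, which is the extra pressure on lower zones that the changelog describes.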