On Thu, May 11, 2023 at 10:47:36PM +0900, Tetsuo Handa wrote:
> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock
> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
> using ZONE_BOOSTED_WATERMARK flag. But since zone->flags is a shared
> variable, a thread doing !__GFP_KSWAPD_RECLAIM allocation request might
> observe this flag being set immediately after another thread doing
> __GFP_KSWAPD_RECLAIM allocation request set this flag.
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> Fixes: 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held")

The issue is real, but the changelog needs to explain why this is a
problem. Only allocation contexts that specify ALLOC_KSWAPD should wake
kswapd, similar to this:

	if (alloc_flags & ALLOC_KSWAPD)
		wake_all_kswapds(order, gfp_mask, ac);

The consequence is that kswapd could be woken spuriously for callsites
that clear __GFP_KSWAPD_RECLAIM, either explicitly or implicitly via
combinations like GFP_TRANSHUGE_LIGHT. The other side is that kswapd
does not get woken to reclaim pages up to the boosted watermark, leading
to a higher risk of fragmentation that may prevent future hugepage
allocations.

There is a slight risk this will increase reclaim because the zone flag
is not being cleared in as many contexts, but the risk is low.

I also suggest, as a micro-optimisation, that ALLOC_KSWAPD is checked
first because it should be cache hot and cheaper than the shared cache
line for zone flags.

-- 
Mel Gorman
SUSE Labs