Re: [PATCH] mm/page_alloc: don't wake up kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified

On Thu, May 11, 2023 at 10:47:36PM +0900, Tetsuo Handa wrote:
> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock
> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
> using ZONE_BOOSTED_WATERMARK flag. But since zone->flags is a shared
> variable, a thread doing !__GFP_KSWAPD_RECLAIM allocation request might
> observe this flag being set immediately after another thread doing
> __GFP_KSWAPD_RECLAIM allocation request set this flag.
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> Fixes: 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held")

The issue is real, but the changelog needs to explain why it is a problem.
Only allocation contexts that specify ALLOC_KSWAPD should wake kswapd,
similar to this:

        if (alloc_flags & ALLOC_KSWAPD)
                wake_all_kswapds(order, gfp_mask, ac);

One consequence is that kswapd could be woken spuriously for callsites
that clear __GFP_KSWAPD_RECLAIM explicitly or implicitly via combinations
like GFP_TRANSHUGE_LIGHT. The other side is that kswapd does not get
woken to reclaim pages up to the boosted watermark, leading to a higher
risk of fragmentation that may prevent future hugepage allocations.

There is a slight risk that this will increase reclaim because the zone
flag is not being cleared in as many contexts, but the risk is low.

I also suggest, as a micro-optimisation, that ALLOC_KSWAPD is checked
first because alloc_flags should be cache hot and cheaper to read than
the shared cache line holding the zone flags.
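
To illustrate the ordering I have in mind, here is a rough userspace
sketch, not the actual rmqueue() code. The struct zone_model, the
maybe_wake_kswapd() helper and the flag values are all made up for the
example, the flag is modelled as a plain mask rather than a bit number
used with test_bit(), and the accesses are not atomic as they would need
to be in the kernel. It only shows the per-allocation check coming before
the shared zone flag is touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only; these are not the kernel's definitions. */
#define ALLOC_KSWAPD            0x1u
#define ZONE_BOOSTED_WATERMARK  0x1u

/* Hypothetical stand-in for struct zone, reduced to what the sketch needs. */
struct zone_model {
        unsigned int flags;     /* shared between allocating contexts */
        int kswapd_wakeups;     /* counts modelled kswapd wakeups */
};

/*
 * Hypothetical helper: returns true if kswapd would have been woken.
 * Single-threaded model; the kernel would use atomic bit operations.
 */
static bool maybe_wake_kswapd(struct zone_model *zone, unsigned int alloc_flags)
{
        assert(zone != NULL);

        /* Local, likely cache-hot check first. */
        if (!(alloc_flags & ALLOC_KSWAPD))
                return false;

        /* Only now read the shared zone flags. */
        if (!(zone->flags & ZONE_BOOSTED_WATERMARK))
                return false;

        /* Clear the flag and record the wakeup. */
        zone->flags &= ~ZONE_BOOSTED_WATERMARK;
        zone->kswapd_wakeups++;
        return true;
}
```

With this ordering, a context without ALLOC_KSWAPD neither wakes kswapd
nor clears the flag, so a later ALLOC_KSWAPD context still sees it set.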

-- 
Mel Gorman
SUSE Labs



