On Thu, Apr 04, 2024 at 05:33:16PM +0200, Vlastimil Babka wrote:
> Sven reports an infinite loop in __alloc_pages_slowpath() for costly order
> __GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO. Such combination
> can happen in a suspend/resume context where a GFP_KERNEL allocation can
> have __GFP_IO masked out via gfp_allowed_mask.
>
> Quoting Sven:
>
> 1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER)
>    with __GFP_RETRY_MAYFAIL set.
>
> 2. page alloc's __alloc_pages_slowpath tries to get a page from the
>    freelist. This fails because there is nothing free of that costly
>    order.
>
> 3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim,
>    which bails out because a zone is ready to be compacted; it pretends
>    to have made a single page of progress.
>
> 4. page alloc tries to compact, but this always bails out early because
>    __GFP_IO is not set (it's not passed by the snd allocator, and even
>    if it were, we are suspending so the __GFP_IO flag would be cleared
>    anyway).
>
> 5. page alloc believes reclaim progress was made (because of the
>    pretense in item 3) and so it checks whether it should retry
>    compaction. The compaction retry logic thinks it should try again,
>    because:
>     a) reclaim is needed because of the early bail-out in item 4
>     b) a zonelist is suitable for compaction
>
> 6. goto 2. indefinite stall.
>
> (end quote)
>
> The immediate root cause is confusing the COMPACT_SKIPPED returned from
> __alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be
> indicating a lack of order-0 pages, and in step 5 evaluating that in
> should_compact_retry() as a reason to retry, before incrementing and
> limiting the number of retries. There are however other places that
> wrongly assume that compaction can happen while we lack __GFP_IO.
>
> To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO
> evaluation and switch the open-coded test in try_to_compact_pages() to use
> it.
>
> Also use the new helper in:
> - compaction_ready(), which will make reclaim not bail out in step 3, so
>   there's at least one attempt to actually reclaim, even if chances are
>   small for a costly order
> - in_reclaim_compaction() which will make should_continue_reclaim()
>   return false and we don't over-reclaim unnecessarily
> - in __alloc_pages_slowpath() to set a local variable can_compact,
>   which is then used to avoid retrying reclaim/compaction for costly
>   allocations (step 5) if we can't compact and also to skip the early
>   compaction attempt that we do in some cases
>
> Link: https://lkml.kernel.org/r/20240221114357.13655-2-vbabka@xxxxxxx
> Fixes: 3250845d0526 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"")
> Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
> Reported-by: Sven van Ashbrook <svenva@xxxxxxxxxxxx>
> Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBzPUVOZF%2Bg@xxxxxxxxxxxxxx/
> Tested-by: Karthikeyan Ramasubramanian <kramasub@xxxxxxxxxxxx>
> Cc: Brian Geffon <bgeffon@xxxxxxxxxx>
> Cc: Curtis Malainey <cujomalainey@xxxxxxxxxxxx>
> Cc: Jaroslav Kysela <perex@xxxxxxxx>
> Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxxxx>
> Cc: Takashi Iwai <tiwai@xxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> (cherry picked from commit 803de9000f334b771afacb6ff3e78622916668b0)
> Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>

All backports now queued up, thanks!

greg k-h
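
For readers following the retry logic described in the quoted commit
message, here is a rough userspace sketch of the loop in steps 2-6 and of
how a can_compact check ends it. This is not the kernel code: the MODEL_*
constants, the model_* function names, the flag values and the retry limit
are all hypothetical stand-ins chosen for illustration, and the real
__alloc_pages_slowpath() does much more than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: the kernel's gfp_t, flag values and limits differ. */
typedef unsigned int model_gfp_t;
#define MODEL_GFP_IO              0x1u
#define MODEL_GFP_RETRY_MAYFAIL   0x2u
#define MODEL_COSTLY_ORDER        3
#define MODEL_MAX_COMPACT_RETRIES 16
#define MODEL_ITERATION_CAP       1000

/* Models the idea behind gfp_compaction_allowed(): no __GFP_IO, no compaction. */
static bool model_compaction_allowed(model_gfp_t gfp_mask)
{
    return (gfp_mask & MODEL_GFP_IO) != 0;
}

/*
 * Models the retry loop for an allocation that never finds a free page.
 * Returns the iteration on which the loop gives up, or -1 if it is still
 * retrying when MODEL_ITERATION_CAP is reached (standing in for the
 * indefinite stall from step 6).
 */
static int model_slowpath(model_gfp_t gfp_mask, unsigned int order, bool apply_fix)
{
    bool can_compact = model_compaction_allowed(gfp_mask);
    bool costly = order > MODEL_COSTLY_ORDER;
    int compact_retries = 0;

    assert(MODEL_ITERATION_CAP > MODEL_MAX_COMPACT_RETRIES);

    for (int iteration = 1; iteration <= MODEL_ITERATION_CAP; iteration++) {
        /* Fixed behaviour: costly order and no compaction means stop retrying. */
        if (apply_fix && costly && !can_compact)
            return iteration;

        /*
         * Unfixed behaviour: compaction was skipped for lack of IO, which is
         * misread as "reclaim needed", so we retry before the counter is bumped.
         */
        if (!can_compact)
            continue;

        /* Compaction actually ran: retries are counted and limited. */
        if (++compact_retries > MODEL_MAX_COMPACT_RETRIES)
            return iteration;
    }
    return -1;
}
```

In this model, a costly-order request without the IO flag never leaves the
loop unless apply_fix is set, in which case it gives up on the first pass.
A request that keeps the IO flag is bounded by the retry counter either way.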