On Wed, Feb 21, 2024 at 4:44 AM Vlastimil Babka <vbabka@xxxxxxx> wrote: > > Sven reports an infinite loop in __alloc_pages_slowpath() for costly > order __GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO. Such > combination can happen in a suspend/resume context where a GFP_KERNEL > allocation can have __GFP_IO masked out via gfp_allowed_mask. > > Quoting Sven: > > 1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER) > with __GFP_RETRY_MAYFAIL set. > > 2. page alloc's __alloc_pages_slowpath tries to get a page from the > freelist. This fails because there is nothing free of that costly > order. > > 3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim, > which bails out because a zone is ready to be compacted; it pretends > to have made a single page of progress. > > 4. page alloc tries to compact, but this always bails out early because > __GFP_IO is not set (it's not passed by the snd allocator, and even > if it were, we are suspending so the __GFP_IO flag would be cleared > anyway). > > 5. page alloc believes reclaim progress was made (because of the > pretense in item 3) and so it checks whether it should retry > compaction. The compaction retry logic thinks it should try again, > because: > a) reclaim is needed because of the early bail-out in item 4 > b) a zonelist is suitable for compaction > > 6. goto 2. indefinite stall. > > (end quote) > > The immediate root cause is confusing the COMPACT_SKIPPED returned from > __alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be > indicating a lack of order-0 pages, and in step 5 evaluating that in > should_compact_retry() as a reason to retry, before incrementing and > limiting the number of retries. There are however other places that > wrongly assume that compaction can happen while we lack __GFP_IO. > > To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO > evaluation and switch the open-coded test in try_to_compact_pages() to > use it. > > Also use the new helper in: > - compaction_ready(), which will make reclaim not bail out in step 3, so > there's at least one attempt to actually reclaim, even if chances are > small for a costly order > - in_reclaim_compaction() which will make should_continue_reclaim() > return false and we don't over-reclaim unnecessarily > - in __alloc_pages_slowpath() to set a local variable can_compact, > which is then used to avoid retrying reclaim/compaction for costly > allocations (step 5) if we can't compact and also to skip the early > compaction attempt that we do in some cases > > Reported-by: Sven van Ashbrook <svenva@xxxxxxxxxxxx> > Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBzPUVOZF%2Bg@xxxxxxxxxxxxxx/ > Fixes: 3250845d0526 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"") > Cc: <stable@xxxxxxxxxxxxxxx> > Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx> Tested-by: Karthikeyan Ramasubramanian <kramasub@xxxxxxxxxxxx> > --- > include/linux/gfp.h | 9 +++++++++ > mm/compaction.c | 7 +------ > mm/page_alloc.c | 10 ++++++---- > mm/vmscan.c | 5 ++++- > 4 files changed, 20 insertions(+), 11 deletions(-) > > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > index de292a007138..e2a916cf29c4 100644 > --- a/include/linux/gfp.h > +++ b/include/linux/gfp.h > @@ -353,6 +353,15 @@ static inline bool gfp_has_io_fs(gfp_t gfp) > return (gfp & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS); > } > > +/* > + * Check if the gfp flags allow compaction - GFP_NOIO is a really > + * tricky context because the migration might require IO. > + */ > +static inline bool gfp_compaction_allowed(gfp_t gfp_mask) > +{ > + return IS_ENABLED(CONFIG_COMPACTION) && (gfp_mask & __GFP_IO); > +} > + > extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma); > > #ifdef CONFIG_CONTIG_ALLOC > diff --git a/mm/compaction.c b/mm/compaction.c > index 4add68d40e8d..b961db601df4 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -2723,16 +2723,11 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order, > unsigned int alloc_flags, const struct alloc_context *ac, > enum compact_priority prio, struct page **capture) > { > - int may_perform_io = (__force int)(gfp_mask & __GFP_IO); > struct zoneref *z; > struct zone *zone; > enum compact_result rc = COMPACT_SKIPPED; > > - /* > - * Check if the GFP flags allow compaction - GFP_NOIO is really > - * tricky context because the migration might require IO > - */ > - if (!may_perform_io) > + if (!gfp_compaction_allowed(gfp_mask)) > return COMPACT_SKIPPED; > > trace_mm_compaction_try_to_compact_pages(order, gfp_mask, prio); > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 150d4f23b010..a663202045dc 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -4041,6 +4041,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > struct alloc_context *ac) > { > bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM; > + bool can_compact = gfp_compaction_allowed(gfp_mask); > const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER; > struct page *page = NULL; > unsigned int alloc_flags; > @@ -4111,7 +4112,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > * Don't try this for allocations that are allowed to ignore > * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen. > */ > - if (can_direct_reclaim && > + if (can_direct_reclaim && can_compact && > (costly_order || > (order > 0 && ac->migratetype != MIGRATE_MOVABLE)) > && !gfp_pfmemalloc_allowed(gfp_mask)) { > @@ -4209,9 +4210,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > > /* > * Do not retry costly high order allocations unless they are > - * __GFP_RETRY_MAYFAIL > + * __GFP_RETRY_MAYFAIL and we can compact > */ > - if (costly_order && !(gfp_mask & __GFP_RETRY_MAYFAIL)) > + if (costly_order && (!can_compact || > + !(gfp_mask & __GFP_RETRY_MAYFAIL))) > goto nopage; > > if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags, > @@ -4224,7 +4226,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > * implementation of the compaction depends on the sufficient amount > * of free memory (see __compaction_suitable) > */ > - if (did_some_progress > 0 && > + if (did_some_progress > 0 && can_compact && > should_compact_retry(ac, order, alloc_flags, > compact_result, &compact_priority, > &compaction_retries)) > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 4f9c854ce6cc..4255619a1a31 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -5753,7 +5753,7 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc) > /* Use reclaim/compaction for costly allocs or under memory pressure */ > static bool in_reclaim_compaction(struct scan_control *sc) > { > - if (IS_ENABLED(CONFIG_COMPACTION) && sc->order && > + if (gfp_compaction_allowed(sc->gfp_mask) && sc->order && > (sc->order > PAGE_ALLOC_COSTLY_ORDER || > sc->priority < DEF_PRIORITY - 2)) > return true; > @@ -5998,6 +5998,9 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc) > { > unsigned long watermark; > > + if (!gfp_compaction_allowed(sc->gfp_mask)) > + return false; > + > /* Allocation can already succeed, nothing to do */ > if (zone_watermark_ok(zone, sc->order, min_wmark_pages(zone), > sc->reclaim_idx, 0)) > -- > 2.43.1 >