Re: [PATCH v3 0/4] mm: clarify nofail memory allocation

Barry Song <21cnbao@xxxxxxxxx> · Tue, 27 Aug 2024 19:15:45 +1200

On Tue, Aug 27, 2024 at 12:10 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> On 8/22/24 11:34, Linus Torvalds wrote:
> > On Thu, 22 Aug 2024 at 17:27, David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >> To me, that implies that if you pass in MAX_ORDER+1 the VM will "retry
> >> infinitely". if that implies just OOPSing or actually be in a busy loop,
> >> I don't care. It could effectively happen with MAX_ORDER as well, as
> >> stated. But certainly not BUG_ON.
> >
> > No BUG_ON(), but also no endless loop.
> >
> > Just return NULL for bogus users. Really. Give a WARN_ON_ONCE() to
> > make it easy to find offenders, and then let them deal with it.
>
> Right now we give the WARN_ON_ONCE() (for !can_direct_reclaim) only when
> we're about to actually return NULL, so the memory has to be depleted
> already. To make it easier to find the offenders much more reliably, we
> should consider doing it sooner, but also not add unnecessary overhead to
> allocator fastpaths just because of the potentially buggy users. So either
> always in __alloc_pages_slowpath(), which should be often enough (unless the
> system never needs to wake up kswapd to reclaim) but with negligible enough
> overhead, or on every allocation but only with e.g. CONFIG_DEBUG_VM?

We already have a WARN_ON for order > 1 in rmqueue. we might extend
the condition there to include checking flags as well?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7dcb0713eb57..b5717c6569f9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3071,8 +3071,11 @@ struct page *rmqueue(struct zone *preferred_zone,
  /*
  * We most definitely don't want callers attempting to
  * allocate greater than order-1 page units with __GFP_NOFAIL.
+ * Also we don't support __GFP_NOFAIL without __GFP_DIRECT_RECLAIM,
+ * which can result in a lockup
  */
- WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
+ WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) &&
+     (order > 1 || !(gfp_flags & __GFP_DIRECT_RECLAIM)));

  if (likely(pcp_allowed_order(order))) {
  page = rmqueue_pcplist(preferred_zone, zone, order,

>
> > Don't take it upon yourself to say "we have to deal with any amount of
> > stupidity".
> >
> > The MM layer is not some slave to users. The MM layer is one of the
> > most core pieces of code in the kernel, and as such the MM layer is
> > damn well in charge.
> >
> > Nobody has the right to say "I will not deal with allocation
> > failures". The MM should not bend over backwards over something like
> > that.
> >
> > Seriously. Get a spine already, people. Tell random drivers that claim
> > that they cannot deal with errors to just f-ck off.
> >
> > And you don't do it by looping forever, and you don't do it by killing
> > the kernel. You do it by ignoring their bullying tactics.
> >
> > Then you document the *LIMITED* cases where you actually will try forever.
> >
> > This discussion has gone on for too damn long.
> >
> >               Linus
>