On 8/27/24 09:15, Barry Song wrote:
> On Tue, Aug 27, 2024 at 12:10 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>>
>> On 8/22/24 11:34, Linus Torvalds wrote:
>> > On Thu, 22 Aug 2024 at 17:27, David Hildenbrand <david@xxxxxxxxxx> wrote:
>> >>
>> >> To me, that implies that if you pass in MAX_ORDER+1 the VM will "retry
>> >> infinitely". if that implies just OOPSing or actually be in a busy loop,
>> >> I don't care. It could effectively happen with MAX_ORDER as well, as
>> >> stated. But certainly not BUG_ON.
>> >
>> > No BUG_ON(), but also no endless loop.
>> >
>> > Just return NULL for bogus users. Really. Give a WARN_ON_ONCE() to
>> > make it easy to find offenders, and then let them deal with it.
>>
>> Right now we give the WARN_ON_ONCE() (for !can_direct_reclaim) only when
>> we're about to actually return NULL, so the memory has to be depleted
>> already. To make it easier to find the offenders much more reliably, we
>> should consider doing it sooner, but also not add unnecessary overhead to
>> allocator fastpaths just because of the potentially buggy users. So either
>> always in __alloc_pages_slowpath(), which should be often enough (unless the
>> system never needs to wake up kswapd to reclaim) but with negligible enough
>> overhead, or on every allocation but only with e.g. CONFIG_DEBUG_VM?
>
> We already have a WARN_ON for order > 1 in rmqueue. we might extend
> the condition there to include checking flags as well?

Ugh, wasn't aware, well spotted. So it means there at least shouldn't be
existing users of __GFP_NOFAIL with order > 1 :)

But also the check is in the hotpath, even before trying the pcplists, so we
could move it to __alloc_pages_slowpath() while extending it?
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7dcb0713eb57..b5717c6569f9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3071,8 +3071,11 @@ struct page *rmqueue(struct zone *preferred_zone,
>  	/*
>  	 * We most definitely don't want callers attempting to
>  	 * allocate greater than order-1 page units with __GFP_NOFAIL.
> +	 * Also we don't support __GFP_NOFAIL without __GFP_DIRECT_RECLAIM,
> +	 * which can result in a lockup
>  	 */
> -	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
> +	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) &&
> +		(order > 1 || !(gfp_flags & __GFP_DIRECT_RECLAIM)));
>
>  	if (likely(pcp_allowed_order(order))) {
>  		page = rmqueue_pcplist(preferred_zone, zone, order,
>
>>
>> > Don't take it upon yourself to say "we have to deal with any amount of
>> > stupidity".
>> >
>> > The MM layer is not some slave to users. The MM layer is one of the
>> > most core pieces of code in the kernel, and as such the MM layer is
>> > damn well in charge.
>> >
>> > Nobody has the right to say "I will not deal with allocation
>> > failures". The MM should not bend over backwards over something like
>> > that.
>> >
>> > Seriously. Get a spine already, people. Tell random drivers that claim
>> > that they cannot deal with errors to just f-ck off.
>> >
>> > And you don't do it by looping forever, and you don't do it by killing
>> > the kernel. You do it by ignoring their bullying tactics.
>> >
>> > Then you document the *LIMITED* cases where you actually will try forever.
>> >
>> > This discussion has gone on for too damn long.
>> >
>> > Linus
>>
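To make the extended condition from the quoted diff concrete, here is a small userspace model of it. This is only a sketch: MODEL_GFP_NOFAIL, MODEL_GFP_DIRECT_RECLAIM, nofail_misuse() and warn_on_once() are hypothetical names, and the bit values are placeholders rather than the kernel's real ___GFP_* assignments. It models only which (gfp, order) combinations would trip the warning, not where in the allocator the check should live.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration only; the kernel's real
 * gfp_t type and ___GFP_* bit assignments are not reproduced here. */
typedef unsigned int model_gfp_t;
#define MODEL_GFP_DIRECT_RECLAIM (1u << 0)
#define MODEL_GFP_NOFAIL         (1u << 1)

/* Mirrors the condition in the proposed diff: a __GFP_NOFAIL request is
 * flagged if it asks for more than order 1, or if direct reclaim is not
 * allowed (the case that can end in a lockup). */
static bool nofail_misuse(model_gfp_t gfp, unsigned int order)
{
	return (gfp & MODEL_GFP_NOFAIL) &&
	       (order > 1 || !(gfp & MODEL_GFP_DIRECT_RECLAIM));
}

/* Rough userspace stand-in for WARN_ON_ONCE(): prints a message only for
 * the first true condition this helper sees, and always returns the
 * condition so callers can branch on it. */
static bool warn_on_once(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}
```

For example, nofail_misuse(MODEL_GFP_NOFAIL, 0) is true because direct reclaim is not allowed, while nofail_misuse(MODEL_GFP_NOFAIL | MODEL_GFP_DIRECT_RECLAIM, 1) is false. Moving the real check into __alloc_pages_slowpath(), as suggested above, would change only where the predicate is evaluated, not the predicate itself.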