On Tue, Aug 27, 2024 at 7:38 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> On 8/27/24 09:15, Barry Song wrote:
> > On Tue, Aug 27, 2024 at 12:10 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> >>
> >> On 8/22/24 11:34, Linus Torvalds wrote:
> >> > On Thu, 22 Aug 2024 at 17:27, David Hildenbrand <david@xxxxxxxxxx> wrote:
> >> >>
> >> >> To me, that implies that if you pass in MAX_ORDER+1 the VM will "retry
> >> >> infinitely". if that implies just OOPSing or actually be in a busy loop,
> >> >> I don't care. It could effectively happen with MAX_ORDER as well, as
> >> >> stated. But certainly not BUG_ON.
> >> >
> >> > No BUG_ON(), but also no endless loop.
> >> >
> >> > Just return NULL for bogus users. Really. Give a WARN_ON_ONCE() to
> >> > make it easy to find offenders, and then let them deal with it.
> >>
> >> Right now we give the WARN_ON_ONCE() (for !can_direct_reclaim) only when
> >> we're about to actually return NULL, so the memory has to be depleted
> >> already. To make it easier to find the offenders much more reliably, we
> >> should consider doing it sooner, but also not add unnecessary overhead to
> >> allocator fastpaths just because of the potentially buggy users. So either
> >> always in __alloc_pages_slowpath(), which should be often enough (unless the
> >> system never needs to wake up kswapd to reclaim) but with negligible enough
> >> overhead, or on every allocation but only with e.g. CONFIG_DEBUG_VM?
> >
> > We already have a WARN_ON for order > 1 in rmqueue. we might extend
> > the condition there to include checking flags as well?
>
> Ugh, wasn't aware, well spotted. So it means there at least shouldn't be
> existing users of __GFP_NOFAIL with order > 1 :)
>
> But also the check is in the hotpath, even before trying the pcplists, so we
> could move it to __alloc_pages_slowpath() while extending it?

Agreed.
I don't think it is reasonable to check the order and flags in two
different places, especially since rmqueue() already pays the cost of
testing gfp_flags & __GFP_NOFAIL and order > 1. We can at least extend
the current check as an improvement, though I still believe Michal's
suggestion of implementing OOPS_ON is the better approach to pursue,
as it terminates the problematic process without crashing the entire
system.

> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 7dcb0713eb57..b5717c6569f9 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3071,8 +3071,11 @@ struct page *rmqueue(struct zone *preferred_zone,
> >         /*
> >          * We most definitely don't want callers attempting to
> >          * allocate greater than order-1 page units with __GFP_NOFAIL.
> > +        * Also we don't support __GFP_NOFAIL without __GFP_DIRECT_RECLAIM,
> > +        * which can result in a lockup
> >          */
> > -       WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
> > +       WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) &&
> > +                    (order > 1 || !(gfp_flags & __GFP_DIRECT_RECLAIM)));
> >
> >         if (likely(pcp_allowed_order(order))) {
> >                 page = rmqueue_pcplist(preferred_zone, zone, order,
> >
> >>
> >> > Don't take it upon yourself to say "we have to deal with any amount of
> >> > stupidity".
> >> >
> >> > The MM layer is not some slave to users. The MM layer is one of the
> >> > most core pieces of code in the kernel, and as such the MM layer is
> >> > damn well in charge.
> >> >
> >> > Nobody has the right to say "I will not deal with allocation
> >> > failures". The MM should not bend over backwards over something like
> >> > that.
> >> >
> >> > Seriously. Get a spine already, people. Tell random drivers that claim
> >> > that they cannot deal with errors to just f-ck off.
> >> >
> >> > And you don't do it by looping forever, and you don't do it by killing
> >> > the kernel. You do it by ignoring their bullying tactics.
> >> >
> >> > Then you document the *LIMITED* cases where you actually will try forever.
> >> >
> >> > This discussion has gone on for too damn long.
> >> >
> >> >                Linus
> >> >

Thanks
Barry