On Tue, Aug 27, 2024 at 12:10 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> On 8/22/24 11:34, Linus Torvalds wrote:
> > On Thu, 22 Aug 2024 at 17:27, David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >> To me, that implies that if you pass in MAX_ORDER+1 the VM will "retry
> >> infinitely". if that implies just OOPSing or actually be in a busy loop,
> >> I don't care. It could effectively happen with MAX_ORDER as well, as
> >> stated. But certainly not BUG_ON.
> >
> > No BUG_ON(), but also no endless loop.
> >
> > Just return NULL for bogus users. Really. Give a WARN_ON_ONCE() to
> > make it easy to find offenders, and then let them deal with it.
>
> Right now we give the WARN_ON_ONCE() (for !can_direct_reclaim) only when
> we're about to actually return NULL, so the memory has to be depleted
> already. To make it easier to find the offenders much more reliably, we
> should consider doing it sooner, but also not add unnecessary overhead to
> allocator fastpaths just because of the potentially buggy users. So either
> always in __alloc_pages_slowpath(), which should be often enough (unless the
> system never needs to wake up kswapd to reclaim) but with negligible enough
> overhead, or on every allocation but only with e.g. CONFIG_DEBUG_VM?

We already have a WARN_ON for order > 1 in rmqueue. We might extend
the condition there to include checking flags as well?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7dcb0713eb57..b5717c6569f9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3071,8 +3071,11 @@ struct page *rmqueue(struct zone *preferred_zone,
 	/*
 	 * We most definitely don't want callers attempting to
 	 * allocate greater than order-1 page units with __GFP_NOFAIL.
+	 * Also we don't support __GFP_NOFAIL without __GFP_DIRECT_RECLAIM,
+	 * which can result in a lockup
 	 */
-	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
+	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) &&
+		     (order > 1 || !(gfp_flags & __GFP_DIRECT_RECLAIM)));
 
 	if (likely(pcp_allowed_order(order))) {
 		page = rmqueue_pcplist(preferred_zone, zone, order,

>
> > Don't take it upon yourself to say "we have to deal with any amount of
> > stupidity".
> >
> > The MM layer is not some slave to users. The MM layer is one of the
> > most core pieces of code in the kernel, and as such the MM layer is
> > damn well in charge.
> >
> > Nobody has the right to say "I will not deal with allocation
> > failures". The MM should not bend over backwards over something like
> > that.
> >
> > Seriously. Get a spine already, people. Tell random drivers that claim
> > that they cannot deal with errors to just f-ck off.
> >
> > And you don't do it by looping forever, and you don't do it by killing
> > the kernel. You do it by ignoring their bullying tactics.
> >
> > Then you document the *LIMITED* cases where you actually will try forever.
> >
> > This discussion has gone on for too damn long.
> >
> > Linus
>
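
For anyone who wants to sanity-check which flag/order combinations the
extended WARN_ON_ONCE() condition in the diff would flag, here is a rough
userspace sketch of the same predicate. The SKETCH_* bit values and the
nofail_misuse() helper are made up for illustration; they are not the
kernel's definitions and nothing like this helper exists in mm/.

```c
#include <stdbool.h>

/*
 * Placeholder bit values, for illustration only. The real
 * __GFP_DIRECT_RECLAIM and __GFP_NOFAIL bits are defined in the
 * kernel's gfp headers and have different values.
 */
#define SKETCH_GFP_DIRECT_RECLAIM 0x1u
#define SKETCH_GFP_NOFAIL         0x2u

/*
 * Hypothetical helper mirroring the condition proposed in the diff:
 * a NOFAIL request is suspicious if it is larger than order-1, or if
 * it does not allow direct reclaim.
 */
static bool nofail_misuse(unsigned int gfp_flags, unsigned int order)
{
	return (gfp_flags & SKETCH_GFP_NOFAIL) &&
	       (order > 1 || !(gfp_flags & SKETCH_GFP_DIRECT_RECLAIM));
}
```

With these placeholder bits, nofail_misuse(SKETCH_GFP_NOFAIL, 0) is true
(NOFAIL without direct reclaim), while
nofail_misuse(SKETCH_GFP_NOFAIL | SKETCH_GFP_DIRECT_RECLAIM, 1) is false.
Requests without NOFAIL are never flagged, whatever the order.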