On Fri, Jul 19, 2024 at 7:26 PM Michal Hocko <mhocko@xxxxxxxx> wrote: > > On Fri 19-07-24 13:13:28, Vlastimil Babka wrote: > > On 7/19/24 12:52 PM, Michal Hocko wrote: > > > On Fri 19-07-24 12:10:13, Vlastimil Babka wrote: > > >> On 7/19/24 11:33 AM, Michal Hocko wrote: > > >> > On Fri 19-07-24 10:50:07, Vlastimil Babka wrote: > > >> > [...] > > >> >> That wouldn't mean the busy loop is a correct and supported practice. It > > >> >> would just mean it's the least bad of the bad options we have to deal with > > >> >> an allocation that's wrong but we didn't catch soon enough in the development. > > >> > > > >> > So you want to make those potential BUG_ONs hard/soft lockups (not sure > > >> > all arches have a reliable detection) instead? > > >> > > >> I'd expect on a SMP machine there's fair chance of being rescued by kswapd > > >> or other direct reclaimer. > > > > > > I would expect hard/soft lockups... Anyway, the question remains. What > > > is the preferred way to express this is not really supported scenario. > > > > The documentation and warnings? I don't think the warnings are failing us > > fundamentally. We could limit the corner cases where the are not being > > reached in case a buggy code is being executed (maybe only in some debug > > config if the checks would be too intrusive for a fast path otherwise). That > > would leave the corner cases where the buggy code is executed rarely. Maybe > > Christoph's suggestion for a build-time check would catch some of those. > > I fail to see what you are actually proposing here. I can see only two > ways > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 9ecf99190ea2..d5b06ce04a0f 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -4391,7 +4391,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > * of any new users that actually require GFP_NOWAIT > */ > if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) > - goto fail; > + goto retry; > > /* > * PF_MEMALLOC request from this context is rather bizarre > > or > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 9ecf99190ea2..6ca9c4d33732 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -4387,11 +4387,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > */ > if (gfp_mask & __GFP_NOFAIL) { > /* > - * All existing users of the __GFP_NOFAIL are blockable, so warn > - * of any new users that actually require GFP_NOWAIT > + * All existing users of the __GFP_NOFAIL are blockable > + * otherwise we introduce a busy loop with inside the page > + * allocator from non-sleepable contexts. > */ > - if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) > - goto fail; > + BUG_ON(!can_direct_reclaim) what about an earlier WARN_ON, for example, even before we begin to try the allocation? This will help developers find their problems even during development stage. then at the last moment, if we have to return NULL, we do BUG_ON(). a potential way to mix two poisons is that we retry for a couple of times(for example, 500ms), if we are not rescued by anyone (other direct reclamation/kswap/free), we raise BUG_ON. but it seems too complex:-) I'd rather simply raise BUG_ON. > > /* > * PF_MEMALLOC request from this context is rather bizarre > > Choose your own poison ;) > > BUILD_BUG_ON will only work for statically defined masks AFAIU (which > would catch the existing abuser) but it will not catch the code that > builds up the mask conditionaly so there needs to be a stop gap. > -- > Michal Hocko > SUSE Labs