On Thu 15-08-24 11:26:09, Yafang Shao wrote: > On Wed, Aug 14, 2024 at 8:43 PM Michal Hocko <mhocko@xxxxxxxx> wrote: [...] > > > If that's the case, I believe we should at least consider adding the > > > following code change to the kernel: > > > > We already do have that > > /* > > * All existing users of the __GFP_NOFAIL are blockable, so warn > > * of any new users that actually require GFP_NOWAIT > > */ > > if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) > > goto fail; > > I don't see a reason to place the `goto fail;` above the > `__alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE, ac);` > line. Since we've already woken up kswapd, it should be acceptable to > allocate memory from ALLOC_MIN_RESERVE temporarily. Why not consider > implementing the following changes instead? > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 9ecf99190ea2..598d4df829cd 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -4386,13 +4386,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned > int order, > * we always retry > */ > if (gfp_mask & __GFP_NOFAIL) { > - /* > - * All existing users of the __GFP_NOFAIL are blockable, so warn > - * of any new users that actually require GFP_NOWAIT > - */ > - if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) > - goto fail; > - > /* > * PF_MEMALLOC request from this context is rather bizarre > * because we cannot reclaim anything and only can loop waiting > @@ -4419,6 +4412,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned > int order, > if (page) > goto got_pg; > > + /* > + * All existing users of the __GFP_NOFAIL are blockable, so warn > + * of any new users that actually require GFP_NOWAIT > + */ > + if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) { > + goto fail; > + } > + > cond_resched(); > goto retry; > } How does this solve anything. It will still eventually fail the NOFAIL allocation. It might happen slightly later but that doesn't change the fact it will _fail_. I have referenced a discussion why that is not really desireable and why Barry wants that addressed. We have added that WARN_ON_ONCE because we have assumed that people do understand that NOFAIL without reclaim is just too much to ask. We were wrong there was one user in the kernel. That one was not too hard to find out because you can _grep_ for those flags. Scoped APIs make that impossible! > > But Barry has patches to turn that into BUG because failing NOFAIL > > allocations is not cool and cause unexpected failures. Have a look at > > https://lore.kernel.org/all/20240731000155.109583-1-21cnbao@xxxxxxxxx/ > > > > > > I am really > > > > surprised that we even have PF_MEMALLOC_NORECLAIM in the first place! > > > > > > There's use cases for it. > > > > Right but there are certain constrains that we need to worry about to > > have a maintainable code. Scope allocation contrains are really a good > > feature when that has a well defined semantic. E.g. NOFS, NOIO or > > NOMEMALLOC (although this is more self inflicted injury exactly because > > PF_MEMALLOC had a "use case"). NOWAIT scope semantic might seem a good > > feature but it falls appart on nested NOFAIL allocations! So the flag is > > usable _only_ if you fully control the whole scoped context. Good luck > > with that long term! This is fragile, hard to review and even harder to > > keep working properly. The flag would have been Nacked on that ground. > > But nobody asked... > > It's already implemented, and complaints won't resolve the issue. How > about making the following change to provide a warning when this new > flag is used incorrectly? How does this solve anything at all? It will warn you that your code is incorrect and what next? Are you going to remove GFP_NOFAIL from the nested allocation side? NOFAIL is a strong requirement and it is not used nilly willy. There must have been a very good reason to use it. Are you going to drop the scope? Let me repeat, nested NOFAIL allocations will BUG_ON on failure. Your warning might catch those users slightly earlier when allocation succeed but that doesn't make those crashes impossible. PF_MEMALLOC_NORECLAIM might be already merged but this concept is inherently fragile and should be reverted rather than finding impossible ways around it. And it should be done before this spreads outside of bcachefs. -- Michal Hocko SUSE Labs