On Sun, Aug 18, 2024 at 5:51 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote: > > On Sun, Aug 18, 2024 at 11:48 AM Barry Song <21cnbao@xxxxxxxxx> wrote: > > > > On Sun, Aug 18, 2024 at 2:55 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote: > > > > > > On Sat, Aug 17, 2024 at 2:25 PM Barry Song <21cnbao@xxxxxxxxx> wrote: > > > > > > > > From: Barry Song <v-songbaohua@xxxxxxxx> > > > > > > > > When users allocate memory with the __GFP_NOFAIL flag, they might > > > > incorrectly use it alongside GFP_ATOMIC, GFP_NOWAIT, etc. This kind of > > > > non-blockable __GFP_NOFAIL is not supported and is pointless. If we > > > > attempt and still fail to allocate memory for these users, we have two > > > > choices: > > > > > > > > 1. We could busy-loop and hope that some other direct reclamation or > > > > kswapd rescues the current process. However, this is unreliable > > > > and could ultimately lead to hard or soft lockups, > > > > > > That can occur even if we set both __GFP_NOFAIL and > > > __GFP_DIRECT_RECLAIM, right? So, I don't believe the issue is related > > > to setting __GFP_DIRECT_RECLAIM; rather, it stems from the flawed > > > design of __GFP_NOFAIL itself. > > > > the point of GFP_NOFAIL is that it won't fail and its user won't check > > the return value. without direct_reclamation, it is sometimes impossible. > > but with direct reclamation, users constantly wait and finally they can > > So, what exactly is the difference between 'constantly waiting' and > 'busy looping'? Could you please clarify? Also, why can't we > 'constantly wait' when __GFP_DIRECT_RECLAIM is not set? I list two options in changelog 1: busy loop 2. bug_on. I am actually fine with either one. either one is better than the existing code. but returning null in the current code is definitely wrong. 1 somehow has the attempt to make __GFP_NOFAIL without direct_reclamation legal. so it is a bit suspicious going in the wrong direction. busy-loop is that you are not reclaiming memory you are not sleeping. cpu is constantly working and busy, so it might result in a lockup, either soft lockup or hard lockup. with direct_reclamation, wait is the case you can sleep. it is not holding cpu, not a busy loop. in rare case, users might end in endless wait, but it matches the doc of __GFP_NOFAIL, never return till memory is gotten (the current code is implemented in this way unless users incorrectly combine __GFP_NOFAIL with aotmic/nowait etc.) note, long-term we won't expose __GFP_NOFAIL any more. we will only expose GFP_NOFAIL which enforces Blockable. I am quite busy on other issues, so this won't happen in a short time. > > > get memory. if you read the doc of __GFP_NOFAIL you will find it. > > it is absolutely clearly documented. > > > > > > > > > which might not > > > > be well supported by some architectures. > > > > > > > > 2. We could use BUG_ON to trigger a reliable system crash, avoiding > > > > exposing NULL dereference. > > > > > > > > Neither option is ideal, but both are improvements over the existing code. > > > > This patch selects the second option because, with the introduction of > > > > scoped API and GFP_NOFAIL—capable of enforcing direct reclamation for > > > > nofail users(which is in my plan), non-blockable nofail allocations will > > > > no longer be possible. > > > > > > > > Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx> > > > > Cc: Michal Hocko <mhocko@xxxxxxxx> > > > > Cc: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx> > > > > Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx> > > > > Cc: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> > > > > Cc: Christoph Lameter <cl@xxxxxxxxx> > > > > Cc: Pekka Enberg <penberg@xxxxxxxxxx> > > > > Cc: David Rientjes <rientjes@xxxxxxxxxx> > > > > Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> > > > > Cc: Vlastimil Babka <vbabka@xxxxxxx> > > > > Cc: Roman Gushchin <roman.gushchin@xxxxxxxxx> > > > > Cc: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx> > > > > Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> > > > > Cc: Kees Cook <kees@xxxxxxxxxx> > > > > Cc: "Eugenio Pérez" <eperezma@xxxxxxxxxx> > > > > Cc: Hailong.Liu <hailong.liu@xxxxxxxx> > > > > Cc: Jason Wang <jasowang@xxxxxxxxxx> > > > > Cc: Maxime Coquelin <maxime.coquelin@xxxxxxxxxx> > > > > Cc: "Michael S. Tsirkin" <mst@xxxxxxxxxx> > > > > Cc: Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> > > > > --- > > > > mm/page_alloc.c | 10 +++++----- > > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > > index d2c37f8f8d09..fb5850ecd3ae 100644 > > > > --- a/mm/page_alloc.c > > > > +++ b/mm/page_alloc.c > > > > @@ -4399,11 +4399,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > > > > */ > > > > if (gfp_mask & __GFP_NOFAIL) { > > > > /* > > > > - * All existing users of the __GFP_NOFAIL are blockable, so warn > > > > - * of any new users that actually require GFP_NOWAIT > > > > + * All existing users of the __GFP_NOFAIL are blockable > > > > + * otherwise we introduce a busy loop with inside the page > > > > + * allocator from non-sleepable contexts > > > > */ > > > > - if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) > > > > - goto fail; > > > > + BUG_ON(!can_direct_reclaim); > > > > > > I'm not in favor of using BUG_ON() here, as many call sites already > > > handle the return value of __GFP_NOFAIL. > > > > > > > it is not correct to handle the return value of __GFP_NOFAIL. > > if you check the ret, don't use __GFP_NOFAIL. > > If so, you have many code changes to make in the linux kernel ;) > Please list those code using __GFP_NOFAIL and check the result might fail, we should get them fixed. This is insane. NOFAIL means no fail. > > > > > If we believe BUG_ON() is necessary, why not place it at the beginning > > > of __alloc_pages_slowpath() instead of after numerous operations, > > > which could potentially obscure the issue? > > > > to some extent I agree with you. but the point here is that we might > > want to avoid this check in the hot path. so basically, we check when > > we have to check. in 99%+ case, this check can be avoided. > > It's on the slow path, but that's not the main point here. I actually recommended the approach, we can do an earlier check in the hotpath. somehow, in the previous discussion, people didn't like it. > > -- > Regards > Yafang Thanks barry