On 7/24/24 10:55 AM, Barry Song wrote: > From: Barry Song <v-songbaohua@xxxxxxxx> > > We have cases we still fail though callers might have __GFP_NOFAIL. > Since they don't check the return, we are exposed to the security > risks for NULL deference. > > Though BUG_ON() is not encouraged by Linus, this is an unrecoverable > situation. > > Christoph Hellwig: > The whole freaking point of __GFP_NOFAIL is that callers don't handle > allocation failures. So in fact a straight BUG is the right thing > here. > > Vlastimil Babka: > It's just not a recoverable situation (WARN_ON is for recoverable > situations). The caller cannot handle allocation failure and at the same > time asked for an impossible allocation. BUG_ON() is a guaranteed oops > with stracktrace etc. We don't need to hope for the later NULL pointer > dereference (which might if really unlucky happen from a different > context where it's no longer obvious what lead to the allocation failing). Note that quote was meant specifically for the "too large" allocation, which is truly impossible. That includes the kvmalloc_array() overflow, order > MAX_ORDER etc. The "can't sleep/reclaim" is a bit more nuanced as there's the alternative in just warning and looping and hoping kswapd or some other direct reclaimer saves the day. If yes, great, we have a system that still works and a warning to repor. If no, there's still a warning, but later soft/hardlockup hits. These might be eventually worse than an immediate BUG_ON so it's not a clear cut. At least I think these cases should be handled in two different patches and not together. > Michal Hocko: > Linus tends to be against adding new BUG() calls unless the failure is > absolutely unrecoverable (e.g. corrupted data structures etc.). I am > not sure how he would look at simply incorrect memory allocator usage to > blow up the kernel. Now the argument could be made that those failures > could cause subtle memory corruptions or even be exploitable which might > be a sufficient reason to stop them early. > > Cc: Michal Hocko <mhocko@xxxxxxxx> > Cc: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx> > Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx> > Cc: Lorenzo Stoakes <lstoakes@xxxxxxxxx> > Cc: Christoph Lameter <cl@xxxxxxxxx> > Cc: Pekka Enberg <penberg@xxxxxxxxxx> > Cc: David Rientjes <rientjes@xxxxxxxxxx> > Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> > Cc: Vlastimil Babka <vbabka@xxxxxxx> > Cc: Roman Gushchin <roman.gushchin@xxxxxxxxx> > Cc: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx> > Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> > Cc: Kees Cook <kees@xxxxxxxxxx> > Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx> > --- > include/linux/slab.h | 4 +++- > mm/page_alloc.c | 10 +++++----- > mm/util.c | 1 + > 3 files changed, 9 insertions(+), 6 deletions(-) > > diff --git a/include/linux/slab.h b/include/linux/slab.h > index c9cb42203183..4a4d1fdc2afe 100644 > --- a/include/linux/slab.h > +++ b/include/linux/slab.h > @@ -827,8 +827,10 @@ kvmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags, int node) > { > size_t bytes; > > - if (unlikely(check_mul_overflow(n, size, &bytes))) > + if (unlikely(check_mul_overflow(n, size, &bytes))) { > + BUG_ON(flags & __GFP_NOFAIL); > return NULL; > + } > > return kvmalloc_node_noprof(bytes, flags, node); > } > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 45d2f41b4783..4d6af00fccd4 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -4435,11 +4435,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > */ > if (gfp_mask & __GFP_NOFAIL) { > /* > - * All existing users of the __GFP_NOFAIL are blockable, so warn > - * of any new users that actually require GFP_NOWAIT > + * All existing users of the __GFP_NOFAIL are blockable > + * otherwise we introduce a busy loop with inside the page > + * allocator from non-sleepable contexts > */ > - if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) > - goto fail; > + BUG_ON(!can_direct_reclaim); > > /* > * PF_MEMALLOC request from this context is rather bizarre > @@ -4470,7 +4470,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > cond_resched(); > goto retry; > } > -fail: > + > warn_alloc(gfp_mask, ac->nodemask, > "page allocation failure: order:%u", order); > got_pg: > diff --git a/mm/util.c b/mm/util.c > index 0ff5898cc6de..a1be50c243f1 100644 > --- a/mm/util.c > +++ b/mm/util.c > @@ -668,6 +668,7 @@ void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node) > /* Don't even allow crazy sizes */ > if (unlikely(size > INT_MAX)) { > WARN_ON_ONCE(!(flags & __GFP_NOWARN)); > + BUG_ON(flags & __GFP_NOFAIL); > return NULL; > } >