On Wed, Jul 24, 2024 at 10:03 PM Vlastimil Babka <vbabka@xxxxxxx> wrote: > > On 7/24/24 10:55 AM, Barry Song wrote: > > From: Barry Song <v-songbaohua@xxxxxxxx> > > > > We have cases we still fail though callers might have __GFP_NOFAIL. > > Since they don't check the return, we are exposed to the security > > risks for NULL deference. > > > > Though BUG_ON() is not encouraged by Linus, this is an unrecoverable > > situation. > > > > Christoph Hellwig: > > The whole freaking point of __GFP_NOFAIL is that callers don't handle > > allocation failures. So in fact a straight BUG is the right thing > > here. > > > > Vlastimil Babka: > > It's just not a recoverable situation (WARN_ON is for recoverable > > situations). The caller cannot handle allocation failure and at the same > > time asked for an impossible allocation. BUG_ON() is a guaranteed oops > > with stracktrace etc. We don't need to hope for the later NULL pointer > > dereference (which might if really unlucky happen from a different > > context where it's no longer obvious what lead to the allocation failing). > > Note that quote was meant specifically for the "too large" allocation, which > is truly impossible. That includes the kvmalloc_array() overflow, order > > MAX_ORDER etc. I equally quote this for two cases because non-sleepable is also returning NULL, in this means, they are currently facing the same problems. > > The "can't sleep/reclaim" is a bit more nuanced as there's the alternative > in just warning and looping and hoping kswapd or some other direct reclaimer > saves the day. If yes, great, we have a system that still works and a > warning to repor. If no, there's still a warning, but later soft/hardlockup > hits. These might be eventually worse than an immediate BUG_ON so it's not a > clear cut. At least I think these cases should be handled in two different > patches and not together. But I fully agree these two can be separated and judged separately. After more thinking, I am concerned that this issue might be difficult to be rescued, as the misuse of GFP_ATOMIC | __GFP_NOFAIL typically occurs in atomic contexts with strict time requirements. Even if some other components release memory to satisfy the one busy-looping to obtain memory, it might already be too late? > > > Michal Hocko: > > Linus tends to be against adding new BUG() calls unless the failure is > > absolutely unrecoverable (e.g. corrupted data structures etc.). I am > > not sure how he would look at simply incorrect memory allocator usage to > > blow up the kernel. Now the argument could be made that those failures > > could cause subtle memory corruptions or even be exploitable which might > > be a sufficient reason to stop them early. > > > > Cc: Michal Hocko <mhocko@xxxxxxxx> > > Cc: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx> > > Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx> > > Cc: Lorenzo Stoakes <lstoakes@xxxxxxxxx> > > Cc: Christoph Lameter <cl@xxxxxxxxx> > > Cc: Pekka Enberg <penberg@xxxxxxxxxx> > > Cc: David Rientjes <rientjes@xxxxxxxxxx> > > Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> > > Cc: Vlastimil Babka <vbabka@xxxxxxx> > > Cc: Roman Gushchin <roman.gushchin@xxxxxxxxx> > > Cc: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx> > > Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> > > Cc: Kees Cook <kees@xxxxxxxxxx> > > Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx> > > --- > > include/linux/slab.h | 4 +++- > > mm/page_alloc.c | 10 +++++----- > > mm/util.c | 1 + > > 3 files changed, 9 insertions(+), 6 deletions(-) > > > > diff --git a/include/linux/slab.h b/include/linux/slab.h > > index c9cb42203183..4a4d1fdc2afe 100644 > > --- a/include/linux/slab.h > > +++ b/include/linux/slab.h > > @@ -827,8 +827,10 @@ kvmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags, int node) > > { > > size_t bytes; > > > > - if (unlikely(check_mul_overflow(n, size, &bytes))) > > + if (unlikely(check_mul_overflow(n, size, &bytes))) { > > + BUG_ON(flags & __GFP_NOFAIL); > > return NULL; > > + } > > > > return kvmalloc_node_noprof(bytes, flags, node); > > } > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 45d2f41b4783..4d6af00fccd4 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -4435,11 +4435,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > > */ > > if (gfp_mask & __GFP_NOFAIL) { > > /* > > - * All existing users of the __GFP_NOFAIL are blockable, so warn > > - * of any new users that actually require GFP_NOWAIT > > + * All existing users of the __GFP_NOFAIL are blockable > > + * otherwise we introduce a busy loop with inside the page > > + * allocator from non-sleepable contexts > > */ > > - if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) > > - goto fail; > > + BUG_ON(!can_direct_reclaim); > > > > /* > > * PF_MEMALLOC request from this context is rather bizarre > > @@ -4470,7 +4470,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > > cond_resched(); > > goto retry; > > } > > -fail: > > + > > warn_alloc(gfp_mask, ac->nodemask, > > "page allocation failure: order:%u", order); > > got_pg: > > diff --git a/mm/util.c b/mm/util.c > > index 0ff5898cc6de..a1be50c243f1 100644 > > --- a/mm/util.c > > +++ b/mm/util.c > > @@ -668,6 +668,7 @@ void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node) > > /* Don't even allow crazy sizes */ > > if (unlikely(size > INT_MAX)) { > > WARN_ON_ONCE(!(flags & __GFP_NOWARN)); > > + BUG_ON(flags & __GFP_NOFAIL); > > return NULL; > > } > > >