Re: [PATCH v3 3/4] mm: BUG_ON to avoid NULL deference while __GFP_NOFAIL fails

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Aug 20, 2024 at 12:33 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 19.08.24 12:02, Barry Song wrote:
> > On Mon, Aug 19, 2024 at 9:55 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >> On 19.08.24 11:47, Barry Song wrote:
> >>> On Mon, Aug 19, 2024 at 9:43 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>>>
> >>>> On 17.08.24 08:24, Barry Song wrote:
> >>>>> From: Barry Song <v-songbaohua@xxxxxxxx>
> >>>>>
> >>>>> We have cases we still fail though callers might have __GFP_NOFAIL.  Since
> >>>>> they don't check the return, we are exposed to the security risks for NULL
> >>>>> deference.
> >>>>>
> >>>>> Though BUG_ON() is not encouraged by Linus, this is an unrecoverable
> >>>>> situation.
> >>>>>
> >>>>> Christoph Hellwig:
> >>>>> The whole freaking point of __GFP_NOFAIL is that callers don't handle
> >>>>> allocation failures.  So in fact a straight BUG is the right thing
> >>>>> here.
> >>>>>
> >>>>> Vlastimil Babka:
> >>>>> It's just not a recoverable situation (WARN_ON is for recoverable
> >>>>> situations). The caller cannot handle allocation failure and at the same
> >>>>> time asked for an impossible allocation. BUG_ON() is a guaranteed oops
> >>>>> with stracktrace etc. We don't need to hope for the later NULL pointer
> >>>>> dereference (which might if really unlucky happen from a different
> >>>>> context where it's no longer obvious what lead to the allocation failing).
> >>>>>
> >>>>> Michal Hocko:
> >>>>> Linus tends to be against adding new BUG() calls unless the failure is
> >>>>> absolutely unrecoverable (e.g. corrupted data structures etc.). I am
> >>>>> not sure how he would look at simply incorrect memory allocator usage to
> >>>>> blow up the kernel. Now the argument could be made that those failures
> >>>>> could cause subtle memory corruptions or even be exploitable which might
> >>>>> be a sufficient reason to stop them early.
> >>>>>
> >>>>> Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx>
> >>>>> Reviewed-by: Christoph Hellwig <hch@xxxxxx>
> >>>>> Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
> >>>>> Acked-by: Michal Hocko <mhocko@xxxxxxxx>
> >>>>> Cc: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx>
> >>>>> Cc: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
> >>>>> Cc: Christoph Lameter <cl@xxxxxxxxx>
> >>>>> Cc: Pekka Enberg <penberg@xxxxxxxxxx>
> >>>>> Cc: David Rientjes <rientjes@xxxxxxxxxx>
> >>>>> Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
> >>>>> Cc: Roman Gushchin <roman.gushchin@xxxxxxxxx>
> >>>>> Cc: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
> >>>>> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> >>>>> Cc: Kees Cook <kees@xxxxxxxxxx>
> >>>>> Cc: "Eugenio Pérez" <eperezma@xxxxxxxxxx>
> >>>>> Cc: Hailong.Liu <hailong.liu@xxxxxxxx>
> >>>>> Cc: Jason Wang <jasowang@xxxxxxxxxx>
> >>>>> Cc: Maxime Coquelin <maxime.coquelin@xxxxxxxxxx>
> >>>>> Cc: "Michael S. Tsirkin" <mst@xxxxxxxxxx>
> >>>>> Cc: Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx>
> >>>>> ---
> >>>>>     include/linux/slab.h | 4 +++-
> >>>>>     mm/page_alloc.c      | 4 +++-
> >>>>>     mm/util.c            | 1 +
> >>>>>     3 files changed, 7 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/include/linux/slab.h b/include/linux/slab.h
> >>>>> index c9cb42203183..4a4d1fdc2afe 100644
> >>>>> --- a/include/linux/slab.h
> >>>>> +++ b/include/linux/slab.h
> >>>>> @@ -827,8 +827,10 @@ kvmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags, int node)
> >>>>>     {
> >>>>>         size_t bytes;
> >>>>>
> >>>>> -     if (unlikely(check_mul_overflow(n, size, &bytes)))
> >>>>> +     if (unlikely(check_mul_overflow(n, size, &bytes))) {
> >>>>> +             BUG_ON(flags & __GFP_NOFAIL);
> >>>>>                 return NULL;
> >>>>> +     }
> >>>>>
> >>>>>         return kvmalloc_node_noprof(bytes, flags, node);
> >>>>>     }
> >>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>>>> index 60742d057b05..d2c37f8f8d09 100644
> >>>>> --- a/mm/page_alloc.c
> >>>>> +++ b/mm/page_alloc.c
> >>>>> @@ -4668,8 +4668,10 @@ struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order,
> >>>>>          * There are several places where we assume that the order value is sane
> >>>>>          * so bail out early if the request is out of bound.
> >>>>>          */
> >>>>> -     if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
> >>>>> +     if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp)) {
> >>>>> +             BUG_ON(gfp & __GFP_NOFAIL);
> >>>>>                 return NULL;
> >>>>> +     }
> >>>>>
> >>>>>         gfp &= gfp_allowed_mask;
> >>>>>         /*
> >>>>> diff --git a/mm/util.c b/mm/util.c
> >>>>> index ac01925a4179..678c647b778f 100644
> >>>>> --- a/mm/util.c
> >>>>> +++ b/mm/util.c
> >>>>> @@ -667,6 +667,7 @@ void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
> >>>>>
> >>>>>         /* Don't even allow crazy sizes */
> >>>>>         if (unlikely(size > INT_MAX)) {
> >>>>> +             BUG_ON(flags & __GFP_NOFAIL);
> >>>>
> >>>> No new BUG_ON please. WARN_ON_ONCE() + recovery code might be suitable here.
> >>>
> >>> Hi David,
> >>> WARN_ON_ONCE()  might be fine but I don't see how it is possible to recover.
> >>
> >> Just return NULL? "shit in shit out" :) ?
> >
> > Returning NULL is perfectly right if gfp doesn't include __GFP_NOFAIL,
> > as it's the caller's responsibility to check the return value. However, with
> > __GFP_NOFAIL, users will directly dereference *(p + offset) even when
> > p == NULL. It is how __GFP_NOFAIL is supposed to work.
>
> If the caller is not supposed to pass that flag combination (shit in),
> we are not obligated to give a reasonable result (shit out).
>
> My point is that we should let the caller (possibly?) crash -- the one
> that did something that is wrong -- instead of forcing a crash using
> BUG_ON in this code here.
>
> It should all be caught during testing either way. And if some OOT
> module does something nasty, that's not our responsibility.
>
> BUG_ON is not a way to write assertions into the code.

It seems there was a misunderstanding regarding the purpose of
this change. we actually have many details in changelog.

Its aim is not to write an assertion, but rather to prevent exposing
a security vulnerability.

Returning NULL doesn't necessarily crash the caller's process, p->field,
*(p + offset) deference could be used by hackers to exploit the system.

>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry





[Index of Archives]     [KVM Development]     [Libvirt Development]     [Libvirt Users]     [CentOS Virtualization]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux