Re: [PATCH v3 0/4] mm: clarify nofail memory allocation

Barry Song <21cnbao@xxxxxxxxx> · Tue, 20 Aug 2024 07:23:31 +1200

On Tue, Aug 20, 2024 at 4:05 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, 19 Aug 2024 at 06:02, David Hildenbrand <david@xxxxxxxxxx> wrote:
> > >
> > > If we must still fail a nofail allocation, we should trigger a BUG rather
> > > than exposing NULL dereferences to callers who do not check the return
> > > value.
> >
> > I am not convinced that BUG_ON is the right tool here to save the world,
> > but I see how we arrived here.
>
> I think the thing to do is to just add a
>
>      WARN_ON_ONCE((flags & __GFP_NOFAIL) && bad_nofail_alloc(oder, flags));
>
> or similar, where that bad_nofail_alloc() checks that the allocation
> order is small and that the flags are sane for a NOFAIL allocation.
>
> Because no, BUG_ON() is *never* the answer. The answer is to make sure
> nobody ever sets NOFAIL in situations where the allocation can fail
> and there is no way forward.
>
> A BUG_ON() will quite likely just make things worse. You're better off
> with a WARN_ON() and letting the caller just oops.

That could be an exploit taking advantage of those improper callers,
thus it wouldn’t necessarily result in an immediate oops in callers but
result in an exploit such as

p = malloc(gfp_nofail);

using p->field
or
using p + offset

where p->filed or p + offset might be attacking code though p is NULL.

Could we consider infinitely blocking those bad callers:
while (illegal_gfp_nofail)  do_sth_relax_cpu();

>
> Honestly, I'm perfectly fine with just removing that stupid useless
> flag entirely. The flag goes back to 2003 and was introduced in
> 2.5.69, and was meant to be for very particular uses that otherwise
> just looped waiting for memory.
>
> Back in 2.5.69, there was exactly one user: the jbd journal code, that
> did a buffer head allocation with GFP_NOFAIL.  By 2.6.0 that had
> expanded by another user in XFS, and even that one had a comment
> saying that it needed to be narrowed down. And in fact, by the 2.6.12
> release, that XFS use had been removed, but the jbd journal had grown
> another jbd_kmalloc case for transaction data. So at the beginning of
> the git archives, we had exactly *one* user (with two places).
>
> *THAT* is the kind of use that the flag was meant for: small
> allocations required to make forward progress in writeout during
> memory pressure.
>
> It has then expanded and is now a problem. The cases using GFP_NOFAIL
> for things like vmalloc() - which is by definition not a small
> allocation - should be just removed as outright bugs.
>
> Note that we had this comment back in 2010:
>
>  * __GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller
>  * cannot handle allocation failures.  This modifier is deprecated and no new
>  * users should be added.
>
> and then it was softened in 2015 to the current
>
>  * __GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller
>  * cannot handle allocation failures. New users should be evaluated carefully
>   ...
>
> and clearly that "evaluated carefully" actually never happened, so the
> new comment is just garbage.
>
> I wonder how many modern users of GFP_NOFAIL are simply due to
> over-eager allocation failure injection testing, and then people added
> GFP_NOFAIL just because it shut up the mindless random allocation
> failures.
>
> I mean, we have a __GFP_NOFAIL in rhashtable_init() - which can
> actually return an error just fine, but there was this crazy worry
> about the IPC layer initialization failing:
>
>    https://lore.kernel.org/all/20180523172500.anfvmjtumww65ief@linux-n805/
>
> Things like that, where people just added mindless "theoretical
> concerns" issues, or possibly had some error injection module that
> inserted impossible failures.
>
> I do NOT want those things to become BUG_ON()'s. It's better to just
> return NULL with a "bogus GFP_NOFAIL" warning, and have the oops
> happen in the actual bad place that did an invalid allocation.
>
> Because the blame should go *there*, and it should not even remotely
> look like "oh, the MM code failed". No. The caller was garbage.
>
> So no. No MM BUG_ON code.
>
>                     Linus