Re: [PATCH v3 0/4] mm: clarify nofail memory allocation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Aug 30, 2024 at 1:20 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Thu 29-08-24 23:53:33, Barry Song wrote:
> > On Thu, Aug 29, 2024 at 10:24 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> > >
> > > On 8/27/24 09:50, Barry Song wrote:
> > > > On Tue, Aug 27, 2024 at 7:38 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> > > >>
> > > >>
> > > >> Ugh, wasn't aware, well spotted. So it means there at least shouldn't be
> > > >> existing users of __GFP_NOFAIL with order > 1 :)
> > > >>
> > > >> But also the check is in the hotpath, even before trying the pcplists, so we
> > > >> could move it to __alloc_pages_slowpath() while extending it?
> > > >
> > > > Agreed. I don't think it is reasonable to check the order and flags in
> > > > two different places especially rmqueue() has already had
> > > > gfp_flags & __GFP_NOFAIL operation and order > 1
> > > > overhead.
> > > >
> > > > We can at least extend the current check to make some improvement
> > > > though I still believe Michal's suggestion of implementing OOPS_ON is a
> > > > better approach to pursue, as it doesn't crash the entire system
> > > > while ensuring the problematic process is terminated.
> > >
> > > Linus made clear it's not a mm concern. If e.g. hardening people want to
> > > pursuit that instead, they can.
> > >
> > > BTW I think BUG_ON already works like this, if possible only the calling
> > > process is terminated. panic happens in case of being in a irq context, or
> >
> > you are right. This is a detail I overlooked in the last discussion.
> > BUG_ON has already been exactly the case to only terminate the bad
> > process if it can
> > (panic_on_oops=N and not in irq context).
>
> Are you sure about that? Maybe x86 implementation treats BUG as oops but
> is this what that does on all arches? BUG() has historically meant stop
> everything and die and I am not really sure when that would have
> changed TBH.

My ARM64 machine also only terminates the bad process by BUG_ON()
if we are not in irq and we don't set panic_on_oops.

I guess it depends on HAVE_ARCH_BUG? if arch has no BUG(), BUG()
will be just a panic ?

#ifndef HAVE_ARCH_BUG
#define BUG() do { \
        printk("BUG: failure at %s:%d/%s()!\n", __FILE__, __LINE__, __func__); \
        barrier_before_unreachable(); \
        panic("BUG!"); \
} while (0)
#endif

I assume it is equally difficult to implement OOPS_ON() if arch lacks
HAVE_ARCH_BUG ?

"grep" shows the most mainstream archs have their own HAVE_ARCH_BUG:

$ git grep HAVE_ARCH_BUG
arch/alpha/include/asm/bug.h:#define HAVE_ARCH_BUG
arch/arc/include/asm/bug.h:#define HAVE_ARCH_BUG
arch/arm/include/asm/bug.h:#define HAVE_ARCH_BUG
arch/arm64/include/asm/bug.h:#define HAVE_ARCH_BUG
arch/csky/include/asm/bug.h:#define HAVE_ARCH_BUG
arch/loongarch/include/asm/bug.h:#define HAVE_ARCH_BUG
arch/m68k/include/asm/bug.h:#define HAVE_ARCH_BUG
arch/mips/include/asm/bug.h:#define HAVE_ARCH_BUG
arch/mips/include/asm/bug.h:#define HAVE_ARCH_BUG_ON
arch/parisc/include/asm/bug.h:#define HAVE_ARCH_BUG
arch/powerpc/include/asm/bug.h:#define HAVE_ARCH_BUG
arch/powerpc/include/asm/bug.h:#define HAVE_ARCH_BUG_ON
arch/riscv/include/asm/bug.h:#define HAVE_ARCH_BUG
arch/s390/include/asm/bug.h:#define HAVE_ARCH_BUG
arch/sh/include/asm/bug.h:#define HAVE_ARCH_BUG
arch/sparc/include/asm/bug.h:#define HAVE_ARCH_BUG
arch/x86/include/asm/bug.h:#define HAVE_ARCH_BUG

>
> > > due to panic_on_oops. Which the security people are setting to 1 anyway and
> > > OOPS_ON would have to observe it too. So AFAICS the only difference from
> > > BUG_ON would be not panic in the irq context, if panic_on_oops isn't set.
> >
> > right.
> >
> > > (as for "no mm locks held" I think it's already satisfied at the points we
> > > check for __GFP_NOFAIL).
> >
> > Let me summarize the discussion:
> >
> > Patch 1/4, which fixes the misuse of combining gfp_nofail and atomic
> > in vdpa driver, is necessary.
> > Patch 2/4, which updates the documentation to clarify that
> > non-blockable gfp_nofail is not
> >                   supported, is needed.
>
> Let's please have those merged now.
>
> > Patch 3/4: We will replace BUG_ON with WARN_ON_ONCE to warn when the
> > size is too large,
> >                  where gfp_nofail will return NULL.
>
>
> I would pull this one out for a separate discussion. We should really
> define what the too large really means and INT_MAX etc. is not it at
> all.

make sense.

>
> > Patch 4/4: We will move the order > 1 check from the current fast path
> > to the slow path and extend
> >                  the check of gfp_direct_reclaim flag also in the slow path.
>
> OK, let's have that go in now as well.
>
> --
> Michal Hocko
> SUSE Labs

Thanks
Barry





[Index of Archives]     [KVM Development]     [Libvirt Development]     [Libvirt Users]     [CentOS Virtualization]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux