Re: xarray, fault injection and syzkaller

Jason Gunthorpe <jgg@xxxxxxxxxx> · Thu, 3 Nov 2022 21:21:29 -0300

On Thu, Nov 03, 2022 at 05:11:04PM -0700, Dmitry Vyukov wrote:
> On Thu, 3 Nov 2022 at 13:07, 'Jason Gunthorpe' via syzkaller-bugs
> <syzkaller-bugs@xxxxxxxxxxxxxxxx> wrote:
> >
> > On Thu, Nov 03, 2022 at 08:00:25PM +0000, Matthew Wilcox wrote:
> > > On Thu, Nov 03, 2022 at 04:09:04PM -0300, Jason Gunthorpe wrote:
> > > > Hi All,
> > > >
> > > > I wonder if anyone has some thoughts on this - I have spent some time
> > > > setting up syzkaller for a new subsystem and I've noticed that nth
> > > > fault injection does not reliably cause things like xa_store() to
> > > > fail.
> 
> Hi Jason, Matthew,
> 
> Interesting. Where exactly is that kmalloc sequence? xa_store() itself
> does not have any allocations:
> https://elixir.bootlin.com/linux/v6.1-rc3/source/lib/xarray.c#L1577

The first effort is this call chain

__xa_store()
  xas_store()
    xas_create()
      xas_alloc()
        kmem_cache_alloc_lru(GFP_NOWAIT | __GFP_NOWARN)

If that fails then __xa_store() will do:

__xa_store()
  __xas_nomem()
    xas_unlock_type(xas, lock_type);
    kmem_cache_alloc_lru(GFP_KERNEL);
    xas_lock_type(xas, lock_type);

The key point being that the retry is structured in a way that allows
dropping the spinlocks that are forcing the first allocation to be
atomic.
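
To make the interaction with fail-nth concrete, here is a minimal
userspace sketch of the two-phase pattern. It is not the xarray code:
sim_alloc(), alloc_with_retry() and the global fail_nth counter are
hypothetical stand-ins, and the "lock" is just a bool. It assumes a
one-shot injector that fails exactly the Nth allocation and then
disarms itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-task fail-nth counter; 0 = disarmed. */
static unsigned int fail_nth;

/* Fails exactly the Nth call after fail_nth was set, then disarms itself. */
static void *sim_alloc(size_t size)
{
	if (fail_nth && --fail_nth == 0)
		return NULL;
	return malloc(size);
}

/*
 * Two-phase pattern: a non-sleeping attempt while "locked", then, only
 * if that fails, a sleeping attempt with the lock dropped.
 * Returns NULL only when both attempts fail.
 */
static void *alloc_with_retry(size_t size, bool *locked)
{
	void *p;

	assert(*locked);
	p = sim_alloc(size);	/* models GFP_NOWAIT | __GFP_NOWARN */
	if (p)
		return p;

	*locked = false;	/* models xas_unlock_type() */
	p = sim_alloc(size);	/* models GFP_KERNEL */
	*locked = true;		/* models xas_lock_type() */
	return p;
}
```

With fail_nth set to 1 the first attempt fails, the injector disarms,
and the second attempt succeeds, so the caller never sees an error.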

> Do we know how common/useful such an allocation pattern is?

I have coded something like this a few times. In my cases it is
usually something like: try to allocate a big chunk of memory hoping
for a huge page, then fall back to a smaller allocation.

Most likely the key consideration is that the callsites are using
__GFP_NOWARN, so perhaps we can just avoid decrementing the nth in the
NOWARN case, assuming that another allocation attempt will closely
follow?
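
One possible reading of that idea, as a userspace sketch rather than a
patch: NOWARN attempts never consume the counter, and they fail only
when the next counted allocation is the one selected to fail. The names
SIM_NOWARN, fail_nth and sim_should_fail() are hypothetical and do not
correspond to the kernel's fault injection code.

```c
#include <stdbool.h>

#define SIM_NOWARN 0x1u	/* hypothetical stand-in for __GFP_NOWARN */

/* Hypothetical stand-in for the per-task fail-nth counter; 0 = disarmed. */
static unsigned int fail_nth;

/* Decides whether the current simulated allocation should fail. */
static bool sim_should_fail(unsigned int flags)
{
	if (fail_nth == 0)
		return false;

	/*
	 * NOWARN attempt: do not decrement. Fail it only if the next
	 * counted allocation would fail, so the fallback fails as well.
	 */
	if (flags & SIM_NOWARN)
		return fail_nth == 1;

	/* Counted attempt: fail when the counter reaches zero, then disarm. */
	return --fail_nth == 0;
}
```

Under this reading, fail_nth = 1 fails both the NOWARN attempt and the
fallback that follows it, so the caller's error path is reached.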

> If it's common/useful, then it can be turned into a single kmalloc()
> with some special flag that will try both allocation modes at once.

A single call doesn't really suit the use cases.

> Potentially fail-nth interface can be extended to accept a set of
> sites, e.g. "5,7" or, "5-100".

For my purposes this is possibly Ok, you'd just set N->large and step
N to naively cover the error paths.
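
For illustration only, a sketch of how a site specification such as
"5,7" or "5-100" could be matched against the current call number. The
syntax is the hypothetical one from the suggestion quoted above, not
something fail-nth accepts today, and fail_spec_matches() is a made-up
name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Returns true if call number n is selected by spec, a comma-separated
 * list of single numbers and inclusive ranges, e.g. "5,7" or "5-100".
 * Returns false for non-matching or malformed input.
 */
static bool fail_spec_matches(const char *spec, unsigned long n)
{
	const char *p = spec;

	assert(spec != NULL);
	while (*p) {
		char *end;
		unsigned long lo = strtoul(p, &end, 10);
		unsigned long hi = lo;

		if (end == p)
			return false;		/* no digits */
		if (*end == '-') {		/* range "lo-hi" */
			p = end + 1;
			hi = strtoul(p, &end, 10);
			if (end == p)
				return false;
		}
		if (n >= lo && n <= hi)
			return true;
		if (*end == ',')
			end++;			/* next list element */
		else if (*end != '\0')
			return false;		/* trailing garbage */
		p = end;
	}
	return false;
}
```

Stepping N with a spec of the form "N-<large>" would then fail every
allocation from the Nth onward, including both halves of a retry pair.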

However, this would also have to fix the obnoxious behavior of
fail-nth where it fails its own copy_from_user() in its write system
call, meaning there would be no way to turn it off.

> > > Hahaha.  I didn't intentionally set out to thwart memory allocation
> > > fault injection.  Realistically, do we want it to fail though?
> > > GFP_KERNEL allocations of small sizes are supposed to never fail.
> > > (for those not aware, node allocations are 576 bytes; typically the slab
> > > allocator bundles 28 of them into an order-2 allocation).
> 
> I hear this statement periodically. But I can't understand its
> status. We discussed it recently here:

I was thinking about this afterwards, and at least for what I am doing
it doesn't apply. All the allocations here are GFP_KERNEL_ACCOUNT and
the cgroup can definitely reject any allocation at any time even if it
is "small".

So I can't ignore allocation failures as something that is unlikely. A
hostile userspace contained in a cgroup sandbox can reliably trigger
them at will.

Jason