Re: xarray, fault injection and syzkaller

Dmitry Vyukov <dvyukov@xxxxxxxxxx> · Fri, 4 Nov 2022 10:47:17 -0700

On Thu, 3 Nov 2022 at 17:21, 'Jason Gunthorpe' via syzkaller-bugs
<syzkaller-bugs@xxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Nov 03, 2022 at 05:11:04PM -0700, Dmitry Vyukov wrote:
> > On Thu, 3 Nov 2022 at 13:07, 'Jason Gunthorpe' via syzkaller-bugs
> > <syzkaller-bugs@xxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Thu, Nov 03, 2022 at 08:00:25PM +0000, Matthew Wilcox wrote:
> > > > On Thu, Nov 03, 2022 at 04:09:04PM -0300, Jason Gunthorpe wrote:
> > > > > Hi All,
> > > > >
> > > > > I wonder if anyone has some thoughts on this - I have spent some time
> > > > > setting up syzkaller for a new subsystem and I've noticed that nth
> > > > > fault injection does not reliably cause things like xa_store() to
> > > > > fail.
> >
> > Hi Jason, Matthew,
> >
> > Interesting. Where exactly is that kmalloc sequence? xa_store() itself
> > does not have any allocations:
> > https://elixir.bootlin.com/linux/v6.1-rc3/source/lib/xarray.c#L1577
>
> The first effort is this call chain
>
> __xa_store()
>   xas_store()
>     xas_create()
>      xas_alloc()
>       kmem_cache_alloc_lru(GFP_NOWAIT | __GFP_NOWARN)
>
> If that fails then __xa_store() will do:
>
> __xa_store()
>   __xas_nomem()
>       xas_unlock_type(xas, lock_type);
>       kmem_cache_alloc_lru(GFP_KERNEL);
>       xas_lock_type(xas, lock_type);
>
> The key point being that the retry is structured in a way that allows
> dropping the spinlocks that are forcing the first allocation to be
> atomic.

I see. Yes, as you note below, this cannot be folded into a single
allocation call.
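
To make the interaction concrete, here is a rough userspace sketch of the
two-step pattern from the call chains above. It is only a model: fail_nth,
fake_alloc() and store_with_retry() are hypothetical names, not kernel
APIs, and the sketch assumes that every allocation attempt decrements the
fail-nth counter exactly once.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of the per-task fail-nth counter: 0 means disabled,
 * N > 0 means "fail the N-th allocation attempt from now on". */
static int fail_nth;

/* Hypothetical stand-in for an allocator with fault injection hooked in. */
static void *fake_alloc(size_t size)
{
	if (fail_nth > 0 && --fail_nth == 0)
		return NULL;	/* injected failure */
	return malloc(size);
}

/* Models the shape of __xa_store(): one atomic attempt under the lock,
 * then one sleeping attempt with the lock dropped. Returns true if the
 * store would succeed; *attempts reports how many allocations were tried. */
static bool store_with_retry(int *attempts)
{
	void *node;

	*attempts = 1;
	node = fake_alloc(576);		/* first try, models GFP_NOWAIT */
	if (!node) {
		/* the real code drops the lock here */
		*attempts = 2;
		node = fake_alloc(576);	/* second try, models GFP_KERNEL */
		/* and retakes the lock here */
	}
	if (!node)
		return false;
	free(node);
	return true;
}
```

With fail_nth set to 1, the first attempt fails, the counter is used up,
the retry succeeds, and the caller never sees an error. That matches the
behavior Jason describes.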

> > Do we know how common/useful such an allocation pattern is?
>
> I have coded something like this a few times, in my cases it is
> usually something like: try to allocate a big chunk of memory hoping
> for a huge page, then fall back to a smaller allocation
>
> Most likely the key consideration is that the callsites are using
> GFP_NOWARN, so perhaps we can just avoid decrementing the nth on a
> NOWARN case assuming that another allocation attempt will closely
> follow?

GFP_NOWARN is also extensively used for allocations with
user-controlled size, e.g.:
https://elixir.bootlin.com/linux/v6.1-rc3/source/net/unix/af_unix.c#L3451

That's different and these allocations are usually not repeated.
So looking at GFP_NOWARN does not look like the right thing to do.

> > If it's common/useful, then it can be turned into a single kmalloc()
> > with some special flag that will try both allocation modes at once.
>
> A single call doesn't really suit the use cases..
>
> > Potentially fail-nth interface can be extended to accept a set of
> > sites, e.g. "5,7" or, "5-100".
>
> For my purposes this is possibly Ok, you'd just set N->large and step
> N to naively cover the error paths.

Filed https://bugzilla.kernel.org/show_bug.cgi?id=216661 for this.
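
As a strawman for what such a spec could mean, here is a small userspace
sketch of a matcher for strings like "5,7" or "5-100". The syntax and the
function name fail_spec_matches() are hypothetical; nothing like this
exists in the kernel today, and a real implementation would probably
parse the spec once rather than on every call.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Returns true if call number n (1-based) is selected by spec, a
 * comma-separated list of numbers and inclusive ranges such as "5,7" or
 * "5-100". Tokens are checked left to right; parsing stops without a
 * match at the first malformed token. Hypothetical syntax. */
static bool fail_spec_matches(const char *spec, unsigned long n)
{
	const char *p = spec;

	while (*p) {
		char *end;
		unsigned long lo, hi;

		if (*p < '0' || *p > '9')
			return false;
		lo = strtoul(p, &end, 10);
		hi = lo;
		p = end;
		if (*p == '-') {	/* range "lo-hi" */
			p++;
			if (*p < '0' || *p > '9')
				return false;
			hi = strtoul(p, &end, 10);
			p = end;
		}
		if (n >= lo && n <= hi)
			return true;
		if (*p == ',')
			p++;
		else if (*p != '\0')
			return false;
	}
	return false;
}
```

With "5,7" only calls 5 and 7 would fail; with "5-100" every call from 5
through 100 would fail, which is enough to defeat a one-shot retry like
the xarray one.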

> However, this would also have to fix the obnoxious behavior of fail
> nth where it fails its own copy_from_user on its write system call -
> meaning there would be no way to turn it off.

Oh, interesting. We added failing of copy_from/to_user later and did
not consider such interaction.
Filed https://bugzilla.kernel.org/show_bug.cgi?id=216660 for this.
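
For context, userspace arms and disarms fail-nth by writing a decimal
number to the per-thread fail-nth file, so the "turn it off" write of 0
goes through the same write() path, including copy_from_user, as any
other syscall. A minimal sketch of such a helper follows; the name
write_fail_nth() is made up for illustration, and the path is a parameter
so the helper can be tried against an ordinary file.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Writes n as a decimal string to path (on a kernel with fault injection
 * this would be e.g. /proc/thread-self/fail-nth). Returns 0 on success
 * and -1 if the file cannot be opened or the write is short or fails. */
static int write_fail_nth(const char *path, unsigned int n)
{
	char buf[16];
	int len = snprintf(buf, sizeof(buf), "%u", n);
	ssize_t written;
	int fd;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	written = write(fd, buf, (size_t)len);
	close(fd);
	return written == (ssize_t)len ? 0 : -1;
}
```

A caller would arm with write_fail_nth(path, N), run the syscall under
test, and then call write_fail_nth(path, 0). It is that last write which
can itself be hit by the injected fault.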

> > > > Hahaha.  I didn't intentionally set out to thwart memory allocation
> > > > fault injection.  Realistically, do we want it to fail though?
> > > > GFP_KERNEL allocations of small sizes are supposed to never fail.
> > > > (for those not aware, node allocations are 576 bytes; typically the slab
> > > > allocator bundles 28 of them into an order-2 allocation).
> >
> > I hear this statement periodically. But I can't understand its
> > status. We discussed it recently here:
>
> I was thinking about this after, and at least for what I am doing it
> doesn't apply. All the allocations here are GFP_KERNEL_ACCOUNT and the
> cgroup can definitely reject any allocation at any time even if it is
> "small"
>
> So I can't ignore allocation failures as something that is unlikely. A
> hostile userspace contained in a cgroup sandbox can reliably trigger
> them at will.
>
> Jason