On Thu, Nov 03, 2022 at 05:11:04PM -0700, Dmitry Vyukov wrote:
> On Thu, 3 Nov 2022 at 13:07, 'Jason Gunthorpe' via syzkaller-bugs
> <syzkaller-bugs@xxxxxxxxxxxxxxxx> wrote:
> >
> > On Thu, Nov 03, 2022 at 08:00:25PM +0000, Matthew Wilcox wrote:
> > > On Thu, Nov 03, 2022 at 04:09:04PM -0300, Jason Gunthorpe wrote:
> > > > Hi All,
> > > >
> > > > I wonder if anyone has some thoughts on this - I have spent some time
> > > > setting up syzkaller for a new subsystem and I've noticed that nth
> > > > fault injection does not reliably cause things like xa_store() to
> > > > fail.
>
> Hi Jason, Matthew,
>
> Interesting. Where exactly is that kmalloc sequence? xa_store() itself
> does not have any allocations:
> https://elixir.bootlin.com/linux/v6.1-rc3/source/lib/xarray.c#L1577

The first effort is this call chain

  __xa_store()
    xas_store()
      xas_create()
        xas_alloc()
          kmem_cache_alloc_lru(GFP_NOWAIT | __GFP_NOWARN)

If that fails then __xa_store() will do:

  __xa_store()
    __xas_nomem()
      xas_unlock_type(xas, lock_type);
      kmem_cache_alloc_lru(GFP_KERNEL);
      xas_lock_type(xas, lock_type);

The key point being that the retry is structured in a way that allows
dropping the spinlocks that are forcing the first allocation to be
atomic.

> Do we know how common/useful such an allocation pattern is?

I have coded something like this a few times, in my cases it is
usually something like: try to allocate a big chunk of memory hoping
for a huge page, then fall back to a smaller allocation

Most likely the key consideration is that the callsites are using
GFP_NOWARN, so perhaps we can just avoid decrementing the nth on a
NOWARN case assuming that another allocation attempt will closely
follow?

> If it's common/useful, then it can be turned into a single kmalloc()
> with some special flag that will try both allocation modes at once.

A single call doesn't really suit the use cases..

> Potentially fail-nth interface can be extended to accept a set of
> sites, e.g. "5,7" or, "5-100".
For my purposes this is possibly Ok, you'd just set N->large and step
N to naively cover the error paths.

However, this would also have to fix the obnoxious behavior of fail
nth where it fails its own copy_from_user on its write system call -
meaning there would be no way to turn it off.

> > > Hahaha.  I didn't intentionally set out to thwart memory allocation
> > > fault injection.  Realistically, do we want it to fail though?
> > > GFP_KERNEL allocations of small sizes are supposed to never fail.
> > > (for those not aware, node allocations are 576 bytes; typically the slab
> > > allocator bundles 28 of them into an order-2 allocation).
>
> I hear this statement periodically. But I can't understand its
> status. We discussed it recently here:

I was thinking about this after, and at least for what I am doing it
doesn't apply. All the allocations here are GFP_KERNEL_ACCOUNT and the
cgroup can definitely reject any allocation at any time even if it is
"small"

So I can't ignore allocation failures as something that is unlikely. A
hostile userspace contained in a cgroup sandbox can reliably trigger
them at will.

Jason
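
To make the interaction concrete, here is a rough userspace model of
the try-atomic-then-retry pattern against a fault injector. This is
only a sketch, not kernel code: model_alloc(), model_xa_store() and
store_with_fail_range() are made-up names, and the inclusive range
stands in for the proposed "5-100" style extension, which does not
exist in the fail-nth interface today. A single-site failure (first ==
last) behaves like plain fail-nth.

```c
#include <stdbool.h>

/* Hypothetical model state: how many allocations have been attempted,
 * and the inclusive range of 1-based call indices that should fail. */
static unsigned int call_count;
static unsigned int fail_first, fail_last;

/* Stand-in for an allocation subject to fault injection. */
static bool model_alloc(void)
{
	call_count++;
	return !(call_count >= fail_first && call_count <= fail_last);
}

/* Stand-in for the __xa_store() pattern: an atomic attempt under the
 * lock, then a sleeping retry with the lock dropped. */
static bool model_xa_store(void)
{
	if (model_alloc())	/* models GFP_NOWAIT | __GFP_NOWARN */
		return true;
	/* the real code unlocks here, allocates, then relocks */
	return model_alloc();	/* models the GFP_KERNEL retry */
}

/* Reset the model, arm the injector, and report whether the store
 * succeeded. */
static bool store_with_fail_range(unsigned int first, unsigned int last)
{
	call_count = 0;
	fail_first = first;
	fail_last = last;
	return model_xa_store();
}
```

In this model store_with_fail_range(1, 1) and store_with_fail_range(2, 2)
both report success: failing only the first call is absorbed by the
retry, and when the first call succeeds the second is never made. Only
a range covering both calls, store_with_fail_range(1, 2), makes the
store fail, which is why stepping a single N does not reach the error
path.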