On Thu, 3 Nov 2022 at 13:07, 'Jason Gunthorpe' via syzkaller-bugs
<syzkaller-bugs@xxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Nov 03, 2022 at 08:00:25PM +0000, Matthew Wilcox wrote:
> > On Thu, Nov 03, 2022 at 04:09:04PM -0300, Jason Gunthorpe wrote:
> > > Hi All,
> > >
> > > I wonder if anyone has some thoughts on this - I have spent some time
> > > setting up syzkaller for a new subsystem and I've noticed that nth
> > > fault injection does not reliably cause things like xa_store() to
> > > fail.

Hi Jason, Matthew,

Interesting. Where exactly is that kmalloc sequence? xa_store() itself
does not do any allocations directly:
https://elixir.bootlin.com/linux/v6.1-rc3/source/lib/xarray.c#L1577

Do we know how common/useful such an allocation pattern is? If it's
common/useful, then it could be turned into a single kmalloc() with
some special flag that tries both allocation modes at once.

Potentially the fail-nth interface can be extended to accept a set of
sites, e.g. "5,7" or "5-100". Though I am not sure what the systematic
enumeration should be then if we go beyond "every single site on its
own"... but I guess we can figure that out.

> > > It seems the basic reason is that xarray will usually do two
> > > allocations, one in an atomic context which fault injection does
> > > reliably fail, but then it almost always follows up with a second
> > > allocation in a non-atomic context that doesn't fail because nth has
> > > become 0.
> >
> > Hahaha. I didn't intentionally set out to thwart memory allocation
> > fault injection. Realistically, do we want it to fail though?
> > GFP_KERNEL allocations of small sizes are supposed to never fail.
> > (for those not aware, node allocations are 576 bytes; typically the slab
> > allocator bundles 28 of them into an order-2 allocation).

I hear this statement periodically, but I can't understand its status.
We discussed it recently here:
https://lore.kernel.org/all/CACT4Y+Y_kg1J00iBL=sMr5AP7U4RXuBizusvQG52few2NcJ6dg@xxxxxxxxxxxxxx/

Likely/unlikely is not what matters, right? It's only: can it fail at
all or not? If some allocations can't fail at all and we want to rely
on this in the future (not treat it as just a current implementation
detail), then I think we need to (1) make fault injection not fail
them, (2) add a BUG_ON in the allocator to panic if they do fail, and
(3) maybe start removing the error-handling code (since having
buggy/untested/confusing code is not useful).

> I don't know, I have code to handle these failures, I want to test it
> :)
>
> > I think a simple solution if we really do want to make allocations fail
> > is to switch error injection from "fail one allocation per N" to "fail
> > M allocations per N". E.g. 7 allocations succeed, 3 allocations fail,
> > 7 succeed, 3 fail, ... It's more realistic because you do tend to see
> > memory allocation failures come in bursts.
>
> The systematic testing I've set up just walks nth through the entire
> range until it no longer triggers. This hits every injection point and
> checks the failure path of it. This is also what syzkaller does
> automatically, from what I can tell.
>
> If we make it probabilistic, it is harder to reliably trigger these
> fault points.
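
Agree, the exhaustive per-site walk is the property worth preserving.
For reference, that walk is roughly the example program from
Documentation/fault-injection/fault-injection.rst: arm fail-nth with
increasing N, run the operation, and read the file back (0 means the
armed fault was consumed). A minimal sketch, with failslab's
ignore-gfp-wait assumed to be disabled so sleepable allocations can
fail too, and do_operation() standing in for whatever syscall/ioctl is
being exercised:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Stand-in for the operation under test; the docs use socketpair(). */
static void do_operation(void)
{
	int fds[2];

	if (socketpair(AF_LOCAL, SOCK_STREAM, 0, fds) == 0) {
		close(fds[0]);
		close(fds[1]);
	}
}

int main(void)
{
	char buf[64];
	int fd, i;

	/* fail-nth is per-thread, so open it via the current tid. */
	snprintf(buf, sizeof(buf), "/proc/self/task/%ld/fail-nth",
		 (long)syscall(SYS_gettid));
	fd = open(buf, O_RDWR);
	if (fd < 0)
		return 1;

	for (i = 1; ; i++) {
		/* Arm the fault: fail the i-th fallible call in this task. */
		snprintf(buf, sizeof(buf), "%d", i);
		write(fd, buf, strlen(buf));

		do_operation();

		/*
		 * Reading back returns 0 if the armed fault was injected;
		 * a positive value means the operation made fewer than i
		 * fallible calls, i.e. we have walked past the last site.
		 */
		if (pread(fd, buf, sizeof(buf), 0) > 0 && buf[0] != '0')
			break;
	}
	close(fd);
	return 0;
}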
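
And to make the failure mode concrete for readers of the thread: the
two-pass pattern Jason describes matches the standard XArray store
idiom (see Documentation/core-api/xarray.rst). A simplified, annotated
sketch of its shape -- illustrative only, not the actual lib/xarray.c
implementation:

#include <linux/xarray.h>

static void *two_pass_store(struct xarray *xa, unsigned long index,
			    void *entry)
{
	XA_STATE(xas, xa, index);
	void *curr;

	do {
		xas_lock(&xas);
		/*
		 * Pass 1: any node allocation here happens under the
		 * lock, i.e. in atomic context. fail-nth reliably
		 * fails this one...
		 */
		curr = xas_store(&xas, entry);
		xas_unlock(&xas);
		/*
		 * Pass 2: ...but on -ENOMEM, xas_nomem() retries the
		 * allocation with GFP_KERNEL in sleepable context and
		 * asks us to loop. By then the nth counter has already
		 * reached 0, so the second allocation succeeds and the
		 * error path is never exercised.
		 */
	} while (xas_nomem(&xas, GFP_KERNEL));

	return curr;
}

So a single fail-nth value can only ever hit the first pass, which is
why walking nth over the range never reaches the xa_store() failure
path.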