Re: xarray, fault injection and syzkaller

Dmitry Vyukov <dvyukov@xxxxxxxxxx> · Fri, 4 Nov 2022 11:21:21 -0700

On Fri, 4 Nov 2022 at 11:03, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> On Fri, Nov 04, 2022 at 10:47:17AM -0700, Dmitry Vyukov wrote:
> > > > Do we know how common/useful such an allocation pattern is?
> > >
> > > I have coded something like this a few times, in my cases it is
> > > usually something like: try to allocate a big chunk of memory hoping
> > > for a huge page, then fall back to a smaller allocation
> > >
> > > Most likely the key consideration is that the callsites are using
> > > GFP_NOWARN, so perhaps we can just avoid decrementing the nth on a
> > > NOWARN case assuming that another allocation attempt will closely
> > > follow?
> >
> > GFP_NOWARN is also extensively used for allocations with
> > user-controlled size, e.g.:
> > https://elixir.bootlin.com/linux/v6.1-rc3/source/net/unix/af_unix.c#L3451
> >
> > That's different and these allocations are usually not repeated.
> > So looking at GFP_NOWARN does not look like the right thing to do.
>
> This may be the best option then, arguably perhaps even more
> "realistic" than normal fail_nth as in a real system if this stuff
> starts failing there is a good chance things from then on will fail
> too during the error cleanup.
>
> > > However, this would also have to fix the obnoxious behavior of fail
> > > nth where it fails its own copy_from_user on its write system call -
> > > meaning there would be no way to turn it off.
> >
> > Oh, interesting. We added failing of copy_from/to_user later and did
> > not consider such interaction.
> > Filed https://bugzilla.kernel.org/show_bug.cgi?id=216660 for this.
>
> Oh, I will tell you the other two bugish things I noticed
>
> __should_failslab() has this:
>
>         if (gfpflags & __GFP_NOWARN)
>                 failslab.attr.no_warn = true;
>
>         return should_fail(&failslab.attr, s->object_size);
>
> Which always permanently turns off no_warn for slab during early
> boot. This is why syzkaller reports are so confusing. They trigger a
> slab fault injection, which in all other cases gives a notification
> backtrace, but in slab cases there is no hint about the fault
> injection in the log.

Ouch, this looks like a bug in:

commit 3f913fc5f9745613088d3c569778c9813ab9c129
Author: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>
Date:   Thu May 19 14:08:55 2022 -0700
     mm: fix missing handler for __GFP_NOWARN

+Qi could you please fix it?

At the very least the local gfpflags should not alter the global
failslab.attr that is persistent and shared by all tasks.

But I am not sure if we really don't want to issue the fault injection
stack in this case. It's not a WARNING, it's merely an information
message. It looks useful in all cases, even with GFP_NOWARN. Why
should it be suppressed?

> Once that is fixed we can quickly explain why the socketpair() example
> in the docs shows success ret codes in the middle of the sweep when
> run on syzkaller kernels
>
> fail_nth interacts badly with other kernel features typically enabled
> in syzkaller kernels. Eg it fails in hidden kmemleak instrumentation:
>
> [   18.499559] FAULT_INJECTION: forcing a failure.
> [   18.499559] name failslab, interval 1, probability 0, space 0, times 0
> [   18.499720] CPU: 10 PID: 386 Comm: iommufd_fail_nt Not tainted 6.1.0-rc3+ #34
> [   18.499826] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> [   18.499971] Call Trace:
> [   18.500010]  <TASK>
> [   18.500048]  show_stack+0x3d/0x3f
> [   18.500114]  dump_stack_lvl+0x92/0xbd
> [   18.500171]  dump_stack+0x15/0x17
> [   18.500232]  should_fail.cold+0x5/0xa
> [   18.500291]  __should_failslab+0xb6/0x100
> [   18.500349]  should_failslab+0x9/0x20
> [   18.500416]  kmem_cache_alloc+0x64/0x4e0
> [   18.500477]  ? __create_object+0x40/0xc50
> [   18.500539]  __create_object+0x40/0xc50
> [   18.500620]  ? kasan_poison+0x3a/0x50
> [   18.500690]  ? kasan_unpoison+0x28/0x50
> [***18.500753]  kmemleak_alloc+0x24/0x30
> [   18.500816]  __kmem_cache_alloc_node+0x1de/0x400
> [   18.500900]  ? iopt_alloc_area_pages+0x95/0x560 [iommufd]
> [   18.500993]  kmalloc_trace+0x26/0x110
> [   18.501059]  iopt_alloc_area_pages+0x95/0x560 [iommufd]
>
> Which has the consequence of syzkaller wasting half its fail_nth
> effort because it is triggering failures in hidden instrumentation
> that has no impact on the main code path.
>
> Maybe a kmem_cache_alloc_no_fault_inject() would be helpful for a few
> cases.
>
> Jason