On Fri, 4 Nov 2022 at 11:03, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > On Fri, Nov 04, 2022 at 10:47:17AM -0700, Dmitry Vyukov wrote: > > > > Do we know how common/useful such an allocation pattern is? > > > > > > I have coded something like this a few times, in my cases it is > > > usually something like: try to allocate a big chunk of memory hoping > > > for a huge page, then fall back to a smaller allocation > > > > > > Most likely the key consideration is that the callsites are using > > > GFP_NOWARN, so perhaps we can just avoid decrementing the nth on a > > > NOWARN case assuming that another allocation attempt will closely > > > follow? > > > > GFP_NOWARN is also extensively used for allocations with > > user-controlled size, e.g.: > > https://elixir.bootlin.com/linux/v6.1-rc3/source/net/unix/af_unix.c#L3451 > > > > That's different and these allocations are usually not repeated. > > So looking at GFP_NOWARN does not look like the right thing to do. > > This may be the best option then, arguably perhaps even more > "realistic" than normal fail_nth as in a real system if this stuff > starts failing there is a good chance things from then on will fail > too during the error cleanup. > > > > However, this would also have to fix the obnoxious behavior of fail > > > nth where it fails its own copy_from_user on its write system call - > > > meaning there would be no way to turn it off. > > > > Oh, interesting. We added failing of copy_from/to_user later and did > > not consider such interaction. > > Filed https://bugzilla.kernel.org/show_bug.cgi?id=216660 for this. > > Oh, I will tell you the other two bugish things I noticed > > __should_failslab() has this: > > if (gfpflags & __GFP_NOWARN) > failslab.attr.no_warn = true; > > return should_fail(&failslab.attr, s->object_size); > > Which always permanently turns off no_warn for slab during early > boot. This is why syzkaller reports are so confusing. They trigger a > slab fault injection, which in all other cases gives a notification > backtrace, but in slab cases there is no hint about the fault > injection in the log. > > Once that is fixed we can quickly explain why the socketpair() example > in the docs shows success ret codes in the middle of the sweep when > run on syzkaller kernels > > fail_nth interacts badly with other kernel features typically enabled > in syzkaller kernels. Eg it fails in hidden kmemleak instrumentation: > > [ 18.499559] FAULT_INJECTION: forcing a failure. > [ 18.499559] name failslab, interval 1, probability 0, space 0, times 0 > [ 18.499720] CPU: 10 PID: 386 Comm: iommufd_fail_nt Not tainted 6.1.0-rc3+ #34 > [ 18.499826] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 > [ 18.499971] Call Trace: > [ 18.500010] <TASK> > [ 18.500048] show_stack+0x3d/0x3f > [ 18.500114] dump_stack_lvl+0x92/0xbd > [ 18.500171] dump_stack+0x15/0x17 > [ 18.500232] should_fail.cold+0x5/0xa > [ 18.500291] __should_failslab+0xb6/0x100 > [ 18.500349] should_failslab+0x9/0x20 > [ 18.500416] kmem_cache_alloc+0x64/0x4e0 > [ 18.500477] ? __create_object+0x40/0xc50 > [ 18.500539] __create_object+0x40/0xc50 > [ 18.500620] ? kasan_poison+0x3a/0x50 > [ 18.500690] ? kasan_unpoison+0x28/0x50 > [***18.500753] kmemleak_alloc+0x24/0x30 > [ 18.500816] __kmem_cache_alloc_node+0x1de/0x400 > [ 18.500900] ? iopt_alloc_area_pages+0x95/0x560 [iommufd] > [ 18.500993] kmalloc_trace+0x26/0x110 > [ 18.501059] iopt_alloc_area_pages+0x95/0x560 [iommufd] > > Which has the consequence of syzkaller wasting half its fail_nth > effort because it is triggering failures in hidden instrumentation > that has no impact on the main code path. > > Maybe a kmem_cache_alloc_no_fault_inject() would be helpful for a few > cases. Filed https://bugzilla.kernel.org/show_bug.cgi?id=216663 We could add GFP_NOFAULT. But now I am thinking if using the existing GFP_NOFAIL is actually the right thing to do in these cases... (details are in the bug).