On Wed, 2023-01-11 at 12:19 +0100, Mike Galbraith wrote:
> On Wed, 2023-01-11 at 05:55 -0500, Jeff Layton wrote:
> > > >
> > > > crash> delayed_work ffff8881601fab48
> > > > struct delayed_work {
> > > >   work = {
> > > >     data = {
> > > >       counter = 1
> > > >     },
> > > >     entry = {
> > > >       next = 0x0,
> > > >       prev = 0x0
> > > >     },
> > > >     func = 0x0
> > > >   },
> > > >   timer = {
> > > >     entry = {
> > > >       next = 0x0,
> > > >       pprev = 0x0
> > > >     },
> > > >     expires = 0,
> > > >     function = 0x0,
> > > >     flags = 0
> > > >   },
> > > >   wq = 0x0,
> > > >   cpu = 0
> > > > }
> > >
> > > That looks more like a memory scribble or UAF. Merely having multiple
> > > tasks calling queue_work at the same time wouldn't be enough to trigger
> > > this, IMO. It's more likely that the extra locking is changing the
> > > timing of your reproducer somehow.
> > >
> > > It might be interesting to turn up KASAN if you're able.
>
> I can try that.
>
> > If you still have this vmcore, it might be interesting to do the pointer
> > math and find the nfsd_net structure that contains the above
> > delayed_work. Does the rest of it also seem to be corrupt? My guess is
> > that the corrupted structure extends beyond just the delayed_work above.
> >
> > Also, it might be helpful to do this:
> >
> >     kmem -s ffff8881601fab48
> >
> > ...which should tell us whether and what part of the slab this object is
> > now a part of. That said, net-namespace object allocations are somewhat
> > weird, and I'm not 100% sure they come out of the slab.
>
> I tossed the vmcore, but can generate another. I had done kmem sans -s
> previously, still have that.
>
> crash> kmem ffff8881601fab48
> CACHE             OBJSIZE  ALLOCATED  TOTAL  SLABS  SSIZE  NAME
> kmem: kmalloc-1k: partial list slab: ffffea0005b20c08 invalid page.inuse: -1
> ffff888100041840     1024       2329   2432     76    32k  kmalloc-1k
>   SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
>   ffffea0005807e00  ffff8881601f8000     0     32         32     0
>   FREE / [ALLOCATED]
>   [ffff8881601fa800]
>
>       PAGE       PHYSICAL      MAPPING         INDEX  CNT  FLAGS
>   ffffea0005807e80  1601fa000  dead000000000400     0    0  200000000000000
> crash>

Thanks. The pernet allocations do come out of the slab. The allocation is
done in ops_init in net/core/net_namespace.c. This one is a 1k allocation,
which jives with the size of nfsd_net (which is 976 bytes). So, this seems
to be consistent with where an nfsd_net would have come from.

Maybe not a UAF, but I do think we have some sort of mem corruption going
on.
--
Jeff Layton <jlayton@xxxxxxxxxx>