On Wed, 2023-01-11 at 05:55 -0500, Jeff Layton wrote:
> >
> > > crash> delayed_work ffff8881601fab48
> > > struct delayed_work {
> > >   work = {
> > >     data = {
> > >       counter = 1
> > >     },
> > >     entry = {
> > >       next = 0x0,
> > >       prev = 0x0
> > >     },
> > >     func = 0x0
> > >   },
> > >   timer = {
> > >     entry = {
> > >       next = 0x0,
> > >       pprev = 0x0
> > >     },
> > >     expires = 0,
> > >     function = 0x0,
> > >     flags = 0
> > >   },
> > >   wq = 0x0,
> > >   cpu = 0
> > > }
> >
> > That looks more like a memory scribble or UAF. Merely having multiple
> > tasks calling queue_work at the same time wouldn't be enough to trigger
> > this, IMO. It's more likely that the extra locking is changing the
> > timing of your reproducer somehow.
> >
> > It might be interesting to turn up KASAN if you're able.

I can try that.

> If you still have this vmcore, it might be interesting to do the pointer
> math and find the nfsd_net structure that contains the above
> delayed_work. Does the rest of it also seem to be corrupt? My guess is
> that the corrupted structure extends beyond just the delayed_work above.
>
> Also, it might be helpful to do this:
>
> kmem -s ffff8881601fab48
>
> ...which should tell us whether and what part of the slab this object is
> now a part of. That said, net-namespace object allocations are somewhat
> weird, and I'm not 100% sure they come out of the slab.

I tossed the vmcore, but can generate another. I had done kmem sans -s
previously, and still have that output:

crash> kmem ffff8881601fab48
CACHE            OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
kmem: kmalloc-1k: partial list slab: ffffea0005b20c08 invalid page.inuse: -1
ffff888100041840     1024       2329      2432     76    32k  kmalloc-1k
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  ffffea0005807e00  ffff8881601f8000     0     32     32      0
  FREE / [ALLOCATED]
  [ffff8881601fa800]

      PAGE         PHYSICAL       MAPPING         INDEX  CNT  FLAGS
ffffea0005807e80  1601fa000  dead000000000400        0    0   200000000000000

crash