On Wed, 2023-01-11 at 05:15 -0500, Jeff Layton wrote:
> On Wed, 2023-01-11 at 03:34 +0100, Mike Galbraith wrote:
> > On Tue, 2023-01-10 at 11:58 -0800, dai.ngo@xxxxxxxxxx wrote:
> > >
> > > On 1/10/23 11:30 AM, Jeff Layton wrote:
> > > >
> > > > Looking over the traces that Mike posted, I suspect this is the real
> > > > bug, particularly if the server is being restarted during this test.
> > >
> > > Yes, I noticed the WARN_ON_ONCE(timer->function != delayed_work_timer_fn)
> > > too, and this seems to indicate some kind of corruption. However, I'm not
> > > sure if Mike's test restarts the nfs-server service. This could be a bug
> > > in the workqueue module when it's under stress.
> >
> > My reproducer was to merely mount and traverse/md5sum; while that was
> > going on, fire up LTP's min_free_kbytes testcase (memory hog from hell)
> > on the server. Systemthing may well be restarting the server service
> > in response to an oomkill. In fact, the struct delayed_work in question
> > at WARN_ON_ONCE() time didn't look the least bit ready for business.
> >
> > FWIW, I had noticed the missing cancel while eyeballing, and stuck one
> > next to the existing one as a hail-mary, but that helped not at all.
>
> Ok, thanks, that's good to know.
>
> I still doubt that the problem is the race that Dai seems to think it
> is. The workqueue infrastructure has been fairly stable for years. If
> there were problems with concurrent tasks queueing the same work, the
> kernel would be blowing up all over the place.
>
> > crash> delayed_work ffff8881601fab48
> > struct delayed_work {
> >   work = {
> >     data = {
> >       counter = 1
> >     },
> >     entry = {
> >       next = 0x0,
> >       prev = 0x0
> >     },
> >     func = 0x0
> >   },
> >   timer = {
> >     entry = {
> >       next = 0x0,
> >       pprev = 0x0
> >     },
> >     expires = 0,
> >     function = 0x0,
> >     flags = 0
> >   },
> >   wq = 0x0,
> >   cpu = 0
> > }
>
> That looks more like a memory scribble or UAF. Merely having multiple
> tasks calling queue_work at the same time wouldn't be enough to trigger
> this, IMO. It's more likely that the extra locking is changing the
> timing of your reproducer somehow.
>
> It might be interesting to turn up KASAN if you're able.

If you still have this vmcore, it might be interesting to do the pointer
math and find the nfsd_net structure that contains the above
delayed_work. Does the rest of it also seem to be corrupt? My guess is
that the corrupted structure extends beyond just the delayed_work above.

Also, it might be helpful to do this:

    kmem -s ffff8881601fab48

...which should tell us whether and what part of the slab this object is
now a part of. That said, net-namespace object allocations are somewhat
weird, and I'm not 100% sure they come out of the slab.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>