On Tue, 2023-01-10 at 11:58 -0800, dai.ngo@xxxxxxxxxx wrote:
>
> On 1/10/23 11:30 AM, Jeff Layton wrote:
> >
> > Looking over the traces that Mike posted, I suspect this is the real
> > bug, particularly if the server is being restarted during this test.
>
> Yes, I noticed the WARN_ON_ONCE(timer->function != delayed_work_timer_fn)
> too and this seems to indicate some kind of corruption. However, I'm not
> sure if Mike's test restarts the nfs-server service. This could be a bug
> in work queue module when it's under stress.

My reproducer was to merely mount and traverse/md5sum, while that was
going on, fire up LTP's min_free_kbytes testcase (memory hog from hell)
on the server.  Systemthing may well be restarting the server service
in response to oomkill.  In fact, the struct delayed_work in question
at WARN_ON_ONCE() time didn't look the least bit ready for business.

FWIW, I had noticed the missing cancel while eyeballing, and stuck one
next to the existing one as a hail-mary, but that helped not at all.

crash> delayed_work ffff8881601fab48
struct delayed_work {
  work = {
    data = {
      counter = 1
    },
    entry = {
      next = 0x0,
      prev = 0x0
    },
    func = 0x0
  },
  timer = {
    entry = {
      next = 0x0,
      pprev = 0x0
    },
    expires = 0,
    function = 0x0,
    flags = 0
  },
  wq = 0x0,
  cpu = 0
}