On Sun, Feb 04, 2024 at 11:14:21 AM -1000, Tejun Heo wrote:
> dd6c3c544126 ("workqueue: Move pwq_dec_nr_in_flight() to the end of work
> item handling") relocated pwq_dec_nr_in_flight() after
> set_work_pool_and_keep_pending(). However, the latter destroys information
> contained in work->data that's needed by pwq_dec_nr_in_flight() including
> the flush color. With flush color destroyed, flush_workqueue() can stall
> easily when mixed with cancel_work*() usages.
>
> This is easily triggered by running xfstests generic/001 test on xfs:
>
>      INFO: task umount:6305 blocked for more than 122 seconds.
>      ...
>      task:umount state:D stack:13008 pid:6305 tgid:6305 ppid:6301 flags:0x00004000
>      Call Trace:
>       <TASK>
>       __schedule+0x2f6/0xa20
>       schedule+0x36/0xb0
>       schedule_timeout+0x20b/0x280
>       wait_for_completion+0x8a/0x140
>       __flush_workqueue+0x11a/0x3b0
>       xfs_inodegc_flush+0x24/0xf0
>       xfs_unmountfs+0x14/0x180
>       xfs_fs_put_super+0x3d/0x90
>       generic_shutdown_super+0x7c/0x160
>       kill_block_super+0x1b/0x40
>       xfs_kill_sb+0x12/0x30
>       deactivate_locked_super+0x35/0x90
>       deactivate_super+0x42/0x50
>       cleanup_mnt+0x109/0x170
>       __cleanup_mnt+0x12/0x20
>       task_work_run+0x60/0x90
>       syscall_exit_to_user_mode+0x146/0x150
>       do_syscall_64+0x5d/0x110
>       entry_SYSCALL_64_after_hwframe+0x6c/0x74
>
> Fix it by stashing work_data before calling set_work_pool_and_keep_pending()
> and using the stashed value for pwq_dec_nr_in_flight().
>
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
> Reported-by: Chandan Babu R <chandanbabu@xxxxxxxxxx>
> Link: http://lkml.kernel.org/r/87o7cxeehy.fsf@debian-BULLSEYE-live-builder-AMD64
> Fixes: dd6c3c544126 ("workqueue: Move pwq_dec_nr_in_flight() to the end of work item handling")
> ---
> Hello, Chandan.
>
> Thanks a lot for the report. I could reproduce the problem and verified that
> this patch fixes the issue. I'm applying this to wq/for-6.9 but would really
> appreciate if you could confirm the fix.

fstests executed without any regressions on a next-20240202 kernel with this patch applied.

-- 
Chandan