With LKML Cced On 4 Jan 2023 22:47:16 +0900 Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> > syzbot is reporting hung task at freeze_super() after emitting > "gfs2: fsid=loop0.0: can't make FS RW: -5" message due to gfs2_find_jhead() > from gfs2_make_fs_rw() from gfs2_fill_super() failing with -EIO. > > When hung task message is printed, I can observe that glock_workqueue is > trying to hold type->s_umount_key from glock_workqueue work function > whereas the reproducer is trying to flush glock_workqueue workqueue with > type->s_umount_key held; leading to silent deadlock. The bonus of the task hung reported is that deadlock report could be missed easily, and change should be added to workqueue to catch it reliably. Hillf > > [ 259.867348] task:kworker/3:1H state:D stack:13736 pid:2497 ppid:2 flags:0x00004000 > [ 259.870666] Workqueue: glock_workqueue glock_work_func > [ 259.872886] Call Trace: > [ 259.874279] <TASK> > [ 259.875452] __schedule+0x498/0x590 > [ 259.877255] schedule+0x55/0x90 > [ 259.878932] rwsem_down_write_slowpath+0x385/0x760 > [ 259.881206] freeze_super+0x29/0x1f0 > [ 259.883022] freeze_go_sync+0xa6/0x1f0 > [ 259.884905] do_xmote+0x1ae/0xa60 > [ 259.886625] glock_work_func+0x19a/0x220 > [ 259.888559] process_one_work+0x306/0x530 > [ 259.890548] worker_thread+0x357/0x630 > [ 259.892443] kthread+0x133/0x170 > [ 259.894430] ? rcu_lock_release+0x30/0x30 > [ 259.896408] ? kthread_blkcg+0x60/0x60 > [ 259.898289] ret_from_fork+0x1f/0x30 > [ 259.900114] </TASK> > [ 259.901407] 3 locks held by kworker/3:1H/2497: > [ 259.903534] #0: ffff88810653c338 ((wq_completion)glock_workqueue){+.+.}-{0:0}, at: process_one_work+0x2a7/0x530 > [ 259.907874] #1: ffffc900013dbe58 ((work_completion)(&(&gl->gl_work)->work)){+.+.}-{0:0}, at: process_one_work+0x2d1/0x530 > [ 259.912529] #2: ffff88812be0d0e0 (&type->s_umount_key#51){+.+.}-{3:3}, at: freeze_super+0x29/0x1f0 > > [ 259.975640] task:a.out state:D stack:12664 pid:4579 ppid:3638 flags:0x00004006 > [ 259.979256] Call Trace: > [ 259.980634] <TASK> > [ 259.981886] __schedule+0x498/0x590 > [ 259.983653] schedule+0x55/0x90 > [ 259.985291] schedule_timeout+0x9e/0x1d0 > [ 259.987219] do_wait_for_common+0xff/0x180 > [ 259.989212] ? console_conditional_schedule+0x40/0x40 > [ 259.991585] wait_for_completion+0x4a/0x60 > [ 259.993573] __flush_workqueue+0x360/0x820 > [ 259.995584] gfs2_gl_hash_clear+0x58/0x1b0 > [ 259.997579] ? _raw_spin_unlock_irqrestore+0x43/0xb0 > [ 259.999883] ? lockdep_hardirqs_on+0x99/0x140 > [ 260.001970] gfs2_fill_super+0xe24/0x1110 > [ 260.003934] ? gfs2_reconfigure+0x4d0/0x4d0 > [ 260.005955] get_tree_bdev+0x228/0x2f0 > [ 260.007821] gfs2_get_tree+0x2b/0xe0 > [ 260.009620] vfs_get_tree+0x30/0xe0 > [ 260.011570] do_new_mount+0x1d7/0x540 > [ 260.013408] path_mount+0x3c5/0xb50 > [ 260.015176] __se_sys_mount+0x298/0x2f0 > [ 260.017073] do_syscall_64+0x41/0x90 > [ 260.018870] entry_SYSCALL_64_after_hwframe+0x46/0xb0 > [ 260.021262] RIP: 0033:0x7fc273f26eae > [ 260.023077] RSP: 002b:00007ffffebc6f38 EFLAGS: 00000282 ORIG_RAX: 00000000000000a5 > [ 260.026419] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc273f26eae > [ 260.029573] RDX: 00000000200124c0 RSI: 0000000020012500 RDI: 00007ffffebc6fb0 > [ 260.032728] RBP: 00007ffffebc7100 R08: 00007ffffebc6ff0 R09: 0000000000000000 > [ 260.035903] R10: 0000000000000000 R11: 0000000000000282 R12: 00007ffffebc7278 > [ 260.039063] R13: 0000559922083246 R14: 0000559922097cb8 R15: 00007fc27413b040 > [ 260.042241] </TASK> > [ 260.043522] 1 lock held by a.out/4579: > [ 260.045379] #0: ffff88812be0d0e0 (&type->s_umount_key#50/1){+.+.}-{3:3}, at: alloc_super+0x102/0x450 > > It is difficult to reproduce lockdep warning because this is timing dependent. > So far I was able to reproduce lockdep warning only once. > > [ 111.928183] [ T4537] gfs2: fsid=loop0.0: can't make FS RW: -5 > [ 111.931578] [ T88] > [ 111.937591] [ T88] ====================================================== > [ 111.940620] [ T88] WARNING: possible circular locking dependency detected > [ 111.943553] [ T88] 6.2.0-rc1-next-20221226-00002-gc99a3711d8e3-dirty #28 Not tainted > [ 111.946879] [ T88] ------------------------------------------------------ > [ 111.949832] [ T88] kworker/2:1H/88 is trying to acquire lock: > [ 111.952383] [ T88] ffff888128ff50e0 (&type->s_umount_key#51){+.+.}-{3:3}, at: freeze_super+0x29/0x1f0 > [ 111.956324] [ T88] > [ 111.956324] [ T88] but task is already holding lock: > [ 111.959406] [ T88] ffffc90000bfbe58 ((work_completion)(&(&gl->gl_work)->work)){+.+.}-{0:0}, at: process_one_work+0x2d1/0x530 > [ 111.964081] [ T88] > [ 111.964081] [ T88] which lock already depends on the new lock. > [ 111.964081] [ T88] > [ 111.968219] [ T88] > [ 111.968219] [ T88] the existing dependency chain (in reverse order) is: > [ 111.971959] [ T88] > [ 111.971959] [ T88] -> #2 ((work_completion)(&(&gl->gl_work)->work)){+.+.}-{0:0}: > [ 111.976006] [ T88] process_one_work+0x2f3/0x530 > [ 111.978307] [ T88] worker_thread+0x357/0x630 > [ 111.980510] [ T88] kthread+0x133/0x170 > [ 111.982518] [ T88] ret_from_fork+0x1f/0x30 > [ 111.984621] [ T88] > [ 111.984621] [ T88] -> #1 ((wq_completion)glock_workqueue){+.+.}-{0:0}: > [ 111.987992] [ T88] __flush_workqueue+0x107/0x820 > [ 111.990137] [ T88] gfs2_gl_hash_clear+0x3d/0x1b0 > [ 111.992431] [ T88] gfs2_put_super+0x3a2/0x3f0 > [ 111.994399] [ T88] generic_shutdown_super+0x9a/0x1c0 > [ 111.996554] [ T88] kill_block_super+0x2d/0x70 > [ 111.999015] [ T88] deactivate_locked_super+0x5e/0xb0 > [ 112.001467] [ T88] cleanup_mnt+0x1c9/0x220 > [ 112.003591] [ T88] task_work_run+0xe4/0x130 > [ 112.005749] [ T88] exit_to_user_mode_loop+0x123/0x150 > [ 112.008224] [ T88] exit_to_user_mode_prepare+0xaa/0x130 > [ 112.010765] [ T88] syscall_exit_to_user_mode+0x32/0x70 > [ 112.013457] [ T88] do_syscall_64+0x4d/0x90 > [ 112.015811] [ T88] entry_SYSCALL_64_after_hwframe+0x46/0xb0 > [ 112.018497] [ T88] > [ 112.018497] [ T88] -> #0 (&type->s_umount_key#51){+.+.}-{3:3}: > [ 112.021921] [ T88] __lock_acquire+0x1712/0x33b0 > [ 112.024223] [ T88] lock_acquire+0xd3/0x200 > [ 112.026342] [ T88] down_write+0x3b/0xd0 > [ 112.028384] [ T88] freeze_super+0x29/0x1f0 > [ 112.030503] [ T88] freeze_go_sync+0xa6/0x1f0 > [ 112.032651] [ T88] do_xmote+0x1ae/0xa60 > [ 112.034420] [ T88] glock_work_func+0x19a/0x220 > [ 112.036411] [ T88] process_one_work+0x306/0x530 > [ 112.038635] [ T88] worker_thread+0x357/0x630 > [ 112.040702] [ T88] kthread+0x133/0x170 > [ 112.042509] [ T88] ret_from_fork+0x1f/0x30 > [ 112.044388] [ T88] > [ 112.044388] [ T88] other info that might help us debug this: > [ 112.044388] [ T88] > [ 112.048102] [ T88] Chain exists of: > [ 112.048102] [ T88] &type->s_umount_key#51 --> (wq_completion)glock_workqueue --> (work_completion)(&(&gl->gl_work)->work) > [ 112.048102] [ T88] > [ 112.054316] [ T88] Possible unsafe locking scenario: > [ 112.054316] [ T88] > [ 112.057425] [ T88] CPU0 CPU1 > [ 112.059722] [ T88] ---- ---- > [ 112.062002] [ T88] lock((work_completion)(&(&gl->gl_work)->work)); > [ 112.064732] [ T88] lock((wq_completion)glock_workqueue); > [ 112.068131] [ T88] lock((work_completion)(&(&gl->gl_work)->work)); > [ 112.071852] [ T88] lock(&type->s_umount_key#51); > [ 112.074014] [ T88] > [ 112.074014] [ T88] *** DEADLOCK *** > [ 112.074014] [ T88] > [ 112.077358] [ T88] 2 locks held by kworker/2:1H/88: > [ 112.079541] [ T88] #0: ffff88812530f538 ((wq_completion)glock_workqueue){+.+.}-{0:0}, at: process_one_work+0x2a7/0x530 > [ 112.083975] [ T88] #1: ffffc90000bfbe58 ((work_completion)(&(&gl->gl_work)->work)){+.+.}-{0:0}, at: process_one_work+0x2d1/0x530 > [ 112.088673] [ T88] > [ 112.088673] [ T88] stack backtrace: > [ 112.091159] [ T88] CPU: 2 PID: 88 Comm: kworker/2:1H Not tainted 6.2.0-rc1-next-20221226-00002-gc99a3711d8e3-dirty #28 > [ 112.095547] [ T88] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 > [ 112.099240] [ T88] Workqueue: glock_workqueue glock_work_func > [ 112.101745] [ T88] Call Trace: > [ 112.103257] [ T88] <TASK> > [ 112.104640] [ T88] dump_stack_lvl+0xfe/0x190 > [ 112.106676] [ T88] check_noncircular+0x12e/0x140 > [ 112.112220] [ T88] __lock_acquire+0x1712/0x33b0 > [ 112.114390] [ T88] ? __lock_acquire+0x65f/0x33b0 > [ 112.116857] [ T88] lock_acquire+0xd3/0x200 > [ 112.118861] [ T88] ? freeze_super+0x29/0x1f0 > [ 112.124130] [ T88] down_write+0x3b/0xd0 > [ 112.127778] [ T88] ? freeze_super+0x29/0x1f0 > [ 112.129819] [ T88] freeze_super+0x29/0x1f0 > [ 112.134526] [ T88] freeze_go_sync+0xa6/0x1f0 > [ 112.136585] [ T88] do_xmote+0x1ae/0xa60 > [ 112.138651] [ T88] glock_work_func+0x19a/0x220 > [ 112.140970] [ T88] process_one_work+0x306/0x530 > [ 112.143112] [ T88] worker_thread+0x357/0x630 > [ 112.145129] [ T88] kthread+0x133/0x170 > [ 112.146954] [ T88] ? rcu_lock_release+0x30/0x30 > [ 112.149534] [ T88] ? kthread_blkcg+0x60/0x60 > [ 112.151495] [ T88] ret_from_fork+0x1f/0x30 > [ 112.153850] [ T88] </TASK> > [ 256.043767] [ T59] INFO: task kworker/2:1H:88 blocked for more than 141 seconds. > [ 256.049591] [ T59] Not tainted 6.2.0-rc1-next-20221226-00002-gc99a3711d8e3-dirty #28 > [ 256.053325] [ T59] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 256.056783] [ T59] task:kworker/2:1H state:D stack:12792 pid:88 ppid:2 flags:0x00004000 > [ 256.060412] [ T59] Workqueue: glock_workqueue glock_work_func > [ 256.062783] [ T59] Call Trace: > [ 256.064008] [ T59] <TASK> > [ 256.065150] [ T59] __schedule+0x498/0x590 > [ 256.066916] [ T59] schedule+0x55/0x90 > [ 256.068647] [ T59] rwsem_down_write_slowpath+0x385/0x760 > [ 256.072030] [ T59] freeze_super+0x29/0x1f0 > [ 256.073874] [ T59] freeze_go_sync+0xa6/0x1f0 > [ 256.075760] [ T59] do_xmote+0x1ae/0xa60 > [ 256.077500] [ T59] glock_work_func+0x19a/0x220 > [ 256.079462] [ T59] process_one_work+0x306/0x530 > [ 256.081727] [ T59] worker_thread+0x357/0x630 > [ 256.083619] [ T59] kthread+0x133/0x170 > [ 256.085371] [ T59] ? rcu_lock_release+0x30/0x30 > [ 256.087391] [ T59] ? kthread_blkcg+0x60/0x60 > [ 256.089277] [ T59] ret_from_fork+0x1f/0x30 > [ 256.093784] [ T59] </TASK> > [ 256.095131] [ T59] INFO: task a.out:4537 blocked for more than 141 seconds. > [ 256.098069] [ T59] Not tainted 6.2.0-rc1-next-20221226-00002-gc99a3711d8e3-dirty #28 > [ 256.101464] [ T59] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 256.104916] [ T59] task:a.out state:D stack:12664 pid:4537 ppid:4531 flags:0x00004006 > [ 256.108532] [ T59] Call Trace: > [ 256.109916] [ T59] <TASK> > [ 256.111161] [ T59] __schedule+0x498/0x590 > [ 256.112959] [ T59] schedule+0x55/0x90 > [ 256.114589] [ T59] schedule_timeout+0x9e/0x1d0 > [ 256.116519] [ T59] do_wait_for_common+0xff/0x180 > [ 256.118515] [ T59] ? console_conditional_schedule+0x40/0x40 > [ 256.120860] [ T59] wait_for_completion+0x4a/0x60 > [ 256.122860] [ T59] __flush_workqueue+0x360/0x820 > [ 256.124855] [ T59] gfs2_gl_hash_clear+0x58/0x1b0 > [ 256.126868] [ T59] ? _raw_spin_unlock_irqrestore+0x43/0xb0 > [ 256.129180] [ T59] ? lockdep_hardirqs_on+0x99/0x140 > [ 256.131264] [ T59] gfs2_fill_super+0xe24/0x1110 > [ 256.133222] [ T59] ? gfs2_reconfigure+0x4d0/0x4d0 > [ 256.135239] [ T59] get_tree_bdev+0x228/0x2f0 > [ 256.137296] [ T59] gfs2_get_tree+0x2b/0xe0 > [ 256.139122] [ T59] vfs_get_tree+0x30/0xe0 > [ 256.140879] [ T59] do_new_mount+0x1d7/0x540 > [ 256.142707] [ T59] path_mount+0x3c5/0xb50 > [ 256.144461] [ T59] __se_sys_mount+0x298/0x2f0 > [ 256.146345] [ T59] do_syscall_64+0x41/0x90 > [ 256.148148] [ T59] entry_SYSCALL_64_after_hwframe+0x46/0xb0 > [ 256.150484] [ T59] RIP: 0033:0x7faf87b26eae > [ 256.152285] [ T59] RSP: 002b:00007ffdb0d40f68 EFLAGS: 00000282 ORIG_RAX: 00000000000000a5 > [ 256.155594] [ T59] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007faf87b26eae > [ 256.160847] [ T59] RDX: 00000000200124c0 RSI: 0000000020012500 RDI: 00007ffdb0d40fe0 > [ 256.164184] [ T59] RBP: 00007ffdb0d41130 R08: 00007ffdb0d41020 R09: 0000000000000000 > [ 256.167404] [ T59] R10: 0000000000000000 R11: 0000000000000282 R12: 00007ffdb0d412a8 > [ 256.170913] [ T59] R13: 00005574b9254246 R14: 00005574b9268cb8 R15: 00007faf87da3040 > [ 256.174771] [ T59] </TASK> > [ 256.176029] [ T59] INFO: lockdep is turned off. > [ 256.178522] [ T59] NMI backtrace for cpu 7 > [ 256.180317] [ T59] CPU: 7 PID: 59 Comm: khungtaskd Not tainted 6.2.0-rc1-next-20221226-00002-gc99a3711d8e3-dirty #28 > [ 256.184614] [ T59] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 > [ 256.188220] [ T59] Call Trace: > [ 256.189626] [ T59] <TASK> > [ 256.190893] [ T59] dump_stack_lvl+0xfe/0x190 > [ 256.194767] [ T59] nmi_cpu_backtrace+0x25d/0x2c0 > [ 256.196804] [ T59] ? arch_trigger_cpumask_backtrace+0x10/0x10 > [ 256.199337] [ T59] nmi_trigger_cpumask_backtrace+0x11f/0x250 > [ 256.202150] [ T59] watchdog+0x8a5/0x8e0 > [ 256.203918] [ T59] kthread+0x133/0x170 > [ 256.205681] [ T59] ? hungtask_pm_notify+0x90/0x90 > [ 256.207783] [ T59] ? kthread_blkcg+0x60/0x60 > [ 256.209689] [ T59] ret_from_fork+0x1f/0x30 > [ 256.211690] [ T59] </TASK> > [ 256.213052] [ T59] Sending NMI from CPU 7 to CPUs 0-6: > [ 256.215232] [ C4] NMI backtrace for cpu 4 skipped: idling at default_idle+0x13/0x20 > [ 256.218391] [ C6] NMI backtrace for cpu 6 skipped: idling at default_idle+0x13/0x20 > [ 256.218378] [ C4] INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 3.145 msecs > [ 256.218378] [ C2] NMI backtrace for cpu 2 skipped: idling at default_idle+0x13/0x20 > [ 256.218766] [ C0] NMI backtrace for cpu 0 skipped: idling at default_idle+0x13/0x20 > [ 256.218889] [ C1] NMI backtrace for cpu 1 skipped: idling at default_idle+0x13/0x20 > [ 256.219085] [ C5] NMI backtrace for cpu 5 skipped: idling at default_idle+0x13/0x20 > [ 256.219230] [ C3] NMI backtrace for cpu 3 skipped: idling at default_idle+0x13/0x20 > > I suspect that cleanup is not done appropriately when gfs2_find_jhead() failed. > Looking into gfs2_make_fs_rw(), I see there are two worrisome things. > > One is that gfs2_make_fs_rw() returns an error without calling gfs2_consist(sdp) when > gfs2_find_jhead() returned an error whereas gfs2_make_fs_rw() returns -EIO after calling > gfs2_consist(sdp) when head.lh_flags does not contain GFS2_LOG_HEAD_UNMOUNT flag. > > Since head.lh_flags is assigned by gfs2_find_jhead(), we might want to call gfs2_consist(sdp) > when gfs2_find_jhead() returned an error. And actually > > --- a/fs/gfs2/super.c > +++ b/fs/gfs2/super.c > @@ -138,7 +138,11 @@ int gfs2_make_fs_rw(struct gfs2_sbd *sdp) > return -EIO; > > error = gfs2_find_jhead(sdp->sd_jdesc, &head, false); > - if (error || gfs2_withdrawn(sdp)) > + if (error) { > + gfs2_consist(sdp); > + return error; > + } > + if (gfs2_withdrawn(sdp)) > return error; > > if (!(head.lh_flags & GFS2_LOG_HEAD_UNMOUNT)) { > > avoids this deadlock. But I don't know when/how to use gfs2_withdraw(). > Please check if this change is appropriate. > > The other worrisome thing is that gfs2_make_fs_rw() is returning 0 (and > mount operation will continue) when gfs2_withdrawn() == true. Can the caller > of gfs2_make_fs_rw() survive when callgfs2_make_fs_rw() returned 0 without > processing > > /* Initialize some head of the log stuff */ > sdp->sd_log_sequence = head.lh_sequence + 1; > gfs2_log_pointers_init(sdp, head.lh_blkno); > > lines? Shouldn't the caller of gfs2_make_fs_rw() observe an error when > gfs2_make_fs_rw() did not execute the > > set_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags); > > line due to gfs2_withdrawn() == true? >