Dmitry Vyukov wrote on Fri, Nov 02, 2018:
> >> I guess that's the problem, right? SIGKILL-ed task must not ignore
> >> SIGKILL and hang in infinite loop. This would explain a bunch of hangs
> >> in 9p.
> >
> > Did you check /proc/18253/task/*/stack after manually sending SIGKILL?
>
> Yes:
>
> root@syzkaller:~# ps afxu | grep syz
> root 18253 0.0 0.0 0 0 ttyS0 Zl 10:16 0:00 \_ [syz-executor] <defunct>
> root@syzkaller:~# cat /proc/18253/task/*/stack
> [<0>] p9_client_rpc+0x3a2/0x1400
> [<0>] p9_client_flush+0x134/0x2a0
> [<0>] p9_client_rpc+0x122c/0x1400
> [<0>] p9_client_create+0xc56/0x16af
> [<0>] v9fs_session_init+0x21a/0x1a80
> [<0>] v9fs_mount+0x7c/0x900
> [<0>] mount_fs+0xae/0x328
> [<0>] vfs_kern_mount.part.34+0xdc/0x4e0
> [<0>] do_mount+0x581/0x30e0
> [<0>] ksys_mount+0x12d/0x140
> [<0>] __x64_sys_mount+0xbe/0x150
> [<0>] do_syscall_64+0x1b9/0x820
> [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [<0>] 0xffffffffffffffff

Yes, that's a known problem with the current code: since everything must
be cleaned up on the spot, the first kill sends a flush and waits again
for the flush reply to come; the second kill is completely ignored.

With the refcounting work we've done that went in this merge window
we're halfway there - memory can now have a lifetime independent of the
current request and won't be freed when the process exits p9_client_rpc,
so we can send the flush and return immediately; then have the rest of
the cleanup happen asynchronously when the flush reply comes or the
client is torn down, whichever happens first.

I've got this planned for 4.21 if I can find the time to do it early in
this cycle and I get it to work on first try, 4.22 if I run into
complications to make sure it's well tested in -next first.
My free time is pretty limited this year so unless you want to help it'll
get done when it's ready :)

--
Dominique
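
To illustrate the "send the flush and return immediately" idea described above, here is a toy model of a reference-counted request. It is only a sketch in Python, not kernel code: the Request and Client classes and all of their method names are hypothetical and do not correspond to anything in net/9p. It assumes the caller and the client's in-flight table each hold one reference, so the request stays alive after the killed caller has returned.

```python
# Toy model only: class and method names are hypothetical and do not
# mirror the kernel's 9p client code.

class Request:
    def __init__(self, tag):
        self.tag = tag
        self.refcount = 1      # reference held by the caller in the RPC path
        self.freed = False

    def get(self):
        self.refcount += 1

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # stands in for releasing the request's memory


class Client:
    def __init__(self):
        self.in_flight = {}    # tag -> request still awaiting a server reply
        self.sent = []         # log of messages "sent" to the server

    def submit(self, tag):
        req = Request(tag)
        req.get()              # second reference, owned by the in-flight table
        self.in_flight[tag] = req
        self.sent.append(("request", tag))
        return req

    def kill_caller(self, req):
        # Caller got a fatal signal: send a flush, drop the caller's
        # reference, and return without waiting for the flush reply.
        self.sent.append(("flush", req.tag))
        req.put()

    def on_flush_reply(self, tag):
        # Deferred cleanup, run when the flush reply arrives.
        req = self.in_flight.pop(tag, None)
        if req is not None:
            req.put()

    def teardown(self):
        # Client torn down first: release everything still in flight.
        for tag in list(self.in_flight):
            self.on_flush_reply(tag)


client = Client()
req = client.submit(tag=1)
client.kill_caller(req)        # returns immediately
assert not req.freed           # the in-flight table still holds a reference
client.on_flush_reply(1)       # later: reply arrives, last reference dropped
assert req.freed
```

In this model the second kill has nothing left to wait on, because the caller is already gone after the first one; whichever of on_flush_reply() or teardown() runs first drops the last reference.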