On Wed, Jul 18, 2018 at 4:17 PM, Tetsuo Handa
<penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
> On 2018/07/18 23:11, Dmitry Vyukov wrote:
>> On Wed, Jul 18, 2018 at 3:35 PM, Tetsuo Handa
>> <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>>>>>> This seems to be related to 9p. After rerunning the log I got:
>>>>>>
>>>>>> root@syzkaller:~# ps afxu | grep syz
>>>>>> root 18253 0.0 0.0 0 0 ttyS0 Zl 10:16 0:00 \_
>>>>>> [syz-executor] <defunct>
>>>>>> root@syzkaller:~# cat /proc/18253/task/*/stack
>>>>>> [<0>] p9_client_rpc+0x3a2/0x1400
>>>>>> [<0>] p9_client_flush+0x134/0x2a0
>>>>>> [<0>] p9_client_rpc+0x122c/0x1400
>>>>>> [<0>] p9_client_create+0xc56/0x16af
>>>>>> [<0>] v9fs_session_init+0x21a/0x1a80
>>>>>> [<0>] v9fs_mount+0x7c/0x900
>>>>>> [<0>] mount_fs+0xae/0x328
>>>>>> [<0>] vfs_kern_mount.part.34+0xdc/0x4e0
>>>>>> [<0>] do_mount+0x581/0x30e0
>>>>>> [<0>] ksys_mount+0x12d/0x140
>>>>>> [<0>] __x64_sys_mount+0xbe/0x150
>>>>>> [<0>] do_syscall_64+0x1b9/0x820
>>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>>> [<0>] 0xffffffffffffffff
>>>>>>
>>>>>> There is a bunch of hangs in 9p, so let's do:
>>>>>>
>>>>>> #syz dup: INFO: task hung in flush_work
>>>>>>
>>>>> Then, is dumping all threads when khungtaskd fires a candidate
>>>>> for CONFIG_DEBUG_AID_FOR_SYZBOT=y path?
>>>>
>>>> Perhaps would be useful. But maybe only tasks that are blocked for
>>>> more than timeout/2? and/or unkillable tasks? killable tasks are not a
>>>> problem.
>>>
>>> TASK_KILLABLE waiters are not reported by khungtaskd, are they?
>>>
>>>         /* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
>>>         if (t->state == TASK_UNINTERRUPTIBLE)
>>>                 check_hung_task(t, timeout);
>>>
>>> And TASK_KILLABLE waiters can become a problem because
>>>
>>>>
>>>> Btw, I see that p9_client_rpc uses wait_event_killable, why wasn't it
>>>> killed along with the whole process?
>>>>
>>>
>>> wait_event_killable() would return -ERESTARTSYS if got SIGKILL.
>>> But if (c->status == Connected) && (type == P9_TFLUSH) is also true,
>>> it ignores SIGKILL by retrying the loop...
>>>
>>> again:
>>>         err = wait_event_killable(*req->wq, req->status >= REQ_STATUS_RCVD);
>>>         if ((err == -ERESTARTSYS) && (c->status == Connected) && (type == P9_TFLUSH)) {
>>>                 sigpending = 1;
>>>                 clear_thread_flag(TIF_SIGPENDING);
>>>                 goto again;
>>>         }
>>>
>>> I wish they don't ignore SIGKILL (by e.g. offloading operations to a kernel thread).
>>
>>
>> I guess that's the problem, right? SIGKILL-ed task must not ignore
>> SIGKILL and hang in infinite loop. This would explain a bunch of hangs
>> in 9p.
>
> Did you check /proc/18253/task/*/stack after manually sending SIGKILL?

Yes:

root@syzkaller:~# ps afxu | grep syz
root 18253 0.0 0.0 0 0 ttyS0 Zl 10:16 0:00 \_ [syz-executor] <defunct>
root@syzkaller:~# cat /proc/18253/task/*/stack
[<0>] p9_client_rpc+0x3a2/0x1400
[<0>] p9_client_flush+0x134/0x2a0
[<0>] p9_client_rpc+0x122c/0x1400
[<0>] p9_client_create+0xc56/0x16af
[<0>] v9fs_session_init+0x21a/0x1a80
[<0>] v9fs_mount+0x7c/0x900
[<0>] mount_fs+0xae/0x328
[<0>] vfs_kern_mount.part.34+0xdc/0x4e0
[<0>] do_mount+0x581/0x30e0
[<0>] ksys_mount+0x12d/0x140
[<0>] __x64_sys_mount+0xbe/0x150
[<0>] do_syscall_64+0x1b9/0x820
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0xffffffffffffffff

> I mean, who (i.e. you or syzkaller programs) is sending a signal (not limited
> to SIGKILL but any signal) that makes TASK_KILLABLE waiters to wake up?

Both. syzkaller always SIGKILLs the test process after some timeout and
expects it to go away. I also tried sending SIGKILL manually after that,
but it does not make any difference.