On Tue, Jun 19, 2018 at 1:44 PM, Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
> This bug report is getting no feedback, but I guess that this bug is in
> block or mm or locking layer rather than fs layer.
>
> NMI backtrace for this bug tends to report that sb_bread() from fill_super()
> from mount_bdev() is stalling is the cause of keep holding s_umount_key for
> more than 120 seconds. What is strange is that NMI backtrace for this bug tends
> to point at rcu_read_lock()/pagecache_get_page()/radix_tree_deref_slot()/
> rcu_read_unlock() which is expected not to stall.
>
> Since CONFIG_RCU_CPU_STALL_TIMEOUT is set to 120 (and actually +5 due to
> CONFIG_PROVE_RCU=y) which is longer than CONFIG_DEFAULT_HUNG_TASK_TIMEOUT,
> maybe setting CONFIG_RCU_CPU_STALL_TIMEOUT to smaller values (e.g. 25) can
> give us some hints...

If an RCU stall is the true root cause of this, then I guess we would
see an "rcu stall" bug too. An RCU stall is detected after 120 seconds,
but a hung task only after 120-240 seconds, so an RCU stall has a much
higher chance of being detected. Do you see the corresponding
"rcu stall" bug?

But, yes, we need to tune all timeouts. There is
https://github.com/google/syzkaller/issues/516 for this.

We also need "kernel/hung_task.c: allow to set checking interval
separately from timeout" to be merged:
https://groups.google.com/forum/#!topic/syzkaller/rOr3WBE-POY
as currently it's very hard to tune the hung task timeout. We may need
similar patches for other watchdogs too if they have the same problem.
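
To make the 120-240 second window concrete, here is a rough Python sketch of a simplified model of a periodic hung-task check. The model, the function names and the 10-second interval are assumptions for illustration only; this is not the actual kernel/hung_task.c logic nor the proposed patch. The assumed model: the watchdog wakes every `interval` seconds, first notices the blocked task at the first wake-up after it blocks, and reports it at the first later wake-up that is at least `timeout` seconds after that first observation.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def detection_latency(block_start: int, timeout: int, interval: int) -> int:
    """Seconds between a task blocking and the (modelled) watchdog report.

    Assumption: the watchdog wakes at multiples of `interval`.
    """
    # First wake-up at or after the moment the task blocked.
    first_seen = ceil_div(block_start, interval) * interval
    # Number of further wake-ups needed to cover at least `timeout` seconds.
    checks_needed = ceil_div(timeout, interval)
    reported_at = first_seen + checks_needed * interval
    return reported_at - block_start


def latency_bounds(timeout: int, interval: int) -> tuple[int, int]:
    """Half-open range [min, max) of the detection latency in this model."""
    base = ceil_div(timeout, interval) * interval
    return base, base + interval


if __name__ == "__main__":
    # Check interval tied to the timeout: latency anywhere in [120, 240).
    print(latency_bounds(timeout=120, interval=120))
    # Hypothetical separate 10-second check interval: latency in [120, 130).
    print(latency_bounds(timeout=120, interval=10))
```

In this model, tying the check interval to the timeout is what spreads detection over a full extra timeout period. A shorter, separately configured interval narrows the window, which is why the hung task timeout would become much easier to tune relative to the RCU stall timeout.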