Re: INFO: task hung in io_wq_destroy

Eric Biggers <ebiggers@xxxxxxxxxx> · Mon, 18 Nov 2019 18:23:30 -0800

Hi Jens,

On Mon, Oct 28, 2019 at 03:00:08PM -0600, Jens Axboe wrote:
> This is fixed in my for-next branch for a few days at least, unfortunately
> linux-next is still on the old one. Next version should be better.

This is still occurring on linux-next.  Here's a report on next-20191115 from
https://syzkaller.appspot.com/text?tag=CrashReport&x=16fa3d1ce00000

INFO: task syz-executor.5:11684 can't die for more than 143 seconds.
syz-executor.5  D28160 11684   9114 0x00004004
Call Trace:
 context_switch kernel/sched/core.c:3385 [inline]
 __schedule+0x8e1/0x1f30 kernel/sched/core.c:4081
 schedule+0xdc/0x2b0 kernel/sched/core.c:4155
 schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871
 do_wait_for_common kernel/sched/completion.c:83 [inline]
 __wait_for_common kernel/sched/completion.c:104 [inline]
 wait_for_common kernel/sched/completion.c:115 [inline]
 wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136
 io_wq_destroy+0x22a/0x450 fs/io-wq.c:1068
 io_finish_async+0x128/0x1b0 fs/io_uring.c:3423
 io_ring_ctx_free fs/io_uring.c:4220 [inline]
 io_ring_ctx_wait_and_kill+0x2b7/0x810 fs/io_uring.c:4293
 io_uring_release+0x42/0x50 fs/io_uring.c:4301
 __fput+0x2ff/0x890 fs/file_table.c:280
 ____fput+0x16/0x20 fs/file_table.c:313
 task_work_run+0x145/0x1c0 kernel/task_work.c:113
 tracehook_notify_resume include/linux/tracehook.h:188 [inline]
 exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:164
 prepare_exit_to_usermode arch/x86/entry/common.c:195 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:278 [inline]
 do_syscall_64+0x676/0x790 arch/x86/entry/common.c:304
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4141d1
Code: 65 00 31 d2 be 53 25 44 00 bf 09 3d 44 00 ff 15 95 57 24 00 b9 c0 99 65 00 31 d2 be 44 25 44 00 bf 09 3d 44 00 ff 15 7e 57 24 <00> b9 20 9b 65 00 31 d2 be ac 94 44 00 bf 09 3d 44 00 ff 15 67 57
RSP: 002b:00007ffc4b532100 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00000000004141d1
RDX: 0000001b2d520000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 0000000000000001 R08: 0000000034194edd R09: 0000000034194ee1
R10: 00007ffc4b5321e0 R11: 0000000000000293 R12: 000000000075bf20
R13: 00000000000428a4 R14: 00000000007604b0 R15: 000000000075bf2c
INFO: task syz-executor.5:11684 blocked for more than 143 seconds.
      Not tainted 5.4.0-rc7-next-20191115 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.5  D28160 11684   9114 0x00004004
Call Trace:
 context_switch kernel/sched/core.c:3385 [inline]
 __schedule+0x8e1/0x1f30 kernel/sched/core.c:4081
 schedule+0xdc/0x2b0 kernel/sched/core.c:4155
 schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871
 do_wait_for_common kernel/sched/completion.c:83 [inline]
 __wait_for_common kernel/sched/completion.c:104 [inline]
 wait_for_common kernel/sched/completion.c:115 [inline]
 wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136
 io_wq_destroy+0x22a/0x450 fs/io-wq.c:1068
 io_finish_async+0x128/0x1b0 fs/io_uring.c:3423
 io_ring_ctx_free fs/io_uring.c:4220 [inline]
 io_ring_ctx_wait_and_kill+0x2b7/0x810 fs/io_uring.c:4293
 io_uring_release+0x42/0x50 fs/io_uring.c:4301
 __fput+0x2ff/0x890 fs/file_table.c:280
 ____fput+0x16/0x20 fs/file_table.c:313
 task_work_run+0x145/0x1c0 kernel/task_work.c:113
 tracehook_notify_resume include/linux/tracehook.h:188 [inline]
 exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:164
 prepare_exit_to_usermode arch/x86/entry/common.c:195 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:278 [inline]
 do_syscall_64+0x676/0x790 arch/x86/entry/common.c:304
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4141d1
Code: 65 00 31 d2 be 53 25 44 00 bf 09 3d 44 00 ff 15 95 57 24 00 b9 c0 99 65 00 31 d2 be 44 25 44 00 bf 09 3d 44 00 ff 15 7e 57 24 <00> b9 20 9b 65 00 31 d2 be ac 94 44 00 bf 09 3d 44 00 ff 15 67 57
RSP: 002b:00007ffc4b532100 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00000000004141d1
RDX: 0000001b2d520000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 0000000000000001 R08: 0000000034194edd R09: 0000000034194ee1
R10: 00007ffc4b5321e0 R11: 0000000000000293 R12: 000000000075bf20
R13: 00000000000428a4 R14: 00000000007604b0 R15: 000000000075bf2c

Showing all locks held in the system:
1 lock held by khungtaskd/834:
 #0: ffffffff88faccc0 (rcu_read_lock){....}, at: debug_show_all_locks+0x5f/0x279 kernel/locking/lockdep.c:5334
3 locks held by kworker/u4:4/835:
 #0: ffff8880ae837358 (&rq->lock){-.-.}, at: newidle_balance+0xa28/0xe80 kernel/sched/fair.c:10049
 #1: ffffffff88faccc0 (rcu_read_lock){....}, at: __update_idle_core+0x45/0x3f0 kernel/sched/fair.c:5697
 #2: ffffffff88faccc0 (rcu_read_lock){....}, at: batadv_nc_purge_orig_hash net/batman-adv/network-coding.c:405 [inline]
 #2: ffffffff88faccc0 (rcu_read_lock){....}, at: batadv_nc_worker+0xe3/0x760 net/batman-adv/network-coding.c:718
1 lock held by rsyslogd/8970:
 #0: ffff8880a9919b60 (&f->f_pos_lock){+.+.}, at: __fdget_pos+0xee/0x110 fs/file.c:801
2 locks held by getty/9060:
 #0: ffff8880917eb090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
 #1: ffffc90005f392e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/9061:
 #0: ffff8880a4ab7090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
 #1: ffffc90005f352e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/9062:
 #0: ffff888096614090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
 #1: ffffc90005f2d2e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/9063:
 #0: ffff8880a776f090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
 #1: ffffc90005f152e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/9064:
 #0: ffff8880a5f2b090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
 #1: ffffc90005f312e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/9065:
 #0: ffff88809145c090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
 #1: ffffc90005f192e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/9066:
 #0: ffff8880a05c6090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
 #1: ffffc90005f012e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 834 Comm: khungtaskd Not tainted 5.4.0-rc7-next-20191115 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x197/0x210 lib/dump_stack.c:118
 nmi_cpu_backtrace.cold+0x70/0xb2 lib/nmi_backtrace.c:101
 nmi_trigger_cpumask_backtrace+0x23b/0x28b lib/nmi_backtrace.c:62
 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:269 [inline]
 watchdog+0xc8f/0x1350 kernel/hung_task.c:353
 kthread+0x361/0x430 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:60