Re: INFO: task hung in unregister_netdevice_notifier (3)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello all,

On 14.07.19 06:07, syzbot wrote:
syzbot has found a reproducer for the following crash on:

the internal users of the CAN networking subsystem like CAN_BCM and CAN_RAW hold a number of CAN identifier subscriptions ('filters') for CAN netdevices (only type ARPHRD_CAN) in their socket data structures.

The per-socket netdevice notifier is used to manage the ad-hoc removal of these filters at netdevice removal time.

What I can see in the console output at

https://syzkaller.appspot.com/x/log.txt?x=10e45f0fa00000

seems to be a race between an unknown register_netdevice_notifier() call ("A") and the unregister_netdevice_notifier() ("B") likely invoked by bcm_release() ("C"):

[ 1047.294207][ T1049]  schedule+0xa8/0x270
[ 1047.318401][ T1049]  rwsem_down_write_slowpath+0x70a/0xf70
[ 1047.324114][ T1049]  ? downgrade_write+0x3c0/0x3c0
[ 1047.438644][ T1049]  ? mark_held_locks+0xf0/0xf0
[ 1047.443483][ T1049]  ? lock_acquire+0x190/0x410
[ 1047.448191][ T1049]  ? unregister_netdevice_notifier+0x7e/0x390
[ 1047.547227][ T1049]  down_write+0x13c/0x150
[ 1047.579535][ T1049]  ? down_write+0x13c/0x150
[ 1047.584106][ T1049]  ? __down_timeout+0x2d0/0x2d0
[ 1047.635356][ T1049]  ? mark_held_locks+0xf0/0xf0
[ 1047.640721][ T1049]  unregister_netdevice_notifier+0x7e/0x390  <- "B"
[ 1047.646667][ T1049]  ? __sock_release+0x89/0x280
[ 1047.709126][ T1049]  ? register_netdevice_notifier+0x630/0x630 <- "A"
[ 1047.715203][ T1049]  ? __kasan_check_write+0x14/0x20
[ 1047.775138][ T1049]  bcm_release+0x93/0x5e0                    <- "C"
[ 1047.795337][ T1049]  __sock_release+0xce/0x280
[ 1047.829016][ T1049]  sock_close+0x1e/0x30

The question to me is now:

Is the problem located in an (un)register_netdevice_notifier race OR is it generally a bad idea to call unregister_netdevice_notifier() in a sock release?

I've never seen that kind of problem in the wild. But if it would be the latter case wouldn't it be the same problem when someone unloads the kernel module at the 'wrong' time?

In commit 328fbe747ad46 ("net: Close race between {un, }register_netdevice_notifier() and setup_net()/cleanup_net()") Kirill Tkhai reviewed the calling site in CAN_RAW raw_release() which points to the same situation. Therefore added him to the recipient list.

Should down_write() be replaced with something like rwsem_down_write_slowpath()??

Regards,
Oliver

HEAD commit:    a2d79c71 Merge tag 'for-5.3/io_uring-20190711' of git://gi..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10e45f0fa00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=3539b1747f03988e
dashboard link: https://syzkaller.appspot.com/bug?extid=0f1827363a305f74996f
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1765c52fa00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+0f1827363a305f74996f@xxxxxxxxxxxxxxxxxxxxxxxxx

INFO: task syz-executor.4:9527 blocked for more than 143 seconds.
       Not tainted 5.2.0+ #80
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.4  D28136  9527   9356 0x00000004
Call Trace:
  context_switch kernel/sched/core.c:3252 [inline]
  __schedule+0x755/0x1580 kernel/sched/core.c:3878
  schedule+0xa8/0x270 kernel/sched/core.c:3942
  rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
  __down_write kernel/locking/rwsem.c:1349 [inline]
  down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
  unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
  bcm_release+0x93/0x5e0 net/can/bcm.c:1525
  __sock_release+0xce/0x280 net/socket.c:586
  sock_close+0x1e/0x30 net/socket.c:1264
  __fput+0x2ff/0x890 fs/file_table.c:280
  ____fput+0x16/0x20 fs/file_table.c:313
  task_work_run+0x145/0x1c0 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:185 [inline]
  exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
  prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
  do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413501
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
INFO: task syz-executor.2:9528 blocked for more than 145 seconds.
       Not tainted 5.2.0+ #80
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.2  D28136  9528   9354 0x00000004
Call Trace:
  context_switch kernel/sched/core.c:3252 [inline]
  __schedule+0x755/0x1580 kernel/sched/core.c:3878
  schedule+0xa8/0x270 kernel/sched/core.c:3942
  rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
  __down_write kernel/locking/rwsem.c:1349 [inline]
  down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
  unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
  bcm_release+0x93/0x5e0 net/can/bcm.c:1525
  __sock_release+0xce/0x280 net/socket.c:586
  sock_close+0x1e/0x30 net/socket.c:1264
  __fput+0x2ff/0x890 fs/file_table.c:280
  ____fput+0x16/0x20 fs/file_table.c:313
  task_work_run+0x145/0x1c0 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:185 [inline]
  exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
  prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
  do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413501
Code: 5f fe ff ff 31 c9 31 f6 41 b9 b0 20 41 00 41 b8 8c d6 65 00 ba 02 00 00 00 bf 28 38 44 00 ff 15 7d a1 24 00 85 c0 0f 85 37 fe <ff> ff 31 c9 31 f6 41 b9 b0 20 41 00 41 b8 90 d6 65 00 ba 03 00 00
RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
INFO: task syz-executor.0:9529 blocked for more than 147 seconds.
       Not tainted 5.2.0+ #80
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.0  D28136  9529   9353 0x00000004
Call Trace:
  context_switch kernel/sched/core.c:3252 [inline]
  __schedule+0x755/0x1580 kernel/sched/core.c:3878
  schedule+0xa8/0x270 kernel/sched/core.c:3942
  rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
  __down_write kernel/locking/rwsem.c:1349 [inline]
  down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
  unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
  bcm_release+0x93/0x5e0 net/can/bcm.c:1525
  __sock_release+0xce/0x280 net/socket.c:586
  sock_close+0x1e/0x30 net/socket.c:1264
  __fput+0x2ff/0x890 fs/file_table.c:280
  ____fput+0x16/0x20 fs/file_table.c:313
  task_work_run+0x145/0x1c0 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:185 [inline]
  exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
  prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
  do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413501
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
INFO: task syz-executor.5:9533 blocked for more than 148 seconds.
       Not tainted 5.2.0+ #80
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.5  D28136  9533   9358 0x00000004
Call Trace:
  context_switch kernel/sched/core.c:3252 [inline]
  __schedule+0x755/0x1580 kernel/sched/core.c:3878
  schedule+0xa8/0x270 kernel/sched/core.c:3942
  rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
  __down_write kernel/locking/rwsem.c:1349 [inline]
  down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
  unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
  bcm_release+0x93/0x5e0 net/can/bcm.c:1525
  __sock_release+0xce/0x280 net/socket.c:586
  sock_close+0x1e/0x30 net/socket.c:1264
  __fput+0x2ff/0x890 fs/file_table.c:280
  ____fput+0x16/0x20 fs/file_table.c:313
  task_work_run+0x145/0x1c0 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:185 [inline]
  exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
  prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
  do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413501
Code: 5f fe ff ff 31 c9 31 f6 41 b9 b0 20 41 00 41 b8 8c d6 65 00 ba 02 00 00 00 bf 28 38 44 00 ff 15 7d a1 24 00 85 c0 0f 85 37 fe <ff> ff 31 c9 31 f6 41 b9 b0 20 41 00 41 b8 90 d6 65 00 ba 03 00 00
RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
INFO: task syz-executor.1:9534 blocked for more than 148 seconds.
       Not tainted 5.2.0+ #80
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.1  D28136  9534   9359 0x00000004
Call Trace:
  context_switch kernel/sched/core.c:3252 [inline]
  __schedule+0x755/0x1580 kernel/sched/core.c:3878
  schedule+0xa8/0x270 kernel/sched/core.c:3942
  rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
  __down_write kernel/locking/rwsem.c:1349 [inline]
  down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
  unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
  bcm_release+0x93/0x5e0 net/can/bcm.c:1525
  __sock_release+0xce/0x280 net/socket.c:586
  sock_close+0x1e/0x30 net/socket.c:1264
  __fput+0x2ff/0x890 fs/file_table.c:280
  ____fput+0x16/0x20 fs/file_table.c:313
  task_work_run+0x145/0x1c0 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:185 [inline]
  exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
  prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
  do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413501
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
INFO: task syz-executor.3:9535 blocked for more than 150 seconds.
       Not tainted 5.2.0+ #80
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.3  D28136  9535   9351 0x00000004
Call Trace:
  context_switch kernel/sched/core.c:3252 [inline]
  __schedule+0x755/0x1580 kernel/sched/core.c:3878
  schedule+0xa8/0x270 kernel/sched/core.c:3942
  rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
  __down_write kernel/locking/rwsem.c:1349 [inline]
  down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
  unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
  bcm_release+0x93/0x5e0 net/can/bcm.c:1525
  __sock_release+0xce/0x280 net/socket.c:586
  sock_close+0x1e/0x30 net/socket.c:1264
  __fput+0x2ff/0x890 fs/file_table.c:280
  ____fput+0x16/0x20 fs/file_table.c:313
  task_work_run+0x145/0x1c0 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:185 [inline]
  exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
  prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
  do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413501
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff

Showing all locks held in the system:
1 lock held by khungtaskd/1049:
 #0: 00000000ede263b0 (rcu_read_lock){....}, at: debug_show_all_locks+0x5f/0x27e kernel/locking/lockdep.c:5257
1 lock held by rsyslogd/9208:
 #0: 00000000da20b59a (&f->f_pos_lock){+.+.}, at: __fdget_pos+0xee/0x110 fs/file.c:801
2 locks held by getty/9298:
 #0: 00000000e9efae0d (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341  #1: 0000000007287a12 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
2 locks held by getty/9299:
 #0: 00000000ad0733b0 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341  #1: 0000000094dd5193 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
2 locks held by getty/9300:
 #0: 00000000692c340f (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341  #1: 00000000538c7d7d (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
2 locks held by getty/9301:
 #0: 00000000116ea6c7 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341  #1: 00000000a908a9f7 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
2 locks held by getty/9302:
 #0: 0000000042704f01 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341  #1: 0000000041cc8671 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
2 locks held by getty/9303:
 #0: 000000001ef3b293 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341  #1: 000000008b703302 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
2 locks held by getty/9304:
 #0: 0000000095601bb0 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341




[Index of Archives]     [Automotive Discussions]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [CAN Bus]

  Powered by Linux