Hello all,
On 14.07.19 06:07, syzbot wrote:
syzbot has found a reproducer for the following crash on:
the internal users of the CAN networking subsystem like CAN_BCM and
CAN_RAW hold a number of CAN identifier subscriptions ('filters') for
CAN netdevices (only type ARPHRD_CAN) in their socket data structures.
The per-socket netdevice notifier is used to manage the ad-hoc removal
of these filters at netdevice removal time.
What I can see in the console output at
https://syzkaller.appspot.com/x/log.txt?x=10e45f0fa00000
seems to be a race between an unknown register_netdevice_notifier() call
("A") and the unregister_netdevice_notifier() ("B") likely invoked by
bcm_release() ("C"):
[ 1047.294207][ T1049] schedule+0xa8/0x270
[ 1047.318401][ T1049] rwsem_down_write_slowpath+0x70a/0xf70
[ 1047.324114][ T1049] ? downgrade_write+0x3c0/0x3c0
[ 1047.438644][ T1049] ? mark_held_locks+0xf0/0xf0
[ 1047.443483][ T1049] ? lock_acquire+0x190/0x410
[ 1047.448191][ T1049] ? unregister_netdevice_notifier+0x7e/0x390
[ 1047.547227][ T1049] down_write+0x13c/0x150
[ 1047.579535][ T1049] ? down_write+0x13c/0x150
[ 1047.584106][ T1049] ? __down_timeout+0x2d0/0x2d0
[ 1047.635356][ T1049] ? mark_held_locks+0xf0/0xf0
[ 1047.640721][ T1049] unregister_netdevice_notifier+0x7e/0x390 <- "B"
[ 1047.646667][ T1049] ? __sock_release+0x89/0x280
[ 1047.709126][ T1049] ? register_netdevice_notifier+0x630/0x630 <- "A"
[ 1047.715203][ T1049] ? __kasan_check_write+0x14/0x20
[ 1047.775138][ T1049] bcm_release+0x93/0x5e0 <- "C"
[ 1047.795337][ T1049] __sock_release+0xce/0x280
[ 1047.829016][ T1049] sock_close+0x1e/0x30
The question to me is now:
Is the problem located in an (un)register_netdevice_notifier race OR is
it generally a bad idea to call unregister_netdevice_notifier() in a
sock release?
I've never seen that kind of problem in the wild. But if it would be the
latter case wouldn't it be the same problem when someone unloads the
kernel module at the 'wrong' time?
In commit 328fbe747ad46 ("net: Close race between {un,
}register_netdevice_notifier() and setup_net()/cleanup_net()") Kirill
Tkhai reviewed the calling site in CAN_RAW raw_release() which points to
the same situation. Therefore added him to the recipient list.
Should down_write() be replaced with something like
rwsem_down_write_slowpath()??
Regards,
Oliver
HEAD commit: a2d79c71 Merge tag 'for-5.3/io_uring-20190711' of
git://gi..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10e45f0fa00000
kernel config: https://syzkaller.appspot.com/x/.config?x=3539b1747f03988e
dashboard link:
https://syzkaller.appspot.com/bug?extid=0f1827363a305f74996f
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1765c52fa00000
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+0f1827363a305f74996f@xxxxxxxxxxxxxxxxxxxxxxxxx
INFO: task syz-executor.4:9527 blocked for more than 143 seconds.
Not tainted 5.2.0+ #80
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.4 D28136 9527 9356 0x00000004
Call Trace:
context_switch kernel/sched/core.c:3252 [inline]
__schedule+0x755/0x1580 kernel/sched/core.c:3878
schedule+0xa8/0x270 kernel/sched/core.c:3942
rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
__down_write kernel/locking/rwsem.c:1349 [inline]
down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
bcm_release+0x93/0x5e0 net/can/bcm.c:1525
__sock_release+0xce/0x280 net/socket.c:586
sock_close+0x1e/0x30 net/socket.c:1264
__fput+0x2ff/0x890 fs/file_table.c:280
____fput+0x16/0x20 fs/file_table.c:313
task_work_run+0x145/0x1c0 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:185 [inline]
exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413501
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3
48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c
24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
INFO: task syz-executor.2:9528 blocked for more than 145 seconds.
Not tainted 5.2.0+ #80
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.2 D28136 9528 9354 0x00000004
Call Trace:
context_switch kernel/sched/core.c:3252 [inline]
__schedule+0x755/0x1580 kernel/sched/core.c:3878
schedule+0xa8/0x270 kernel/sched/core.c:3942
rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
__down_write kernel/locking/rwsem.c:1349 [inline]
down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
bcm_release+0x93/0x5e0 net/can/bcm.c:1525
__sock_release+0xce/0x280 net/socket.c:586
sock_close+0x1e/0x30 net/socket.c:1264
__fput+0x2ff/0x890 fs/file_table.c:280
____fput+0x16/0x20 fs/file_table.c:313
task_work_run+0x145/0x1c0 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:185 [inline]
exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413501
Code: 5f fe ff ff 31 c9 31 f6 41 b9 b0 20 41 00 41 b8 8c d6 65 00 ba 02
00 00 00 bf 28 38 44 00 ff 15 7d a1 24 00 85 c0 0f 85 37 fe <ff> ff 31
c9 31 f6 41 b9 b0 20 41 00 41 b8 90 d6 65 00 ba 03 00 00
RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
INFO: task syz-executor.0:9529 blocked for more than 147 seconds.
Not tainted 5.2.0+ #80
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.0 D28136 9529 9353 0x00000004
Call Trace:
context_switch kernel/sched/core.c:3252 [inline]
__schedule+0x755/0x1580 kernel/sched/core.c:3878
schedule+0xa8/0x270 kernel/sched/core.c:3942
rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
__down_write kernel/locking/rwsem.c:1349 [inline]
down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
bcm_release+0x93/0x5e0 net/can/bcm.c:1525
__sock_release+0xce/0x280 net/socket.c:586
sock_close+0x1e/0x30 net/socket.c:1264
__fput+0x2ff/0x890 fs/file_table.c:280
____fput+0x16/0x20 fs/file_table.c:313
task_work_run+0x145/0x1c0 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:185 [inline]
exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413501
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3
48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c
24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
INFO: task syz-executor.5:9533 blocked for more than 148 seconds.
Not tainted 5.2.0+ #80
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.5 D28136 9533 9358 0x00000004
Call Trace:
context_switch kernel/sched/core.c:3252 [inline]
__schedule+0x755/0x1580 kernel/sched/core.c:3878
schedule+0xa8/0x270 kernel/sched/core.c:3942
rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
__down_write kernel/locking/rwsem.c:1349 [inline]
down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
bcm_release+0x93/0x5e0 net/can/bcm.c:1525
__sock_release+0xce/0x280 net/socket.c:586
sock_close+0x1e/0x30 net/socket.c:1264
__fput+0x2ff/0x890 fs/file_table.c:280
____fput+0x16/0x20 fs/file_table.c:313
task_work_run+0x145/0x1c0 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:185 [inline]
exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413501
Code: 5f fe ff ff 31 c9 31 f6 41 b9 b0 20 41 00 41 b8 8c d6 65 00 ba 02
00 00 00 bf 28 38 44 00 ff 15 7d a1 24 00 85 c0 0f 85 37 fe <ff> ff 31
c9 31 f6 41 b9 b0 20 41 00 41 b8 90 d6 65 00 ba 03 00 00
RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
INFO: task syz-executor.1:9534 blocked for more than 148 seconds.
Not tainted 5.2.0+ #80
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.1 D28136 9534 9359 0x00000004
Call Trace:
context_switch kernel/sched/core.c:3252 [inline]
__schedule+0x755/0x1580 kernel/sched/core.c:3878
schedule+0xa8/0x270 kernel/sched/core.c:3942
rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
__down_write kernel/locking/rwsem.c:1349 [inline]
down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
bcm_release+0x93/0x5e0 net/can/bcm.c:1525
__sock_release+0xce/0x280 net/socket.c:586
sock_close+0x1e/0x30 net/socket.c:1264
__fput+0x2ff/0x890 fs/file_table.c:280
____fput+0x16/0x20 fs/file_table.c:313
task_work_run+0x145/0x1c0 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:185 [inline]
exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413501
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3
48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c
24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
INFO: task syz-executor.3:9535 blocked for more than 150 seconds.
Not tainted 5.2.0+ #80
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.3 D28136 9535 9351 0x00000004
Call Trace:
context_switch kernel/sched/core.c:3252 [inline]
__schedule+0x755/0x1580 kernel/sched/core.c:3878
schedule+0xa8/0x270 kernel/sched/core.c:3942
rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
__down_write kernel/locking/rwsem.c:1349 [inline]
down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
bcm_release+0x93/0x5e0 net/can/bcm.c:1525
__sock_release+0xce/0x280 net/socket.c:586
sock_close+0x1e/0x30 net/socket.c:1264
__fput+0x2ff/0x890 fs/file_table.c:280
____fput+0x16/0x20 fs/file_table.c:313
task_work_run+0x145/0x1c0 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:185 [inline]
exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413501
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3
48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c
24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
Showing all locks held in the system:
1 lock held by khungtaskd/1049:
#0: 00000000ede263b0 (rcu_read_lock){....}, at:
debug_show_all_locks+0x5f/0x27e kernel/locking/lockdep.c:5257
1 lock held by rsyslogd/9208:
#0: 00000000da20b59a (&f->f_pos_lock){+.+.}, at:
__fdget_pos+0xee/0x110 fs/file.c:801
2 locks held by getty/9298:
#0: 00000000e9efae0d (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 0000000007287a12 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
2 locks held by getty/9299:
#0: 00000000ad0733b0 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 0000000094dd5193 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
2 locks held by getty/9300:
#0: 00000000692c340f (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 00000000538c7d7d (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
2 locks held by getty/9301:
#0: 00000000116ea6c7 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 00000000a908a9f7 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
2 locks held by getty/9302:
#0: 0000000042704f01 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 0000000041cc8671 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
2 locks held by getty/9303:
#0: 000000001ef3b293 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 000000008b703302 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
2 locks held by getty/9304:
#0: 0000000095601bb0 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341