On Mon. 10 Mar 2025 at 03:59, Oliver Hartkopp <socketcan@xxxxxxxxxxxx> wrote: > Hello Marc, > > On 09.03.25 11:46, syzbot wrote: > > Hello, > > > > syzbot found the following issue on: > > > > HEAD commit: 0f52fd4f67c6 Merge tag 'bcachefs-2025-03-06' of git://evil.. > > git tree: upstream > > console output: https://syzkaller.appspot.com/x/log.txt?x=12d12a54580000 > > kernel config: https://syzkaller.appspot.com/x/.config?x=523b0e2f15224775 > > dashboard link: https://syzkaller.appspot.com/bug?extid=78ce4489b812515d5e4d > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 > > > > Unfortunately, I don't have any reproducer for this issue yet. > > > > Downloadable assets: > > disk image: https://storage.googleapis.com/syzbot-assets/eb0d7b540c67/disk-0f52fd4f.raw.xz > > vmlinux: https://storage.googleapis.com/syzbot-assets/51c261332ad9/vmlinux-0f52fd4f.xz > > kernel image: https://storage.googleapis.com/syzbot-assets/38914a4790c8/bzImage-0f52fd4f.xz > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > Reported-by: syzbot+78ce4489b812515d5e4d@xxxxxxxxxxxxxxxxxxxxxxxxx > > > > ================================================================== > > BUG: KCSAN: data-race in can_send / can_send > > > > read-write to 0xffff888117566290 of 8 bytes by interrupt on cpu 0: > > can_send+0x5a2/0x6d0 net/can/af_can.c:290 > > bcm_can_tx+0x314/0x420 net/can/bcm.c:314 > > bcm_tx_timeout_handler+0xea/0x280 > > __run_hrtimer kernel/time/hrtimer.c:1801 [inline] > > __hrtimer_run_queues+0x20d/0x5e0 kernel/time/hrtimer.c:1865 > > hrtimer_run_softirq+0xe4/0x2c0 kernel/time/hrtimer.c:1882 > > handle_softirqs+0xbf/0x280 kernel/softirq.c:561 > > run_ksoftirqd+0x1c/0x30 kernel/softirq.c:950 > > smpboot_thread_fn+0x31c/0x4c0 kernel/smpboot.c:164 > > kthread+0x4ae/0x520 kernel/kthread.c:464 > > ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:148 > > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 > > > > read-write to 0xffff888117566290 of 8 bytes by interrupt on cpu 1: > > can_send+0x5a2/0x6d0 net/can/af_can.c:290 > > bcm_can_tx+0x314/0x420 net/can/bcm.c:314 > > bcm_tx_timeout_handler+0xea/0x280 > > __run_hrtimer kernel/time/hrtimer.c:1801 [inline] > > __hrtimer_run_queues+0x20d/0x5e0 kernel/time/hrtimer.c:1865 > > hrtimer_run_softirq+0xe4/0x2c0 kernel/time/hrtimer.c:1882 > > handle_softirqs+0xbf/0x280 kernel/softirq.c:561 > > run_ksoftirqd+0x1c/0x30 kernel/softirq.c:950 > > smpboot_thread_fn+0x31c/0x4c0 kernel/smpboot.c:164 > > kthread+0x4ae/0x520 kernel/kthread.c:464 > > ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:148 > > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 > > > > value changed: 0x0000000000002b9d -> 0x0000000000002b9e > > > > Increased by '1' ... > > I assume this problem is caused by increasing the per-netdevice statistic in > > https://elixir.bootlin.com/linux/v6.13.6/source/net/can/af_can.c#L289 > > pkg_stats->tx_frames++; > pkg_stats->tx_frames_delta++; > > We update the statistics for the device and in this specific case the > hrtimer fired on two CPUs resulting in a can_send() to the same netdevice. > > Do you agree with this quick analysis? Ack. Same conclusion here. > Isn't there some lock-less per-cpu safe statistic handling within netdev > we might pick for our use-case? I see two solutions. Either we use lock_sock(skb->sk) and release_sock(skb->sk) or we can change the types of can_pkg_stats->tx_frames and can_pkg_stats->tx_frames_delta from long to atomic_long_t. The atomic_long_t is the closest solution to a lock-less. But my preference goes to the lock_sock() which looks more natural in this context. And look_sock() is just a spinlock which under the hood is also an atomic, so no big penalty either. Yours sincerely, Vincent Mailhol