On Wed, Feb 12, 2020 at 09:26:34AM +0200, Leon Romanovsky wrote:
> From: Yonatan Cohen <yonatanc@xxxxxxxxxxxx>
>
> When unloading ib_umad, remove ibdev sys file 1st before
> port removal to prevent kernel oops.
>
> ib_mad's method ibdev_show() might access a umad port
> whoes ibdev field has already been NULLed when rmmod ib_umad
> was issued from another shell.
>
> Consider this scenario
>         shell-1                 shell-2
>         rmmod ib_mod            cat /sys/devices/../ibdev
>             |                       |
>         ib_umad_kill_port()     ibdev_show()
>         port->ib_dev = NULL     dev_name(port->ib_dev)
>
> kernel stack
> PF: error_code(0x0000) - not-present page
> Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
> RIP: 0010:ibdev_show+0x18/0x50 [ib_umad]
> RSP: 0018:ffffc9000097fe40 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: ffffffffa0441120 RCX: ffff8881df514000
> RDX: ffff8881df514000 RSI: ffffffffa0441120 RDI: ffff8881df1e8870
> RBP: ffffffff81caf000 R08: ffff8881df1e8870 R09: 0000000000000000
> R10: 0000000000001000 R11: 0000000000000003 R12: ffff88822f550b40
> R13: 0000000000000001 R14: ffffc9000097ff08 R15: ffff8882238bad58
> FS:  00007f1437ff3740(0000) GS:ffff888236940000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000004e8 CR3: 00000001e0dfc001 CR4: 00000000001606e0
> Call Trace:
>  dev_attr_show+0x15/0x50
>  sysfs_kf_seq_show+0xb8/0x1a0
>  seq_read+0x12d/0x350
>  vfs_read+0x89/0x140
>  ksys_read+0x55/0xd0
>  do_syscall_64+0x55/0x1b0
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9:
>
> Fixes: e9dd5daf884c ("IB/umad: Refactor code to use cdev_device_add()")

This is the wrong fixes line, this ordering change was actually
deliberately done:

commit cf7ad3030271c55a7119a8c2162563e3f6e93879
Author: Parav Pandit <parav@xxxxxxxxxxxx>
Date:   Fri Dec 21 16:19:24 2018 +0200

    IB/umad: Avoid destroying device while it is accessed

    ib_umad_reg_agent2() and ib_umad_reg_agent() access the device name in
    dev_notice(), while concurrently, ib_umad_kill_port() can destroy the
    device using device_destroy().

            cpu-0                               cpu-1
            -----                               -----
        ib_umad_ioctl()
            [...]                            ib_umad_kill_port()
                                                  device_destroy(dev)

            ib_umad_reg_agent()
                 dev_notice(dev)

The mistake in the above was to move the device_destroy() down, not
split it into device_del() above and put_device() below.

Now that it is already split we are OK.

Jason
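
To illustrate the ordering described above, here is a minimal userspace
sketch in C. It is only a toy model, not the ib_umad code: every name in it
(toy_port, toy_del(), toy_put(), toy_kill_port(), "umad0", and so on) is
hypothetical. It assumes that the "del" step makes the sysfs attribute
unreachable to new readers and that the final "put" is what frees the
object, which is how I understand device_del() and put_device() to behave.
The real code also has to deal with concurrency, which this single-threaded
model ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for a umad port; not a kernel structure. */
struct toy_port {
	int refcount;            /* models the struct device refcount */
	bool registered;         /* models "sysfs attributes reachable" */
	const char *ib_dev_name; /* models port->ib_dev, cleared on kill */
	char name[16];           /* models what dev_notice() would print */
};

static int freed_count;          /* lets a caller observe the release */

static struct toy_port *toy_port_create(const char *ib_dev_name)
{
	struct toy_port *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->refcount = 1;         /* initial reference */
	p->registered = true;
	p->ib_dev_name = ib_dev_name;
	snprintf(p->name, sizeof(p->name), "umad0");
	return p;
}

static void toy_get(struct toy_port *p)
{
	p->refcount++;
}

/* Models put_device(): the memory goes away only on the last put. */
static void toy_put(struct toy_port *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0) {
		freed_count++;
		free(p);
	}
}

/* Models device_del(): unregister, but do not free. */
static void toy_del(struct toy_port *p)
{
	p->registered = false;
}

/* Models ibdev_show(): returns bytes written, or -1 once unregistered. */
static int toy_ibdev_show(const struct toy_port *p, char *buf, size_t len)
{
	if (!p->registered)
		return -1;       /* attribute is gone, ib_dev_name not touched */
	return snprintf(buf, len, "%s\n", p->ib_dev_name);
}

/* Teardown in the split order: del above, put below. */
static void toy_kill_port(struct toy_port *p)
{
	toy_del(p);              /* 1. no new show() callers can get in */
	p->ib_dev_name = NULL;   /* 2. now safe to clear */
	toy_put(p);              /* 3. drop the initial reference last */
}
```

A caller that took its own reference with toy_get() before toy_kill_port()
can still read p->name afterwards, which models the dev_notice() case from
the older commit, while toy_ibdev_show() can no longer reach the cleared
pointer, which models the oops in this patch.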