Lockdep splat in RXE (softRoCE) driver in xarray accesses

Hi Zhu, Bob, Steve,

There seems to be a locking bug in the softRoCE driver when mounting a cifs
share.  See the trace below.  I'm guessing the problem is that a softirq
handler is accessing the xarray, but other accesses to the xarray don't use
the _bh or _irq variants of the lock primitives.

I wonder if rxe_pool_get_index() should just rely on the RCU read lock and not
take the spinlock.

Alternatively, __rxe_add_to_pool() should be using xa_alloc_cyclic_bh() or
xa_alloc_cyclic_irq().
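
To illustrate the reasoning, here is a rough userspace toy model (not kernel
code, and far simpler than real lockdep) of the softirq usage check for a
single lock class.  All names in it (lock_class_model, ctx_model,
model_acquire, scenario_is_consistent) are made up for illustration.  It
assumes the allocation runs in process context and the lookup runs from a
tasklet, as the trace suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical usage bits, loosely modelled on lockdep's softirq states. */
enum {
	USED_IN_SOFTIRQ = 1u << 0,	/* taken while serving a softirq */
	SOFTIRQ_ON_USED = 1u << 1,	/* taken with softirqs enabled */
};

/* One lock class, e.g. the pool's xa_lock. */
struct lock_class_model {
	unsigned int usage;
};

/* The context in which an acquisition happens. */
struct ctx_model {
	bool in_softirq;
	bool softirqs_enabled;
};

/*
 * Record one acquisition of the lock class.  Returns false once the class
 * has been seen both inside a softirq and with softirqs enabled, which is
 * the combination that can self-deadlock on one CPU.
 */
bool model_acquire(struct lock_class_model *lc, struct ctx_model ctx)
{
	const unsigned int both = USED_IN_SOFTIRQ | SOFTIRQ_ON_USED;

	assert(lc != NULL);
	if (ctx.in_softirq)
		lc->usage |= USED_IN_SOFTIRQ;
	else if (ctx.softirqs_enabled)
		lc->usage |= SOFTIRQ_ON_USED;

	return (lc->usage & both) != both;
}

/*
 * Replay the two acquisitions from the trace: the allocation in process
 * context, then the lookup from the tasklet.  alloc_disables_bh models
 * whether the allocation side uses a _bh (or _irq) lock variant.
 */
bool scenario_is_consistent(bool alloc_disables_bh)
{
	struct lock_class_model xa_lock = { 0 };
	struct ctx_model process = {
		.in_softirq = false,
		.softirqs_enabled = !alloc_disables_bh,
	};
	struct ctx_model tasklet = {
		.in_softirq = true,
		.softirqs_enabled = false,
	};
	bool ok;

	ok = model_acquire(&xa_lock, process);
	ok = model_acquire(&xa_lock, tasklet) && ok;
	return ok;
}
```

In this model, scenario_is_consistent(false) corresponds to the current
xa_alloc_cyclic() path and reports the inconsistency, while
scenario_is_consistent(true) corresponds to a _bh or _irq allocation and
does not.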

I used the following commands:

   rdma link add rxe0 type rxe netdev enp6s0 # andromeda, softRoCE
   mount //192.168.6.1/scratch /xfstest.scratch -o user=shares,rdma,pass=...

with ksmbd serving the share on the other side.

Kernel is v5.18-rc6.

David
---
infiniband rxe0: set active
infiniband rxe0: added enp6s0
RDS/IB: rxe0: added
CIFS: No dialect specified on mount. Default has changed to a more secure dialect, SMB2.1 or later (e.g. SMB3.1.1), from CIFS (SMB1). To use the less secure SMB1 dialect to access old servers which do not support SMB3.1.1 (or even SMB3 or SMB2.1) specify vers=1.0 on mount.
CIFS: Attempting to mount \\192.168.6.1\scratch

================================
WARNING: inconsistent lock state
5.18.0-rc6-build2+ #465 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/1/20 [HC0[0]:SC1[1]:HE0:SE0] takes:
ffff888134d11310 (&xa->xa_lock#12){+.?.}-{2:2}, at: rxe_pool_get_index+0x19/0x69
{SOFTIRQ-ON-W} state was registered at:
  mark_usage+0x169/0x17b
  __lock_acquire+0x50c/0x96a
  lock_acquire+0x2f4/0x37b
  _raw_spin_lock+0x2f/0x39
  xa_alloc_cyclic.constprop.0+0x20/0x55
  __rxe_add_to_pool+0xe3/0xf2
  __ib_alloc_pd+0xa2/0x26b
  ib_mad_port_open+0x1ac/0x4a1
  ib_mad_init_device+0x9b/0x1b9
  add_client_context+0x133/0x1b3
  enable_device_and_get+0x129/0x248
  ib_register_device+0x256/0x2fd
  rxe_register_device+0x18e/0x1b7
  rxe_net_add+0x57/0x71
  rxe_newlink+0x71/0x8e
  nldev_newlink+0x200/0x26a
  rdma_nl_rcv_msg+0x260/0x2ab
  rdma_nl_rcv+0x108/0x1a7
  netlink_unicast+0x1fc/0x2b3
  netlink_sendmsg+0x4ce/0x51b
  sock_sendmsg_nosec+0x41/0x4f
  __sys_sendto+0x157/0x1cc
  __x64_sys_sendto+0x76/0x82
  do_syscall_64+0x39/0x46
  entry_SYSCALL_64_after_hwframe+0x44/0xae
irq event stamp: 194111
hardirqs last  enabled at (194110): [<ffffffff81094eb2>] __local_bh_enable_ip+0xb8/0xcc
hardirqs last disabled at (194111): [<ffffffff82040077>] _raw_spin_lock_irqsave+0x1b/0x51
softirqs last  enabled at (194100): [<ffffffff8240043a>] __do_softirq+0x43a/0x489
softirqs last disabled at (194105): [<ffffffff81094d30>] run_ksoftirqd+0x31/0x56

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&xa->xa_lock#12);
  <Interrupt>
    lock(&xa->xa_lock#12);

 *** DEADLOCK ***

no locks held by ksoftirqd/1/20.

stack backtrace:
CPU: 1 PID: 20 Comm: ksoftirqd/1 Not tainted 5.18.0-rc6-build2+ #465
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x45/0x59
 valid_state+0x56/0x61
 mark_lock_irq+0x9b/0x2ec
 ? ret_from_fork+0x1f/0x30
 ? valid_state+0x61/0x61
 ? stack_trace_save+0x8f/0xbe
 ? filter_irq_stacks+0x58/0x58
 ? jhash.constprop.0+0x1ad/0x202
 ? save_trace+0x17c/0x196
 mark_lock.part.0+0x10c/0x164
 mark_usage+0xe6/0x17b
 __lock_acquire+0x50c/0x96a
 lock_acquire+0x2f4/0x37b
 ? rxe_pool_get_index+0x19/0x69
 ? rcu_read_unlock+0x52/0x52
 ? jhash.constprop.0+0x1ad/0x202
 ? lockdep_unlock+0xde/0xe6
 ? validate_chain+0x44a/0x4a8
 ? req_next_wqe+0x312/0x363
 _raw_spin_lock_irqsave+0x41/0x51
 ? rxe_pool_get_index+0x19/0x69
 rxe_pool_get_index+0x19/0x69
 rxe_get_av+0xbe/0x14b
 rxe_requester+0x6b5/0xbb0
 ? rnr_nak_timer+0x16/0x16
 ? lock_downgrade+0xad/0xad
 ? rcu_read_lock_bh_held+0xab/0xab
 ? __wake_up+0xf/0xf
 ? mark_held_locks+0x1f/0x78
 ? __local_bh_enable_ip+0xb8/0xcc
 ? rnr_nak_timer+0x16/0x16
 rxe_do_task+0xb5/0x13d
 ? rxe_detach_mcast+0x1d6/0x1d6
 tasklet_action_common.constprop.0+0xda/0x145
 __do_softirq+0x202/0x489
 ? __irq_exit_rcu+0x108/0x108
 ? _local_bh_enable+0x1c/0x1c
 run_ksoftirqd+0x31/0x56
 smpboot_thread_fn+0x35c/0x376
 ? sort_range+0x1c/0x1c
 kthread+0x164/0x173
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x1f/0x30
 </TASK>
CIFS: VFS: RDMA transport established




