Re: rdma-for-next, rdma_rxe: inconsistent lock state

On 2022/6/1 4:55, Pearson, Robert B wrote:


-----Original Message-----
From: Bart Van Assche <bvanassche@xxxxxxx>
Sent: Tuesday, May 31, 2022 3:47 PM
To: Bob Pearson <rpearsonhpe@xxxxxxxxx>
Cc: linux-rdma@xxxxxxxxxxxxxxx
Subject: rdma-for-next, rdma_rxe: inconsistent lock state

Hi Bob,

With the rdma-for-next branch (commit 9c477178a0a1 ("RDMA/rtrs-clt: Fix one kernel-doc comment")) I see the following:

================================
WARNING: inconsistent lock state
5.18.0-dbg #4 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/2/25 [HC0[0]:SC1[1]:HE0:SE0] takes:
ffff888116f0d350 (&xa->xa_lock#12){+.?.}-{2:2}, at: rxe_pool_get_index+0x73/0x170 [rdma_rxe]
{SOFTIRQ-ON-W} state was registered at:
    __lock_acquire+0x45b/0xce0
    lock_acquire+0x18a/0x450
    _raw_spin_lock+0x34/0x50
    __rxe_add_to_pool+0xcc/0x140 [rdma_rxe]
    rxe_alloc_pd+0x2d/0x40 [rdma_rxe]
    __ib_alloc_pd+0xa3/0x270 [ib_core]
    ib_mad_port_open+0x44a/0x790 [ib_core]
    ib_mad_init_device+0x8e/0x110 [ib_core]
    add_client_context+0x26a/0x330 [ib_core]
    enable_device_and_get+0x169/0x2b0 [ib_core]
    ib_register_device+0x26f/0x330 [ib_core]
    rxe_register_device+0x1b4/0x1d0 [rdma_rxe]
    rxe_add+0x8c/0xc0 [rdma_rxe]
    rxe_net_add+0x5b/0x90 [rdma_rxe]
    rxe_newlink+0x71/0x80 [rdma_rxe]
    nldev_newlink+0x21e/0x370 [ib_core]
    rdma_nl_rcv_msg+0x200/0x410 [ib_core]
    rdma_nl_rcv+0x140/0x220 [ib_core]
    netlink_unicast+0x307/0x460
    netlink_sendmsg+0x422/0x750
    __sys_sendto+0x1c2/0x250
    __x64_sys_sendto+0x7f/0x90
    do_syscall_64+0x35/0x80
    entry_SYSCALL_64_after_hwframe+0x44/0xae
irq event stamp: 71543
hardirqs last  enabled at (71542): [<ffffffff810cdc28>] __local_bh_enable_ip+0x88/0xf0
hardirqs last disabled at (71543): [<ffffffff81e9d67d>] _raw_spin_lock_irqsave+0x5d/0x60
softirqs last  enabled at (71532): [<ffffffff82200467>] __do_softirq+0x467/0x6e1
softirqs last disabled at (71537): [<ffffffff810cda47>] run_ksoftirqd+0x37/0x60

other info that might help us debug this:
   Possible unsafe locking scenario:
         CPU0
         ----
    lock(&xa->xa_lock#12);
    <Interrupt>
      lock(&xa->xa_lock#12);

   *** DEADLOCK ***
no locks held by ksoftirqd/2/25.

stack backtrace:
CPU: 2 PID: 25 Comm: ksoftirqd/2 Not tainted 5.18.0-dbg #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
Call Trace:
   <TASK>
   show_stack+0x52/0x58
   dump_stack_lvl+0x5b/0x82
   dump_stack+0x10/0x12
   print_usage_bug.part.0+0x29c/0x2ab
   mark_lock_irq.cold+0x54/0xbf
   mark_lock.part.0+0x3f5/0xa70
   mark_usage+0x74/0x1a0
   __lock_acquire+0x45b/0xce0
   lock_acquire+0x18a/0x450
   _raw_spin_lock_irqsave+0x43/0x60
   rxe_pool_get_index+0x73/0x170 [rdma_rxe]
   rxe_get_av+0xcc/0x140 [rdma_rxe]
   rxe_requester+0x34c/0xe60 [rdma_rxe]
   rxe_do_task+0xcc/0x140 [rdma_rxe]
   tasklet_action_common.constprop.0+0x168/0x1b0
   tasklet_action+0x42/0x60
   __do_softirq+0x1d8/0x6e1
   run_ksoftirqd+0x37/0x60
   smpboot_thread_fn+0x302/0x410
   kthread+0x183/0x1c0
   ret_from_fork+0x1f/0x30
   </TASK>

Is this perhaps the same issue as what I reported on May 6 (https://lore.kernel.org/all/cf8b9980-3965-a4f6-07e0-d4b25755b0db@xxxxxxx/)?

Thanks,

Bart.

(from windows)

Yes. There is a lock-level bug in rxe_pool.c that needs a patch to fix. I have one that is a temporary fix.
Zhu had one that he posted a while ago, but it was never accepted. I don't want to step on his toes.
This is related to the "AH bug", i.e. rdmacm holding locks while calling into the verbs APIs, which is just plain evil.

Yes. That patch was not accepted. It seems that everyone expects this problem to be fixed in your RCU patch series.

Zhu Yanjun


I'll send you my patch.

Bob
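
For readers following the splat above: lockdep complains because the same xa_lock was taken once in process context with softirqs enabled (_raw_spin_lock in __rxe_add_to_pool) and once from softirq context (rxe_pool_get_index, called from the tasklet). The sketch below is a toy userspace model of that consistency rule, not kernel code and not either of the patches mentioned in this thread. All names in it (model_lock, model_acquire, and so on) are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_lock {
    bool held_with_softirq_on; /* taken in process context, softirqs enabled (SOFTIRQ-ON-W) */
    bool held_in_softirq;      /* taken from softirq context (IN-SOFTIRQ-W) */
};

enum model_ctx { CTX_PROCESS, CTX_SOFTIRQ };

/*
 * Record one acquisition of the lock. Returns true while the recorded
 * usage is still consistent, false once the lock has been seen both
 * with softirqs enabled and inside a softirq.
 */
bool model_acquire(struct model_lock *lk, enum model_ctx ctx,
                   bool softirqs_disabled)
{
    assert(lk != NULL);
    if (ctx == CTX_SOFTIRQ)
        lk->held_in_softirq = true;
    else if (!softirqs_disabled)
        lk->held_with_softirq_on = true;
    return !(lk->held_with_softirq_on && lk->held_in_softirq);
}

/* Mirrors the trace: plain lock in process context, then softirq use. */
bool model_buggy_sequence(void)
{
    struct model_lock lk = { false, false };
    model_acquire(&lk, CTX_PROCESS, false); /* like the __rxe_add_to_pool path */
    return model_acquire(&lk, CTX_SOFTIRQ, true); /* like rxe_pool_get_index */
}

/* Same two users, but the process-context side disables softirqs first. */
bool model_fixed_sequence(void)
{
    struct model_lock lk = { false, false };
    model_acquire(&lk, CTX_PROCESS, true);
    return model_acquire(&lk, CTX_SOFTIRQ, true);
}
```

In this model, model_buggy_sequence() returns false and model_fixed_sequence() returns true. In the kernel, the second shape corresponds to taking the lock with a variant that disables bottom halves or interrupts on the process-context side (for an XArray, xa_lock_bh() or xa_lock_irqsave()). Whether the patches discussed here do that or take a different route such as RCU is not stated in this thread.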



