You can reproduce the error by running the pyverbs test suite on the for-next branch with rdmacm enabled and lock checking configured. Both of these are tricky to set up. You will also see a lockdep warning, but that comes from a different test case.

1. The pyverbs test suite has a bug that prevents rxe from attempting several test cases that depend on rdmacm for connection setup. I always apply the following git patch, which is a hack but lets these cases run:

diff --git a/tests/base.py b/tests/base.py
index 93568c7f..03b5601e 100644
--- a/tests/base.py
+++ b/tests/base.py
@@ -245,10 +245,11 @@ class RDMATestCase(unittest.TestCase):
             self._add_gids_per_port(ctx, dev, self.ib_port)
 
     def _get_ip_mac(self, dev, port, idx):
-        if not os.path.exists('/sys/class/infiniband/{}/device/net/'.format(dev)):
-            self.args.append([dev, port, idx, None, None])
-            return
-        net_name = self.get_net_name(dev)
+        #if not os.path.exists('/sys/class/infiniband/{}/device/net/'.format(dev)):
+            #self.args.append([dev, port, idx, None, None])
+            #return
+        #net_name = self.get_net_name(dev)
+        net_name = "enp0s3"
         try:
             ip_addr, mac_addr = self.get_ip_mac_address(net_name)
         except (KeyError, IndexError):

The check on /sys/class/infiniband fails because the rxe driver does not have a device/net entry, so I just hard-code the device name of my ethernet adapter (enp0s3). I reported this to the maintainer of the pyverbs tests months ago but never got a response.

2. The following diff shows the lock checking options I have configured ('<' lines are my debug config, '>' lines are the config without them):

11363,11364c11363
< CONFIG_PROVE_LOCKING=y
< CONFIG_PROVE_RAW_LOCK_NESTING=y
---
> # CONFIG_PROVE_LOCKING is not set
11366,11379c11365,11371
< CONFIG_DEBUG_RT_MUTEXES=y
< CONFIG_DEBUG_SPINLOCK=y
< CONFIG_DEBUG_MUTEXES=y
< CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
< CONFIG_DEBUG_RWSEMS=y
< CONFIG_DEBUG_LOCK_ALLOC=y
< CONFIG_LOCKDEP=y
< CONFIG_LOCKDEP_BITS=15
< CONFIG_LOCKDEP_CHAINS_BITS=16
< CONFIG_LOCKDEP_STACK_TRACE_BITS=19
< CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS=14
< CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
< CONFIG_DEBUG_LOCKDEP=y
< CONFIG_DEBUG_ATOMIC_SLEEP=y
---
> # CONFIG_DEBUG_RT_MUTEXES is not set
> # CONFIG_DEBUG_SPINLOCK is not set
> # CONFIG_DEBUG_MUTEXES is not set
> # CONFIG_DEBUG_WW_MUTEX_SLOWPATH is not set
> # CONFIG_DEBUG_RWSEMS is not set
> # CONFIG_DEBUG_LOCK_ALLOC is not set
> # CONFIG_DEBUG_ATOMIC_SLEEP is not set
11385,11386d11376
< CONFIG_PROVE_NVDIMM_LOCKING=y
< # CONFIG_PROVE_CXL_LOCKING is not set
11389,11391c11379
< CONFIG_TRACE_IRQFLAGS=y
< CONFIG_TRACE_IRQFLAGS_NMI=y
< CONFIG_DEBUG_IRQFLAGS=y
---
> # CONFIG_DEBUG_IRQFLAGS is not set
11411d11398
< CONFIG_PROVE_RCU=y
11445d11431
< CONFIG_PREEMPTIRQ_TRACEPOINTS=y

I think the option that exposes this warning is CONFIG_TRACE_IRQFLAGS=y. Unfortunately, the warning only prints once per reboot as far as I can tell; reloading the rxe driver won't trigger it again.

Once all this is in place, just run

    sudo ./build/bin/run_tests.py --dev rxe0

The warning output is below. In the last two days it has been determined that the cause of the warning is calling dev_mc_add() inside a spin_lock_irqsave()/spin_unlock_irqrestore() critical section.
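To make the failure mode concrete, here is a minimal sketch of that pattern. It is not the actual rxe code; the names demo_lock, demo_attach and demo_mac are made up for illustration. The point is simply that dev_mc_add() takes netif_addr_lock_bh()/netif_addr_unlock_bh() internally, and calling it with hardirqs disabled by spin_lock_irqsave() is what upsets the irq tracing checks:

#include <linux/netdevice.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(demo_lock);

/* Sketch only: same shape as the problem, not the rxe code itself. */
static int demo_attach(struct net_device *ndev, const unsigned char *demo_mac)
{
        unsigned long flags;
        int err;

        spin_lock_irqsave(&demo_lock, flags);   /* hardirqs disabled here */

        /*
         * dev_mc_add() takes netif_addr_lock_bh()/netif_addr_unlock_bh()
         * internally.  With the irq tracing options above enabled, the
         * _bh unlock path can leave hardirqs enabled again ...
         */
        err = dev_mc_add(ndev, demo_mac);

        /* ... so this irqrestore sees IRQs already on and warns. */
        spin_unlock_irqrestore(&demo_lock, flags);

        return err;
}

Either calling dev_mc_add()/dev_mc_del() outside the locked region or switching the lock to the _bh variants (which is what makes the warning go away for me, see below) avoids entering the _bh locking path with hardirqs disabled.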
Bob

-------- Forwarded Message --------
Subject: Re: [PATCH for-next] RDMA/rxe: Fix "RDMA/rxe: Cleanup rxe_mcast.c"
Date: Tue, 3 May 2022 23:39:36 -0500
From: Bob Pearson <rpearsonhpe@xxxxxxxxx>
To: Jason Gunthorpe <jgg@xxxxxxxxxx>

On 5/3/22 23:22, Bob Pearson wrote:
> On 5/3/22 19:40, Jason Gunthorpe wrote:
>> On Tue, May 03, 2022 at 05:40:50PM -0500, Bob Pearson wrote:
>>> On 5/3/22 06:38, Jason Gunthorpe wrote:
>>>> On Wed, Apr 13, 2022 at 12:29:38AM -0500, Bob Pearson wrote:
>>>>> rxe_mcast.c currently uses _irqsave spinlocks for rxe->mcg_lock
>>>>> while rxe_recv.c uses _bh spinlocks for the same lock. Adding
>>>>> tracing shows that the code in rxe_mcast.c is all executed in_task()
>>>>> context while the code in rxe_recv.c that refers to the lock
>>>>> executes in softirq context. This causes a lockdep warning in code
>>>>> that executes multicast I/O including blktests check srp.
>>>>
>>>> What is the lockdep warning? This patch looks OK, but irqsave should
>>>> be a universal lock - so why does lockdep complain? Is there nesting?
>>>>
>>>> Jason
>>>
>>> I am bad at parsing them but basically just complaining that you can't
>>> mix _irq and _bh locks I believe.
>>
>> irqsave is a superset of bh within a process context, so lockdep
>> should not complain about just that.
>>
>>> I had had _irqsave locks in rxe_mcast.c which is process context and
>>> _bh locks in rxe_recv.c which is softirq context (NAPI).
>>
>> Yes, I see that, but that alone should be OK.
>>
>> So lockdep is pointing at something else becoming involved in this..
>>
>>> There isn't any hard irq context for this lock. If you want to see
>>> it I can recompile with lockdep and run the python tests again.
>>
>> Please - the lockdep report should be pasted into the commit message.
>>
>> This patch doesn't make sense alone, although it seems correct from
>> what I can tell.
>>
>> Jason
>
> This is the warning. I assumed it was lockdep but it seems it's something else.
>
> [ 159.430321] ------------[ cut here ]------------
> [ 159.430326] raw_local_irq_restore() called with IRQs enabled
> [ 159.430334] WARNING: CPU: 13 PID: 3107 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x2f/0x50
> [ 159.430341] Modules linked in: rdma_ucm(E) rdma_cm(E) iw_cm(E) ib_cm(E) rdma_rxe(E) ib_uverbs(E) ip6_udp_tunnel udp_tunnel ib_core(E) isofs vboxvideo drm_vram_helper vboxsf snd_intel8x0 snd_ac97_codec nls_iso8859_1 ac97_bus intel_rapl_msr snd_pcm intel_rapl_common snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq crct10dif_pclmul ghash_clmulni_intel snd_seq_device aesni_intel crypto_simd snd_timer joydev cryptd input_leds snd serio_raw vboxguest soundcore mac_hid sch_fq_codel vmwgfx drm_ttm_helper ttm drm_kms_helper fb_sys_fops syscopyarea sysfillrect sysimgblt ipmi_devintf ipmi_msghandler dm_multipath scsi_dh_rdac drm scsi_dh_emc scsi_dh_alua msr parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid psmouse crc32_pclmul e1000 ahci i2c_piix4 libahci pata_acpi video
> [ 159.430391] CPU: 13 PID: 3107 Comm: python3 Tainted: G E 5.18.0-rc1+ #7
> [ 159.430395] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> [ 159.430397] RIP: 0010:warn_bogus_irq_restore+0x2f/0x50
> [ 159.430400] Code: 0f b6 1d af af 05 01 80 fb 01 77 26 83 e3 01 74 06 48 8b 5d f8 c9 c3 48 c7 c7 00 2c de 86 c6 05 91 af 05 01 01 e8 66 0d f2 ff <0f> 0b 48 8b 5d f8 c9 c3 0f b6 f3 48 c7 c7 90 d4 0b 87 e8 fa d6 84
> [ 159.430403] RSP: 0018:ffffb8dd054db9e8 EFLAGS: 00010282
> [ 159.430405] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [ 159.430406] RDX: 0000000000000002 RSI: ffffffff86df36a1 RDI: 00000000ffffffff
> [ 159.430408] RBP: ffffb8dd054db9f0 R08: 0000000000000003 R09: 0000000000000000
> [ 159.430409] R10: 65725f7172695f6c R11: 0000000000000000 R12: ffff9524680f56c8
> [ 159.430410] R13: ffff95240c5fde40 R14: ffff952467e81000 R15: ffff95240c5fde40
> [ 159.430411] FS: 00007f2eb2f23000(0000) GS:ffff952517880000(0000) knlGS:0000000000000000
> [ 159.430413] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 159.430414] CR2: 00007f2eb0bb3a00 CR3: 0000000122d6e000 CR4: 00000000000506e0
> [ 159.430418] Call Trace:
> [ 159.430420] <TASK>
> [ 159.430423] _raw_spin_unlock_irqrestore+0x75/0x80
> [ 159.430427] rxe_attach_mcast+0x304/0x480 [rdma_rxe]
> [ 159.430437] ib_attach_mcast+0x88/0xa0 [ib_core]
> [ 159.430452] ib_uverbs_attach_mcast+0x186/0x1e0 [ib_uverbs]
> [ 159.430459] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xcd/0x140 [ib_uverbs]
> [ 159.430465] ? _copy_from_user+0x7f/0xb0
> [ 159.430474] ib_uverbs_cmd_verbs+0xdb0/0xea0 [ib_uverbs]
> [ 159.430480] ? rcu_read_lock_sched_held+0x16/0x80
> [ 159.430484] ? lock_release+0x23d/0x330
> [ 159.430487] ? ib_uverbs_handler_UVERBS_METHOD_QUERY_CONTEXT+0xe0/0xe0 [ib_uverbs]
> [ 159.430496] ? lock_release+0x23d/0x330
> [ 159.430501] ? slab_free_freelist_hook.constprop.0+0x8e/0x180
> [ 159.430505] ? lock_acquire+0x1bd/0x330
> [ 159.430507] ? rcu_read_lock_sched_held+0x16/0x80
> [ 159.430509] ? rcu_read_lock_sched_held+0x16/0x80
> [ 159.430511] ? lock_acquire+0x1bd/0x330
> [ 159.430514] ? __might_fault+0x77/0x80
> [ 159.430517] ib_uverbs_ioctl+0xd2/0x160 [ib_uverbs]
> [ 159.430522] ? ib_uverbs_ioctl+0x8e/0x160 [ib_uverbs]
> [ 159.430528] __x64_sys_ioctl+0x91/0xc0
> [ 159.430530] do_syscall_64+0x5c/0x80
> [ 159.430533] ? syscall_exit_to_user_mode+0x3b/0x50
> [ 159.430535] ? do_syscall_64+0x69/0x80
> [ 159.430537] ? syscall_exit_to_user_mode+0x3b/0x50
> [ 159.430540] ? do_syscall_64+0x69/0x80
> [ 159.430541] ? irqentry_exit+0x63/0x90
> [ 159.430544] ? exc_page_fault+0xa8/0x2f0
> [ 159.430546] ? asm_exc_page_fault+0x8/0x30
> [ 159.430548] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 159.430549] RIP: 0033:0x7f2eb2d1aaff
> [ 159.430551] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
> [ 159.430556] RSP: 002b:00007ffd3f260c60 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 159.430558] RAX: ffffffffffffffda RBX: 00007ffd3f260da8 RCX: 00007f2eb2d1aaff
> [ 159.430559] RDX: 00007ffd3f260d90 RSI: 00000000c0181b01 RDI: 000000000000003b
> [ 159.430561] RBP: 00007ffd3f260d70 R08: 000055c046934e00 R09: 0000000000000000
> [ 159.430562] R10: 000000000000001e R11: 0000000000000246 R12: 0000000000000000
> [ 159.430563] R13: 0000000000000000 R14: 0000000000000000 R15: 000055c046934f40
> [ 159.430567] </TASK>
> [ 159.430568] irq event stamp: 0
> [ 159.430569] hardirqs last enabled at (0): [<0000000000000000>] 0x0
> [ 159.430571] hardirqs last disabled at (0): [<ffffffff856bce5d>] copy_process+0x9dd/0x1f40
> [ 159.430574] softirqs last enabled at (0): [<ffffffff856bce5d>] copy_process+0x9dd/0x1f40
> [ 159.430575] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [ 159.430577] ---[ end trace 0000000000000000 ]---
>
> The code for rxe_attach_mcast is
>
> int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
> {
>         int err;
>         struct rxe_dev *rxe = to_rdev(ibqp->device);
>         struct rxe_qp *qp = to_rqp(ibqp);
>         struct rxe_mcg *mcg;
>
>         /* takes a ref on mcg if successful */
>         mcg = rxe_get_mcg(rxe, mgid);
>         if (IS_ERR(mcg))
>                 return PTR_ERR(mcg);
>
>         err = rxe_attach_mcg(mcg, qp);
>
>         /* if we failed to attach the first qp to mcg tear it down */
>         if (atomic_read(&mcg->qp_num) == 0)
>                 rxe_destroy_mcg(mcg);
>
>         kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
>
>         return err;
> }
>
> which calls static function rxe_attach_mcg() which is
>
> static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
> {
>         struct rxe_dev *rxe = mcg->rxe;
>         struct rxe_mca *mca, *tmp;
>         unsigned long flags;
>         int err;
>
>         /* check to see if the qp is already a member of the group */
>         spin_lock_irqsave(&rxe->mcg_lock, flags);
>         list_for_each_entry(mca, &mcg->qp_list, qp_list) {
>                 if (mca->qp == qp) {
>                         spin_unlock_irqrestore(&rxe->mcg_lock, flags);
>                         return 0;
>                 }
>         }
>         spin_unlock_irqrestore(&rxe->mcg_lock, flags);
>
>         /* speculative alloc new mca without using GFP_ATOMIC */
>         mca = kzalloc(sizeof(*mca), GFP_KERNEL);
>         if (!mca)
>                 return -ENOMEM;
>
>         spin_lock_irqsave(&rxe->mcg_lock, flags);
>         /* re-check to see if someone else just attached qp */
>         list_for_each_entry(tmp, &mcg->qp_list, qp_list) {
>                 if (tmp->qp == qp) {
>                         kfree(mca);
>                         err = 0;
>                         goto out;
>                 }
>         }
>
>         err = __rxe_init_mca(qp, mcg, mca);
>         if (err)
>                 kfree(mca);
> out:
>         spin_unlock_irqrestore(&rxe->mcg_lock, flags);
>         return err;
> }
>
> which has the offending spin_unlock_irqrestore apparently inline.
>
> If I replace the locks with _bh locks the warning goes away.
>
> Bob

I looked up the error message, and the commit that added the check claims to be looking for irq_save/irq_restore sequences that enable some IRQs in between them. I sure don't see anything that looks like that here.

Bob
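For reference, here is a minimal sketch (a made-up demo_bogus_restore(), nothing rxe specific) of the kind of sequence the check in kernel/locking/irqflag-debug.c is looking for: IRQs get re-enabled somewhere between the save and the restore, so the restore runs with IRQs already on.

#include <linux/irqflags.h>

static void demo_bogus_restore(void)
{
        unsigned long flags;

        local_irq_save(flags);          /* hardirqs off, previous state saved */

        /*
         * Anything on this path that directly or indirectly calls
         * local_irq_enable() breaks the save/restore pairing ...
         */
        local_irq_enable();

        /*
         * ... so raw_local_irq_restore() finds IRQs already enabled and
         * warn_bogus_irq_restore() prints the splat shown above.
         */
        local_irq_restore(flags);
}

Nothing like that is written out explicitly in rxe_attach_mcg(), which fits the determination at the top of this mail: the re-enable is buried in the dev_mc_add() call chain reached inside the _irqsave section, not in the quoted functions themselves.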