On 2021-10-12 6:07 p.m., Bart Van Assche wrote:
> Hi,
>
> If I run the SRP tests against the for-next branch of the RDMA git tree
> then the following warning appears (commit 2a152512a155 ("RDMA/efa: CQ
> notifications")):
>
> ------------[ cut here ]------------
> WARNING: CPU: 69 PID: 838 at drivers/infiniband/core/rw.c:349
> rdma_rw_ctx_init+0x63b/0x690 [ib_core]
> CPU: 69 PID: 838 Comm: kworker/69:1H Tainted: G E 5.15.0-rc4-dbg+ #2
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
> Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> RIP: 0010:rdma_rw_ctx_init+0x63b/0x690 [ib_core]
> Code: 8b 45 10 49 8d 7e 48 49 89 46 40 e8 cf 32 ca e0 8b 45 18 49 8d 7e
> 04 41 89 46 48 e8 df 30 ca e0 41 c6 46 04 00 e9 61 fe ff ff <0f> 0b 41
> bc fb ff ff ff e9 3e fe ff ff 48 8b 9d 70 ff ff ff 48 8d
> RSP: 0018:ffff88810b867968 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 0000000000000024 RCX: dffffc0000000000
> RDX: 0000000000000000 RSI: ffff888169ee9a40 RDI: ffff888169ee9a58
> RBP: ffff88810b867a20 R08: ffffffffa081b01b R09: 0000000000000000
> R10: ffffed1085d2e3f1 R11: 0000000000000001 R12: 0000000000000000
> R13: 0000000000000000 R14: ffff888169ee9a58 R15: ffff888169ee9a40
> FS: 0000000000000000(0000) GS:ffff88842e940000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f4720169e88 CR3: 00000001895d9006 CR4: 0000000000770ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
>  srpt_alloc_rw_ctxs+0x2f2/0x560 [ib_srpt]
>  srpt_get_desc_tbl.constprop.0+0x289/0x2e0 [ib_srpt]
>  srpt_handle_cmd+0x17f/0x2b0 [ib_srpt]
>  srpt_handle_new_iu+0x27e/0x520 [ib_srpt]
>  srpt_recv_done+0x9b/0xd0 [ib_srpt]
>  __ib_process_cq+0x121/0x3d0 [ib_core]
>  ib_cq_poll_work+0x37/0xb0 [ib_core]
>  process_one_work+0x585/0xae0
>  worker_thread+0x2e7/0x700
>  kthread+0x1f6/0x220
>  ret_from_fork+0x1f/0x30
> irq event stamp: 1255
> hardirqs last enabled at (1263): [<ffffffff811ab2c8>]
> __up_console_sem+0x58/0x60
> hardirqs last disabled at (1270): [<ffffffff811ab2ad>]
> __up_console_sem+0x3d/0x60
> softirqs last enabled at (1290): [<ffffffff82200473>]
> __do_softirq+0x473/0x6ed
> softirqs last disabled at (1279): [<ffffffff810e2152>]
> __irq_exit_rcu+0xf2/0x140
> ---[ end trace 81a8636fba7e1a77 ]---
>
> Does this perhaps indicate a regression in the RDMA rw code?

Hmm, yes looks like a regression with my recent patch. Best I can see
from the code is that someone is passing an sg_cnt of zero. Previously
that would have returned -ENOMEM, but now it might be ignored, in which
case it would hit that WARNING and return -EIO.

We can try a patch such as below to confirm.

Logan

--

diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 5a3bd41b331c..4eb9781ccfaf 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -331,6 +331,10 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u3>
 		return ret;
 	sg_cnt = sgt.nents;
 
+	ret = -EIO;
+	if (!sg_cnt)
+		goto out_unmap_sg;
+
 	/*
 	 * Skip to the S/G entry that sg_offset falls into:
 	 */
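
For anyone following along, the three outcomes described above for a zero sg_cnt can be modelled in a few lines of userspace C. This is only a sketch and not kernel code: the names `ctx_init_model`, `enum variant` and the `warned` counter are hypothetical, and the "regressed" branch simply assumes that a zero count falls through to the WARNING and -EIO, which is a suspicion at this point, not something confirmed.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of how a zero sg_cnt is handled; not kernel code. */
enum variant {
	BEFORE_PATCH,	/* zero count reported as -ENOMEM */
	REGRESSED,	/* assumption: zero count reaches the WARNING */
	WITH_GUARD,	/* proposed early exit with -EIO */
};

/* Counts how often the modelled WARNING path was taken. */
static int warned;

static int ctx_init_model(enum variant v, unsigned int sg_cnt)
{
	/* A non-empty list is assumed to proceed normally. */
	if (sg_cnt)
		return (int)sg_cnt;

	switch (v) {
	case BEFORE_PATCH:
		return -ENOMEM;
	case WITH_GUARD:
		/* Bail out before any S/G entry is walked. */
		return -EIO;
	case REGRESSED:
		/* Stands in for the WARNING seen at rw.c:349. */
		warned++;
		return -EIO;
	}

	assert(0 && "unknown variant");
	return -EINVAL;
}
```

If the guard silences the warning while the caller still sees -EIO, that would point at a caller passing an empty list, not at the mapping code itself.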