Re: Kernel warning at drivers/infiniband/core/rw.c:349

On 2021-10-12 6:07 p.m., Bart Van Assche wrote:
> Hi,
> 
> If I run the SRP tests against the for-next branch of the RDMA git tree
> then the following warning appears (commit 2a152512a155 ("RDMA/efa: CQ 
> notifications")):
> 
> ------------[ cut here ]------------
> WARNING: CPU: 69 PID: 838 at drivers/infiniband/core/rw.c:349 
> rdma_rw_ctx_init+0x63b/0x690 [ib_core]
> CPU: 69 PID: 838 Comm: kworker/69:1H Tainted: G    E   5.15.0-rc4-dbg+ #2
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
> Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> RIP: 0010:rdma_rw_ctx_init+0x63b/0x690 [ib_core]
> Code: 8b 45 10 49 8d 7e 48 49 89 46 40 e8 cf 32 ca e0 8b 45 18 49 8d 7e 
> 04 41 89 46 48 e8 df 30 ca e0 41 c6 46 04 00 e9 61 fe ff ff <0f> 0b 41 
> bc fb ff ff ff e9 3e fe ff ff 48 8b 9d 70 ff ff ff 48 8d
> RSP: 0018:ffff88810b867968 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 0000000000000024 RCX: dffffc0000000000
> RDX: 0000000000000000 RSI: ffff888169ee9a40 RDI: ffff888169ee9a58
> RBP: ffff88810b867a20 R08: ffffffffa081b01b R09: 0000000000000000
> R10: ffffed1085d2e3f1 R11: 0000000000000001 R12: 0000000000000000
> R13: 0000000000000000 R14: ffff888169ee9a58 R15: ffff888169ee9a40
> FS:  0000000000000000(0000) GS:ffff88842e940000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f4720169e88 CR3: 00000001895d9006 CR4: 0000000000770ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
>   srpt_alloc_rw_ctxs+0x2f2/0x560 [ib_srpt]
>   srpt_get_desc_tbl.constprop.0+0x289/0x2e0 [ib_srpt]
>   srpt_handle_cmd+0x17f/0x2b0 [ib_srpt]
>   srpt_handle_new_iu+0x27e/0x520 [ib_srpt]
>   srpt_recv_done+0x9b/0xd0 [ib_srpt]
>   __ib_process_cq+0x121/0x3d0 [ib_core]
>   ib_cq_poll_work+0x37/0xb0 [ib_core]
>   process_one_work+0x585/0xae0
>   worker_thread+0x2e7/0x700
>   kthread+0x1f6/0x220
>   ret_from_fork+0x1f/0x30
> irq event stamp: 1255
> hardirqs last  enabled at (1263): [<ffffffff811ab2c8>] 
> __up_console_sem+0x58/0x60
> hardirqs last disabled at (1270): [<ffffffff811ab2ad>] 
> __up_console_sem+0x3d/0x60
> softirqs last  enabled at (1290): [<ffffffff82200473>] 
> __do_softirq+0x473/0x6ed
> softirqs last disabled at (1279): [<ffffffff810e2152>] 
> __irq_exit_rcu+0xf2/0x140
> ---[ end trace 81a8636fba7e1a77 ]---
> 
> Does this perhaps indicate a regression in the RDMA rw code?

Hmm, yes, this looks like a regression caused by my recent patch.

As best I can tell from the code, someone is passing an sg_cnt of
zero. Previously that would have returned -ENOMEM, but now the zero
count may be ignored, in which case the function hits that WARNING and
returns -EIO.

We can try a patch like the one below to confirm.

Logan

--

diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 5a3bd41b331c..4eb9781ccfaf 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -331,6 +331,10 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u3>
                return ret;
        sg_cnt = sgt.nents;

+       ret = -EIO;
+       if (!sg_cnt)
+               goto out_unmap_sg;
+
        /*
         * Skip to the S/G entry that sg_offset falls into:
         */
