----- Original Message -----
> From: "Sagi Grimberg" <sagi@xxxxxxxxxxx>
> To: "Bart Van Assche" <bart.vanassche@xxxxxxxxxxx>, "Doug Ledford" <dledford@xxxxxxxxxx>
> Cc: linux-rdma@xxxxxxxxxxxxxxx, "Israel Rukshin" <israelr@xxxxxxxxxxxx>, "Max Gurtovoy" <maxg@xxxxxxxxxxxx>, "Leon Romanovsky" <leonro@xxxxxxxxxxxx>, "Mark Bloch" <markb@xxxxxxxxxxxx>, "Yuval Shaia" <yuval.shaia@xxxxxxxxxx>, "# 4 . 7+" <stable@xxxxxxxxxxxxxxx>
> Sent: Wednesday, February 15, 2017 10:38:06 AM
> Subject: Re: [PATCH v2 1/8] IB/SRP: Avoid using IB_MR_TYPE_SG_GAPS
>
>
> > Tests have shown that the following error message is reported when
> > using SG-GAPS registration with an mlx5 adapter:
> >
> > scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE
> > ffff880bd4270eb0
> > 00000000 00000000 00000000 00000000
> > 00000000 00000000 00000000 00000000
> > 00000000 00000000 00000000 00000000
> > 00000000 0f007806 2500002a ad9fafd1
> > scsi host1: ib_srp: reconnect succeeded
> > mlx5_0:dump_cqe:262:(pid 7369): dump error cqe
> > 00000000 00000000 00000000 00000000
> > 00000000 00000000 00000000 00000000
> > 00000000 00000000 00000000 00000000
> > 00000000 0f007806 25000032 00105dd0
> > scsi host1: ib_srp: failed FAST REG status memory management operation
> > error (6) for CQE ffff880b92860138
> >
> > Hence avoid using SG-GAPS memory registrations. Additionally,
> > always configure the blk_queue_virt_boundary() to avoid to trigger
> > a mapping failure when using adapters that support SG-GAPS (e.g.
> > mlx5).
>
> Hi Guys,
>
> Sorry for addressing this late, but has this failure been investigated?
>
> Max, Israel, what does this error syndrome map to?
>
> Looking at mlx5_ib_sg_to_klms, I think the mr->length is incorrectly
> incremented. Does the following change fix the problem?
> --
> diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> index 8f608debe141..c21c9eee37f6 100644
> --- a/drivers/infiniband/hw/mlx5/mr.c
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -1832,7 +1832,7 @@ mlx5_ib_sg_to_klms(struct mlx5_ib_mr *mr,
>                 klms[i].va = cpu_to_be64(sg_dma_address(sg) + sg_offset);
>                 klms[i].bcount = cpu_to_be32(sg_dma_len(sg) - sg_offset);
>                 klms[i].key = cpu_to_be32(lkey);
> -               mr->ibmr.length += sg_dma_len(sg);
> +               mr->ibmr.length += sg_dma_len(sg) - sg_offset;
>
>                 sg_offset = 0;
>         }
> --
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

Started with Linus's tree, applied the change requested by Sagi, built the kernel, rebooted and started the tests.

Linux ibclient 4.10.0-rc8.sagi+ #1 SMP Wed Feb 15 11:09:44 EST 2017 x86_64 x86_64 x86_64 GNU/Linux

Very quickly got to this:

[  180.990285] mlx5_0:dump_cqe:262:(pid 0): dump error cqe
[  181.016899] 00000000 00000000 00000000 00000000
[  181.040949] 00000000 00000000 00000000 00000000
[  181.066960] 00000000 00000000 00000000 00000000
[  181.092030] 00000000 0f007806 2500002a bf1913d0
[  181.117254] scsi host2: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff880bdbe88778
[  196.288933] fast_io_fail_tmo expired for SRP port-2:1 / host2.
[  197.090886] scsi host2: ib_srp: reconnect succeeded
[  197.127628] scsi host2: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817f09b6f30

So the change does not help.

I think the suggestion Bart and I made, to revert for now, is the best way forward.
I have already tested this in depth from Bart's tree, and it has been sent to Doug as V2 of Bart's recent 8-patch series.

Thanks
Laurence
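
For reference, the length accounting that Sagi's one-line change targets can be sketched in user space. This is only an illustration under stated assumptions: struct demo_sg, struct demo_klm, demo_sg_to_klms() and demo_sum_bcount() are hypothetical stand-ins, not the mlx5 driver's types or functions, and byte-order conversion and the lkey are left out. It mirrors the quoted loop, where sg_offset applies to the first entry and is then reset to 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a DMA-mapped scatterlist entry. */
struct demo_sg {
    uint64_t dma_addr;
    uint32_t dma_len;
};

/* Hypothetical stand-in for one KLM descriptor (host byte order). */
struct demo_klm {
    uint64_t va;
    uint32_t bcount;
};

/*
 * Fill klms[] from sg[] and return the accumulated region length.
 * sg_offset is applied to the first entry only, as in the quoted loop.
 * subtract_offset == 0 accumulates the length as before the proposed
 * change; a nonzero value accumulates it as after the change.
 */
static uint64_t demo_sg_to_klms(const struct demo_sg *sg, size_t n,
                                uint32_t sg_offset, int subtract_offset,
                                struct demo_klm *klms)
{
    uint64_t length = 0;

    for (size_t i = 0; i < n; i++) {
        /* An offset larger than the entry would underflow bcount. */
        assert(sg_offset <= sg[i].dma_len);

        klms[i].va = sg[i].dma_addr + sg_offset;
        klms[i].bcount = sg[i].dma_len - sg_offset;

        if (subtract_offset)
            length += sg[i].dma_len - sg_offset;
        else
            length += sg[i].dma_len;

        sg_offset = 0;
    }
    return length;
}

/* Sum of the bytes actually described by the KLM entries. */
static uint64_t demo_sum_bcount(const struct demo_klm *klms, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += klms[i].bcount;
    return total;
}
```

With a nonzero sg_offset, the old accounting returns a length that exceeds the sum of the bcount fields by exactly sg_offset, while the new accounting matches that sum. As the test run above shows, correcting this did not make the FAST REG error go away, so the sketch illustrates the accounting only and says nothing about the root cause.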