----- Original Message -----
> From: "Sagi Grimberg" <sagi@xxxxxxxxxxx>
> To: "Laurence Oberman" <loberman@xxxxxxxxxx>
> Cc: "Leon Romanovsky" <leonro@xxxxxxxxxxxx>, "Bart Van Assche" <bart.vanassche@xxxxxxxxxxx>, "Doug Ledford"
> <dledford@xxxxxxxxxx>, "Max Gurtovoy" <maxg@xxxxxxxxxxxx>, "Israel Rukshin" <israelr@xxxxxxxxxxxx>,
> linux-rdma@xxxxxxxxxxxxxxx
> Sent: Wednesday, May 3, 2017 10:58:43 AM
> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
>
>
> > Hello Sagi
> > Against Bart's tree again
> >
> > a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
> > dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> > f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr lengt
> >
> > Above are all in
> > Added your most recent patch above
> >
> > Same behavior.
> > [ 579.368733] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > CQE ffff8817de9c57b0
> > [ 579.369875] mlx5_1:dump_cqe:262:(pid 15140): dump error cqe
> > [ 579.369877] 00000000 00000000 00000000 00000000
> > [ 579.369877] 00000000 00000000 00000000 00000000
> > [ 579.369878] 00000000 00000000 00000000 00000000
> > [ 579.369878] 00000000 0f007806 2500002b 1c528dd0
> > [ 579.369883] scsi host1: ib_srp: failed FAST REG status memory management
> > operation error (6) for CQE ffff88179a460af8
> > [ 594.814222] scsi host1: ib_srp: reconnect succeeded
> > [ 594.916876] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > CQE ffff8817e1d4a6b0
> > [ 595.494532] mlx5_1:dump_cqe:262:(pid 15205): dump error cqe
> > [ 595.525995] 00000000 00000000 00000000 00000000
> > [ 595.552125] 00000000 00000000 00000000 00000000
> > [ 595.578204] 00000000 00000000 00000000 00000000
> > [ 595.603670] 00000000 0f007806 25000033 002d77d0
> > ^C[ 610.821911] scsi host1: ib_srp: reconnect succeeded
> > [ 610.933298] scsi host1: ib_srp: failed RECV status WR flushed (5) for
> > CQE ffff8817e1d4a170
> > [ 611.514234] mlx5_1:dump_cqe:262:(pid 15242): dump error cqe
> > [ 611.543083] 00000000 00000000 00000000 00000000
> > [ 611.568670] 00000000 00000000 00000000 00000000
> > [ 611.594064] 00000000 00000000 00000000 00000000
> > [ 611.620142] 00000000 0f007806 2500003b 003161d0
> >
> > I will capture the function traces with your patch applied and the
> > additional logging asked for by Max.
>
> Thanks, that would be helpful,
>
> Can you try the following patch, just to see if there is an off by 1 case:
>
> --
> diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> index b8f9382a8b7d..3d6ef7bce7d9 100644
> --- a/drivers/infiniband/hw/mlx5/mr.c
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -1525,7 +1525,7 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
>  {
>  	struct mlx5_ib_dev *dev = to_mdev(pd->device);
>  	int inlen = MLX5_ST_SZ_BYTES(create_mkey_in);
> -	int ndescs = ALIGN(max_num_sg, 4);
> +	int ndescs = ALIGN(max_num_sg + 1, 4);
>  	struct mlx5_ib_mr *mr;
>  	void *mkc;
>  	u32 *in;
> --
>
> It's not a fix, but if it works it can give us a clue...
>

Sorry, been delayed this week, will get this done this weekend.

Thanks
Laurence