Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array

Hello Sagi,
Tested against Bart's tree again:

a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr length

All of the above are applied.
Your most recent patch has been added as well.

Same behavior.
[  579.368733] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817de9c57b0
[  579.369875] mlx5_1:dump_cqe:262:(pid 15140): dump error cqe
[  579.369877] 00000000 00000000 00000000 00000000
[  579.369877] 00000000 00000000 00000000 00000000
[  579.369878] 00000000 00000000 00000000 00000000
[  579.369878] 00000000 0f007806 2500002b 1c528dd0
[  579.369883] scsi host1: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff88179a460af8
[  594.814222] scsi host1: ib_srp: reconnect succeeded
[  594.916876] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817e1d4a6b0
[  595.494532] mlx5_1:dump_cqe:262:(pid 15205): dump error cqe
[  595.525995] 00000000 00000000 00000000 00000000
[  595.552125] 00000000 00000000 00000000 00000000
[  595.578204] 00000000 00000000 00000000 00000000
[  595.603670] 00000000 0f007806 25000033 002d77d0
^C[  610.821911] scsi host1: ib_srp: reconnect succeeded
[  610.933298] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817e1d4a170
[  611.514234] mlx5_1:dump_cqe:262:(pid 15242): dump error cqe
[  611.543083] 00000000 00000000 00000000 00000000
[  611.568670] 00000000 00000000 00000000 00000000
[  611.594064] 00000000 00000000 00000000 00000000
[  611.620142] 00000000 0f007806 2500003b 003161d0

I will capture the function traces with your patch applied and the additional logging asked for by Max.

Thanks, that would be helpful.

Can you try the following patch, just to see if there is an off-by-one case:

--
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index b8f9382a8b7d..3d6ef7bce7d9 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1525,7 +1525,7 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
 {
        struct mlx5_ib_dev *dev = to_mdev(pd->device);
        int inlen = MLX5_ST_SZ_BYTES(create_mkey_in);
-       int ndescs = ALIGN(max_num_sg, 4);
+       int ndescs = ALIGN(max_num_sg + 1, 4);
        struct mlx5_ib_mr *mr;
        void *mkc;
        u32 *in;
--

It's not a fix, but if it works it can give us a clue...
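
To make the arithmetic of the test patch concrete, here is a small user-space sketch of what the changed line computes. It assumes the kernel's ALIGN(x, a) rounds x up to the next multiple of a power-of-two a; align_up(), ndescs_before(), ndescs_after() and print_ndescs() are hypothetical names used only for illustration and are not part of the mlx5 driver.

```c
#include <assert.h>
#include <stdio.h>

/*
 * User-space stand-in for the kernel's ALIGN(x, a): round x up to the
 * next multiple of a. Like the kernel macro, it assumes a is a power of two.
 */
int align_up(int x, int a)
{
    assert(a > 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* ndescs as computed by the unpatched code: ALIGN(max_num_sg, 4). */
int ndescs_before(int max_num_sg)
{
    return align_up(max_num_sg, 4);
}

/* ndescs with the test patch applied: ALIGN(max_num_sg + 1, 4). */
int ndescs_after(int max_num_sg)
{
    return align_up(max_num_sg + 1, 4);
}

/* Hypothetical helper: print both values for max_num_sg in [first, last]. */
void print_ndescs(int first, int last)
{
    for (int n = first; n <= last; n++)
        printf("max_num_sg=%d before=%d after=%d\n",
               n, ndescs_before(n), ndescs_after(n));
}
```

Calling print_ndescs(1, 8) shows that the two values differ only when max_num_sg is already a multiple of 4 (for example 4 gives 4 before and 8 after). So if the test patch changes the behavior, that would suggest a case where exactly one descriptor more than max_num_sg is being used.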