Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array

----- Original Message -----
> From: "Sagi Grimberg" <sagi@xxxxxxxxxxx>
> To: "Laurence Oberman" <loberman@xxxxxxxxxx>
> Cc: "Leon Romanovsky" <leonro@xxxxxxxxxxxx>, "Bart Van Assche" <bart.vanassche@xxxxxxxxxxx>, "Doug Ledford"
> <dledford@xxxxxxxxxx>, "Max Gurtovoy" <maxg@xxxxxxxxxxxx>, "Israel Rukshin" <israelr@xxxxxxxxxxxx>,
> linux-rdma@xxxxxxxxxxxxxxx
> Sent: Wednesday, May 3, 2017 10:58:43 AM
> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> 
> 
> > Hello Sagi
> > Against Bart's tree again
> >
> > a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
> > dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> > f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr length
> >
> > Above are all in
> > Added your most recent patch above
> >
> > Same behavior.
> > [  579.368733] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817de9c57b0
> > [  579.369875] mlx5_1:dump_cqe:262:(pid 15140): dump error cqe
> > [  579.369877] 00000000 00000000 00000000 00000000
> > [  579.369877] 00000000 00000000 00000000 00000000
> > [  579.369878] 00000000 00000000 00000000 00000000
> > [  579.369878] 00000000 0f007806 2500002b 1c528dd0
> > [  579.369883] scsi host1: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff88179a460af8
> > [  594.814222] scsi host1: ib_srp: reconnect succeeded
> > [  594.916876] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817e1d4a6b0
> > [  595.494532] mlx5_1:dump_cqe:262:(pid 15205): dump error cqe
> > [  595.525995] 00000000 00000000 00000000 00000000
> > [  595.552125] 00000000 00000000 00000000 00000000
> > [  595.578204] 00000000 00000000 00000000 00000000
> > [  595.603670] 00000000 0f007806 25000033 002d77d0
> > [  610.821911] scsi host1: ib_srp: reconnect succeeded
> > [  610.933298] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817e1d4a170
> > [  611.514234] mlx5_1:dump_cqe:262:(pid 15242): dump error cqe
> > [  611.543083] 00000000 00000000 00000000 00000000
> > [  611.568670] 00000000 00000000 00000000 00000000
> > [  611.594064] 00000000 00000000 00000000 00000000
> > [  611.620142] 00000000 0f007806 2500003b 003161d0
> >
> > I will capture the function traces with your patch applied and the
> > additional logging asked for by Max.
> 
> Thanks, that would be helpful.
> 
> Can you try the following patch, just to see if there is an off-by-one case:
> 
> --
> diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> index b8f9382a8b7d..3d6ef7bce7d9 100644
> --- a/drivers/infiniband/hw/mlx5/mr.c
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -1525,7 +1525,7 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
>   {
>          struct mlx5_ib_dev *dev = to_mdev(pd->device);
>          int inlen = MLX5_ST_SZ_BYTES(create_mkey_in);
> -       int ndescs = ALIGN(max_num_sg, 4);
> +       int ndescs = ALIGN(max_num_sg + 1, 4);
>          struct mlx5_ib_mr *mr;
>          void *mkc;
>          u32 *in;
> --
> 
> It's not a fix, but if it works it can give us a clue...
> 

Sorry, I have been delayed this week; I will get this done this weekend.
Thanks
Laurence