Re: Are device drivers ready for max_send_sge ?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Aug 30, 2018 at 02:03:39PM -0400, Chuck Lever wrote:
>
>
> > On Aug 30, 2018, at 3:16 AM, Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> >
> > On Wed, Aug 29, 2018 at 04:41:42PM -0400, Chuck Lever wrote:
> >>
> >>
> >>> On Aug 29, 2018, at 2:52 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> >>>
> >>>
> >>>
> >>>> On Aug 29, 2018, at 2:17 PM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >>>>
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Chuck Lever <chuck.lever@xxxxxxxxxx>
> >>>>> Sent: Wednesday, August 29, 2018 1:09 PM
> >>>>> To: Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx>
> >>>>> Cc: linux-rdma <linux-rdma@xxxxxxxxxxxxxxx>; Leon Romanovsky
> >>>>> <leon@xxxxxxxxxx>
> >>>>> Subject: Re: Are device drivers ready for max_send_sge ?
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On Aug 29, 2018, at 1:50 PM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx>
> >>>>> wrote:
> >>>>>>
> >>>>>> Hey Chuck,
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: linux-rdma-owner@xxxxxxxxxxxxxxx <linux-rdma-
> >>>>>>> owner@xxxxxxxxxxxxxxx> On Behalf Of Chuck Lever
> >>>>>>> Sent: Wednesday, August 29, 2018 11:49 AM
> >>>>>>> To: linux-rdma <linux-rdma@xxxxxxxxxxxxxxx>
> >>>>>>> Cc: Leon Romanovsky <leon@xxxxxxxxxx>; Steve Wise
> >>>>>>> <swise@xxxxxxxxxxxxxxxxxxxxx>
> >>>>>>> Subject: Are device drivers ready for max_send_sge ?
> >>>>>>>
> >>>>>>> In v4.19-rc1, the NFS/RDMA server has stopped working
> >>>>>>> for me.
> >>>>>>>
> >>>>>>> The reason for this is that the mlx4 driver (with CX-3)
> >>>>>>> reports 62 in
> >>>>>>>
> >>>>>>> ib_device_attr::max_send_sge
> >>>>>>>
> >>>>>>> But when the NFS server tries to create a QP with a
> >>>>>>> qp_attr.cap.max_send_sge set to 62, rdma_create_qp
> >>>>>>> fails with -EINVAL. The check that fails is in
> >>>>>>>
> >>>>>>> drivers/infiniband/hw/mlx4/qp.c :: set_kernel_sq_size
> >>>>>>>
> >>>>>>> It's comparing the passed-in max_send_sge against the min()
> >>>>>>> of dev->dev->caps.max_sq_sg and dev->dev->caps.max_rq_sg,
> >>>>>>> and obviously that fails because max_rq_sg is smaller than
> >>>>>>> max_sq_sg.
> >>>>>>>
> >>>>>>> set_rq_size() also has similar dependencies on max_sq_sg
> >>>>>>> that may no longer be appropriate.
> >>>>>>>
> >>>>>>
> >>>>>> These need fixing for sure.
> >>>>>>
> >>>>>>> When I fix the first sanity check in set_kernel_sq_size
> >>>>>>> to ignore max_rq_sg, the third check in set_kernel_sq_size
> >>>>>>> fails. This is:
> >>>>>>>
> >>>>>>> s = max(cap->max_send_sge * sizeof (struct mlx4_wqe_data_seg),
> >>>>>>>       cap->max_inline_data + sizeof (struct mlx4_wqe_inline_seg)) +
> >>>>>>>       send_wqe_overhead(type, qp->flags);
> >>>>>>>
> >>>>>>> if (s > dev->dev->caps.max_sq_desc_sz)
> >>>>>>>
> >>>>>>> I don't know enough about this logic to suggest a fix.
> >>>>>>>
> >>>>>>
> >>>>>> If you set ib_device_attr::max_send_sge to 61, does it then work?  Just
> >>>>>> wonder if this is a 1-off error.
> >>>>>
> >>>>> I made this change to reduce ib_device_attr::max_send_sge
> >>>>> to 61:
> >>>>>
> >>>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c
> >>>>> b/drivers/net/ethernet/mellanox/mlx4/fw.c
> >>>>> index babcfd9..81874f6 100644
> >>>>> --- a/drivers/net/ethernet/mellanox/mlx4/fw.c
> >>>>> +++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
> >>>>> @@ -950,7 +950,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev,
> >>>>> struct mlx4_dev_cap *dev_cap)
> >>>>>      }
> >>>>>
> >>>>>      MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_SG_SQ_OFFSET);
> >>>>> -       dev_cap->max_sq_sg = field;
> >>>>> +       dev_cap->max_sq_sg = field - 1;
> >>>>>      MLX4_GET(size, outbox, QUERY_DEV_CAP_MAX_DESC_SZ_SQ_OFFSET);
> >>>>>      dev_cap->max_sq_desc_sz = size;
> >>>>>
> >>>>>
> >>>>> I added printk's to set_kernel_sq_size to capture the parameters
> >>>>> of the failing sanity check.
> >>>>>
> >>>>> Here's with max_send_sge == 62:
> >>>>>
> >>>>> Aug 29 12:26:01 klimt kernel: <mlx4_ib> set_kernel_sq_size: s = 1056
> >>>>> Aug 29 12:26:01 klimt kernel: <mlx4_ib> set_kernel_sq_size: dev->dev-
> >>>>>> caps.max_sq_desc_sz = 1008
> >>>>>
> >>>>> and with max_send_sge == 61:
> >>>>>
> >>>>> Aug 29 13:59:40 klimt kernel: <mlx4_ib> set_kernel_sq_size: s = 1040
> >>>>> Aug 29 13:59:40 klimt kernel: <mlx4_ib> set_kernel_sq_size: dev->dev-
> >>>>>> caps.max_sq_desc_sz = 1008
> >>>>>
> >>>>
> >>>> I guess mlx4_ib_query_device() could be enhanced to limit the max:
> >>>>
> >>>> ib_device_attr::max_send_sge = min(fw-advertised-max, (max_sq_desc_space -
> >>>> max_wqe_overhead) / sge_descriptor_size);
> >>>
> >>> Probably. I also see that the NFS server has no reason to
> >>> request so many send SGEs.
> >>
> >> I've now tested a couple other cards/drivers. mlx4/CX-3 is the
> >> only one with this problem I've found so far.
> >
> > Thanks Chuck and Steve,
> >
> > I checked it in our mlx4 HW spec and it seems that the check in
> > set_kernel_sq_size() is the right one. We should take into account
> > the following four maximum parameters: max_sg_rq, max_sg_sq,
> > max_desc_sz_rq and max_desc_sz_sq.
> >
> > However instead of bringing this complexity to query_device, which still
> > won't be sufficient anyway (the calculations are dependent on QP type),
> > the safer approach will be to restore old code, which will give you 32 SGEs.
>
> I'm OK with the "restore legacy behavior" approach. I assume you
> need to pass this through your testing regimen, so I will wait
> for you to merge it.

Thanks, It is part of our regression. I'll send formal patch on Sunday.

>
> I have a patch that limits the NFS server to just enough Send SGEs
> rather than using the device maximum. I plan to submit that during
> the v4.20 merge window, unless folks would like me to send it sooner.
>
>
> > Thanks
> >
> > diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
> > index ca0f1ee26091..21e268247f4d 100644
> > --- a/drivers/infiniband/hw/mlx4/main.c
> > +++ b/drivers/infiniband/hw/mlx4/main.c
> > @@ -517,8 +517,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
> > 	props->page_size_cap	   = dev->dev->caps.page_size_cap;
> > 	props->max_qp		   = dev->dev->quotas.qp;
> > 	props->max_qp_wr	   = dev->dev->caps.max_wqes - MLX4_IB_SQ_MAX_SPARE;
> > -	props->max_send_sge	   = dev->dev->caps.max_sq_sg;
> > -	props->max_recv_sge	   = dev->dev->caps.max_rq_sg;
> > +	props->max_send_sge	   = min(dev->dev->caps.max_sq_sg, dev->dev->caps.max_rq_sg);
> > +	props->max_recv_sge	   = min(dev->dev->caps.max_sq_sg, dev->dev->caps.max_rq_sg);
> > 	props->max_sge_rd	   = MLX4_MAX_SGE_RD;
> > 	props->max_cq		   = dev->dev->quotas.cq;
> > 	props->max_cqe		   = dev->dev->caps.max_cqes;
> >
> >>
> >>
> >> --
> >> Chuck Lever
>
> --
> Chuck Lever
>
>
>

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux