On Thu, Aug 30, 2018 at 02:03:39PM -0400, Chuck Lever wrote: > > > > On Aug 30, 2018, at 3:16 AM, Leon Romanovsky <leon@xxxxxxxxxx> wrote: > > > > On Wed, Aug 29, 2018 at 04:41:42PM -0400, Chuck Lever wrote: > >> > >> > >>> On Aug 29, 2018, at 2:52 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote: > >>> > >>> > >>> > >>>> On Aug 29, 2018, at 2:17 PM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote: > >>>> > >>>> > >>>> > >>>>> -----Original Message----- > >>>>> From: Chuck Lever <chuck.lever@xxxxxxxxxx> > >>>>> Sent: Wednesday, August 29, 2018 1:09 PM > >>>>> To: Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> > >>>>> Cc: linux-rdma <linux-rdma@xxxxxxxxxxxxxxx>; Leon Romanovsky > >>>>> <leon@xxxxxxxxxx> > >>>>> Subject: Re: Are device drivers ready for max_send_sge ? > >>>>> > >>>>> > >>>>> > >>>>>> On Aug 29, 2018, at 1:50 PM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> > >>>>> wrote: > >>>>>> > >>>>>> Hey Chuck, > >>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: linux-rdma-owner@xxxxxxxxxxxxxxx <linux-rdma- > >>>>>>> owner@xxxxxxxxxxxxxxx> On Behalf Of Chuck Lever > >>>>>>> Sent: Wednesday, August 29, 2018 11:49 AM > >>>>>>> To: linux-rdma <linux-rdma@xxxxxxxxxxxxxxx> > >>>>>>> Cc: Leon Romanovsky <leon@xxxxxxxxxx>; Steve Wise > >>>>>>> <swise@xxxxxxxxxxxxxxxxxxxxx> > >>>>>>> Subject: Are device drivers ready for max_send_sge ? > >>>>>>> > >>>>>>> In v4.19-rc1, the NFS/RDMA server has stopped working > >>>>>>> for me. > >>>>>>> > >>>>>>> The reason for this is that the mlx4 driver (with CX-3) > >>>>>>> reports 62 in > >>>>>>> > >>>>>>> ib_device_attr::max_send_sge > >>>>>>> > >>>>>>> But when the NFS server tries to create a QP with a > >>>>>>> qp_attr.cap.max_send_sge set to 62, rdma_create_qp > >>>>>>> fails with -EINVAL. The check that fails is in > >>>>>>> > >>>>>>> drivers/infiniband/hw/mlx4/qp.c :: set_kernel_sq_size > >>>>>>> > >>>>>>> It's comparing the passed-in max_send_sge against the min() > >>>>>>> of dev->dev->caps.max_sq_sg and dev->dev->caps.max_rq_sg, > >>>>>>> and obviously that fails because max_rq_sg is smaller than > >>>>>>> max_sq_sg. > >>>>>>> > >>>>>>> set_rq_size() also has similar dependencies on max_sq_sg > >>>>>>> that may no longer be appropriate. > >>>>>>> > >>>>>> > >>>>>> These need fixing for sure. > >>>>>> > >>>>>>> When I fix the first sanity check in set_kernel_sq_size > >>>>>>> to ignore max_rq_sg, the third check in set_kernel_sq_size > >>>>>>> fails. This is: > >>>>>>> > >>>>>>> s = max(cap->max_send_sge * sizeof (struct mlx4_wqe_data_seg), > >>>>>>> cap->max_inline_data + sizeof (struct mlx4_wqe_inline_seg)) + > >>>>>>> send_wqe_overhead(type, qp->flags); > >>>>>>> > >>>>>>> if (s > dev->dev->caps.max_sq_desc_sz) > >>>>>>> > >>>>>>> I don't know enough about this logic to suggest a fix. > >>>>>>> > >>>>>> > >>>>>> If you set ib_device_attr::max_send_sge to 61, does it then work? Just > >>>>>> wonder if this is a 1-off error. > >>>>> > >>>>> I made this change to reduce ib_device_attr::max_send_sge > >>>>> to 61: > >>>>> > >>>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c > >>>>> b/drivers/net/ethernet/mellanox/mlx4/fw.c > >>>>> index babcfd9..81874f6 100644 > >>>>> --- a/drivers/net/ethernet/mellanox/mlx4/fw.c > >>>>> +++ b/drivers/net/ethernet/mellanox/mlx4/fw.c > >>>>> @@ -950,7 +950,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, > >>>>> struct mlx4_dev_cap *dev_cap) > >>>>> } > >>>>> > >>>>> MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_SG_SQ_OFFSET); > >>>>> - dev_cap->max_sq_sg = field; > >>>>> + dev_cap->max_sq_sg = field - 1; > >>>>> MLX4_GET(size, outbox, QUERY_DEV_CAP_MAX_DESC_SZ_SQ_OFFSET); > >>>>> dev_cap->max_sq_desc_sz = size; > >>>>> > >>>>> > >>>>> I added printk's to set_kernel_sq_size to capture the parameters > >>>>> of the failing sanity check. > >>>>> > >>>>> Here's with max_send_sge == 62: > >>>>> > >>>>> Aug 29 12:26:01 klimt kernel: <mlx4_ib> set_kernel_sq_size: s = 1056 > >>>>> Aug 29 12:26:01 klimt kernel: <mlx4_ib> set_kernel_sq_size: dev->dev- > >>>>>> caps.max_sq_desc_sz = 1008 > >>>>> > >>>>> and with max_send_sge == 61: > >>>>> > >>>>> Aug 29 13:59:40 klimt kernel: <mlx4_ib> set_kernel_sq_size: s = 1040 > >>>>> Aug 29 13:59:40 klimt kernel: <mlx4_ib> set_kernel_sq_size: dev->dev- > >>>>>> caps.max_sq_desc_sz = 1008 > >>>>> > >>>> > >>>> I guess mlx4_ib_query_device() could be enhanced to limit the max: > >>>> > >>>> ib_device_attr::max_send_sge = min(fw-advertised-max, (max_sq_desc_space - > >>>> max_wqe_overhead) / sge_descriptor_size); > >>> > >>> Probably. I also see that the NFS server has no reason to > >>> request so many send SGEs. > >> > >> I've now tested a couple other cards/drivers. mlx4/CX-3 is the > >> only one with this problem I've found so far. > > > > Thanks Chuck and Steve, > > > > I checked it in our mlx4 HW spec and it seems that the check in > > set_kernel_sq_size() is the right one. We should take into account > > the following four maximum parameters: max_sg_rq, max_sg_sq, > > max_desc_sz_rq and max_desc_sz_sq. > > > > However instead of bringing this complexity to query_device, which still > > won't be sufficient anyway (the calculations are dependent on QP type), > > the safer approach will be to restore old code, which will give you 32 SGEs. > > I'm OK with the "restore legacy behavior" approach. I assume you > need to pass this through your testing regimen, so I will wait > for you to merge it. Thanks, It is part of our regression. I'll send formal patch on Sunday. > > I have a patch that limits the NFS server to just enough Send SGEs > rather than using the device maximum. I plan to submit that during > the v4.20 merge window, unless folks would like me to send it sooner. > > > > Thanks > > > > diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c > > index ca0f1ee26091..21e268247f4d 100644 > > --- a/drivers/infiniband/hw/mlx4/main.c > > +++ b/drivers/infiniband/hw/mlx4/main.c > > @@ -517,8 +517,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev, > > props->page_size_cap = dev->dev->caps.page_size_cap; > > props->max_qp = dev->dev->quotas.qp; > > props->max_qp_wr = dev->dev->caps.max_wqes - MLX4_IB_SQ_MAX_SPARE; > > - props->max_send_sge = dev->dev->caps.max_sq_sg; > > - props->max_recv_sge = dev->dev->caps.max_rq_sg; > > + props->max_send_sge = min(dev->dev->caps.max_sq_sg, dev->dev->caps.max_rq_sg); > > + props->max_recv_sge = min(dev->dev->caps.max_sq_sg, dev->dev->caps.max_rq_sg); > > props->max_sge_rd = MLX4_MAX_SGE_RD; > > props->max_cq = dev->dev->quotas.cq; > > props->max_cqe = dev->dev->caps.max_cqes; > > > >> > >> > >> -- > >> Chuck Lever > > -- > Chuck Lever > > >
Attachment:
signature.asc
Description: PGP signature