> On Aug 30, 2018, at 3:16 AM, Leon Romanovsky <leon@xxxxxxxxxx> wrote:
>
> On Wed, Aug 29, 2018 at 04:41:42PM -0400, Chuck Lever wrote:
>>
>>
>>> On Aug 29, 2018, at 2:52 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>>>
>>>
>>>
>>>> On Aug 29, 2018, at 2:17 PM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>>> Sent: Wednesday, August 29, 2018 1:09 PM
>>>>> To: Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx>
>>>>> Cc: linux-rdma <linux-rdma@xxxxxxxxxxxxxxx>; Leon Romanovsky <leon@xxxxxxxxxx>
>>>>> Subject: Re: Are device drivers ready for max_send_sge ?
>>>>>
>>>>>
>>>>>
>>>>>> On Aug 29, 2018, at 1:50 PM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> Hey Chuck,
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: linux-rdma-owner@xxxxxxxxxxxxxxx <linux-rdma-owner@xxxxxxxxxxxxxxx> On Behalf Of Chuck Lever
>>>>>>> Sent: Wednesday, August 29, 2018 11:49 AM
>>>>>>> To: linux-rdma <linux-rdma@xxxxxxxxxxxxxxx>
>>>>>>> Cc: Leon Romanovsky <leon@xxxxxxxxxx>; Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx>
>>>>>>> Subject: Are device drivers ready for max_send_sge ?
>>>>>>>
>>>>>>> In v4.19-rc1, the NFS/RDMA server has stopped working
>>>>>>> for me.
>>>>>>>
>>>>>>> The reason for this is that the mlx4 driver (with CX-3)
>>>>>>> reports 62 in
>>>>>>>
>>>>>>>    ib_device_attr::max_send_sge
>>>>>>>
>>>>>>> But when the NFS server tries to create a QP with
>>>>>>> qp_attr.cap.max_send_sge set to 62, rdma_create_qp
>>>>>>> fails with -EINVAL. The check that fails is in
>>>>>>>
>>>>>>>    drivers/infiniband/hw/mlx4/qp.c :: set_kernel_sq_size
>>>>>>>
>>>>>>> It compares the passed-in max_send_sge against the min()
>>>>>>> of dev->dev->caps.max_sq_sg and dev->dev->caps.max_rq_sg,
>>>>>>> and that obviously fails because max_rq_sg is smaller than
>>>>>>> max_sq_sg.
>>>>>>>
>>>>>>> set_rq_size() also has similar dependencies on max_sq_sg
>>>>>>> that may no longer be appropriate.
>>>>>>>
>>>>>>
>>>>>> These need fixing for sure.
>>>>>>
>>>>>>> When I fix the first sanity check in set_kernel_sq_size
>>>>>>> to ignore max_rq_sg, the third check in set_kernel_sq_size
>>>>>>> fails. This is:
>>>>>>>
>>>>>>>    s = max(cap->max_send_sge * sizeof (struct mlx4_wqe_data_seg),
>>>>>>>            cap->max_inline_data + sizeof (struct mlx4_wqe_inline_seg)) +
>>>>>>>        send_wqe_overhead(type, qp->flags);
>>>>>>>
>>>>>>>    if (s > dev->dev->caps.max_sq_desc_sz)
>>>>>>>
>>>>>>> I don't know enough about this logic to suggest a fix.
>>>>>>>
>>>>>>
>>>>>> If you set ib_device_attr::max_send_sge to 61, does it then work? Just
>>>>>> wondering if this is a 1-off error.
>>>>>
>>>>> I made this change to reduce ib_device_attr::max_send_sge
>>>>> to 61:
>>>>>
>>>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
>>>>> index babcfd9..81874f6 100644
>>>>> --- a/drivers/net/ethernet/mellanox/mlx4/fw.c
>>>>> +++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
>>>>> @@ -950,7 +950,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
>>>>> 	}
>>>>>
>>>>> 	MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_SG_SQ_OFFSET);
>>>>> -	dev_cap->max_sq_sg = field;
>>>>> +	dev_cap->max_sq_sg = field - 1;
>>>>> 	MLX4_GET(size, outbox, QUERY_DEV_CAP_MAX_DESC_SZ_SQ_OFFSET);
>>>>> 	dev_cap->max_sq_desc_sz = size;
>>>>>
>>>>> I added printk's to set_kernel_sq_size to capture the parameters
>>>>> of the failing sanity check.
>>>>>
>>>>> Here's with max_send_sge == 62:
>>>>>
>>>>>    Aug 29 12:26:01 klimt kernel: <mlx4_ib> set_kernel_sq_size: s = 1056
>>>>>    Aug 29 12:26:01 klimt kernel: <mlx4_ib> set_kernel_sq_size: dev->dev->caps.max_sq_desc_sz = 1008
>>>>>
>>>>> and with max_send_sge == 61:
>>>>>
>>>>>    Aug 29 13:59:40 klimt kernel: <mlx4_ib> set_kernel_sq_size: s = 1040
>>>>>    Aug 29 13:59:40 klimt kernel: <mlx4_ib> set_kernel_sq_size: dev->dev->caps.max_sq_desc_sz = 1008
>>>>>
>>>>
>>>> I guess mlx4_ib_query_device() could be enhanced to limit the max:
>>>>
>>>>    ib_device_attr::max_send_sge = min(fw-advertised-max,
>>>>        (max_sq_desc_space - max_wqe_overhead) / sge_descriptor_size);
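A quick cross-check of that formula against the printk numbers above,
written as a minimal C sketch. The 16-byte descriptor size is
sizeof(struct mlx4_wqe_data_seg); the 64-byte WQE overhead is inferred
from the s = 1056 log line (1056 - 62 * 16 = 64), not taken from the
HW spec:

    #include <stdio.h>

    int main(void)
    {
            int sge_desc_size  = 16;   /* sizeof(struct mlx4_wqe_data_seg) */
            int wqe_overhead   = 1056 - 62 * sge_desc_size;   /* = 64 */
            int max_sq_desc_sz = 1008; /* from the printk output above */

            /* largest SGE count for which s <= max_sq_desc_sz holds */
            printf("max_send_sge = %d\n",
                   (max_sq_desc_sz - wqe_overhead) / sge_desc_size);
            return 0;   /* prints: max_send_sge = 59 */
    }

By this arithmetic the third check passes only at 59 or fewer SGEs for
this QP type (59 * 16 + 64 = 1008 exactly), so the failure at 61 is not
a 1-off error: the advertised 62 overshoots the real limit by three.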
>>>
>>> Probably. I also see that the NFS server has no reason to
>>> request so many send SGEs.
>>
>> I've now tested a couple of other cards/drivers. mlx4/CX-3 is the
>> only one with this problem I've found so far.
>
> Thanks Chuck and Steve,
>
> I checked our mlx4 HW spec, and it seems that the check in
> set_kernel_sq_size() is the right one. We should take into account
> the following four maximum parameters: max_sg_rq, max_sg_sq,
> max_desc_sz_rq and max_desc_sz_sq.
>
> However, instead of bringing this complexity into query_device, which
> still won't be sufficient anyway (the calculations depend on the QP
> type), the safer approach is to restore the old code, which will give
> you 32 SGEs.

I'm OK with the "restore legacy behavior" approach. I assume you need
to pass this through your testing regimen, so I will wait for you to
merge it.

I have a patch that limits the NFS server to just enough Send SGEs
rather than using the device maximum. I plan to submit it during the
v4.20 merge window, unless folks would like me to send it sooner.

> Thanks
>
> diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
> index ca0f1ee26091..21e268247f4d 100644
> --- a/drivers/infiniband/hw/mlx4/main.c
> +++ b/drivers/infiniband/hw/mlx4/main.c
> @@ -517,8 +517,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
> 	props->page_size_cap = dev->dev->caps.page_size_cap;
> 	props->max_qp = dev->dev->quotas.qp;
> 	props->max_qp_wr = dev->dev->caps.max_wqes - MLX4_IB_SQ_MAX_SPARE;
> -	props->max_send_sge = dev->dev->caps.max_sq_sg;
> -	props->max_recv_sge = dev->dev->caps.max_rq_sg;
> +	props->max_send_sge = min(dev->dev->caps.max_sq_sg, dev->dev->caps.max_rq_sg);
> +	props->max_recv_sge = min(dev->dev->caps.max_sq_sg, dev->dev->caps.max_rq_sg);
> 	props->max_sge_rd = MLX4_MAX_SGE_RD;
> 	props->max_cq = dev->dev->quotas.cq;
> 	props->max_cqe = dev->dev->caps.max_cqes;
>
>>
>>
>> --
>> Chuck Lever

--
Chuck Lever
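As an aside, the ULP-side change Chuck describes amounts to requesting
only as many Send SGEs as the transport actually needs, clamped to the
device-advertised cap, rather than asking for the maximum. A minimal
sketch under stated assumptions; this is not Chuck's actual NFS server
patch, wanted_sges and the WR depths are illustrative, and CQ setup is
omitted:

    #include <linux/kernel.h>
    #include <rdma/ib_verbs.h>
    #include <rdma/rdma_cm.h>

    /* Clamp the requested Send SGE count to what the device advertises
     * before creating the QP, instead of requesting attrs.max_send_sge
     * outright. */
    static int ulp_create_qp(struct rdma_cm_id *cm_id, struct ib_pd *pd,
                             u32 wanted_sges)
    {
            struct ib_qp_init_attr qp_attr = {};

            qp_attr.qp_type          = IB_QPT_RC;
            qp_attr.cap.max_send_wr  = 64;   /* arbitrary depths for the sketch */
            qp_attr.cap.max_recv_wr  = 64;
            qp_attr.cap.max_recv_sge = 1;
            qp_attr.cap.max_send_sge = min_t(u32, wanted_sges,
                                             cm_id->device->attrs.max_send_sge);

            /* send_cq/recv_cq assignment omitted for brevity */
            return rdma_create_qp(cm_id, pd, &qp_attr);
    }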