Re: Are device drivers ready for max_send_sge ?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Aug 30, 2018, at 3:16 AM, Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> 
> On Wed, Aug 29, 2018 at 04:41:42PM -0400, Chuck Lever wrote:
>> 
>> 
>>> On Aug 29, 2018, at 2:52 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>>> 
>>> 
>>> 
>>>> On Aug 29, 2018, at 2:17 PM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>>> 
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>>> Sent: Wednesday, August 29, 2018 1:09 PM
>>>>> To: Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx>
>>>>> Cc: linux-rdma <linux-rdma@xxxxxxxxxxxxxxx>; Leon Romanovsky
>>>>> <leon@xxxxxxxxxx>
>>>>> Subject: Re: Are device drivers ready for max_send_sge ?
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Aug 29, 2018, at 1:50 PM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx>
>>>>> wrote:
>>>>>> 
>>>>>> Hey Chuck,
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: linux-rdma-owner@xxxxxxxxxxxxxxx <linux-rdma-
>>>>>>> owner@xxxxxxxxxxxxxxx> On Behalf Of Chuck Lever
>>>>>>> Sent: Wednesday, August 29, 2018 11:49 AM
>>>>>>> To: linux-rdma <linux-rdma@xxxxxxxxxxxxxxx>
>>>>>>> Cc: Leon Romanovsky <leon@xxxxxxxxxx>; Steve Wise
>>>>>>> <swise@xxxxxxxxxxxxxxxxxxxxx>
>>>>>>> Subject: Are device drivers ready for max_send_sge ?
>>>>>>> 
>>>>>>> In v4.19-rc1, the NFS/RDMA server has stopped working
>>>>>>> for me.
>>>>>>> 
>>>>>>> The reason for this is that the mlx4 driver (with CX-3)
>>>>>>> reports 62 in
>>>>>>> 
>>>>>>> ib_device_attr::max_send_sge
>>>>>>> 
>>>>>>> But when the NFS server tries to create a QP with a
>>>>>>> qp_attr.cap.max_send_sge set to 62, rdma_create_qp
>>>>>>> fails with -EINVAL. The check that fails is in
>>>>>>> 
>>>>>>> drivers/infiniband/hw/mlx4/qp.c :: set_kernel_sq_size
>>>>>>> 
>>>>>>> It's comparing the passed-in max_send_sge against the min()
>>>>>>> of dev->dev->caps.max_sq_sg and dev->dev->caps.max_rq_sg,
>>>>>>> and obviously that fails because max_rq_sg is smaller than
>>>>>>> max_sq_sg.
>>>>>>> 
>>>>>>> set_rq_size() also has similar dependencies on max_sq_sg
>>>>>>> that may no longer be appropriate.
>>>>>>> 
>>>>>> 
>>>>>> These need fixing for sure.
>>>>>> 
>>>>>>> When I fix the first sanity check in set_kernel_sq_size
>>>>>>> to ignore max_rq_sg, the third check in set_kernel_sq_size
>>>>>>> fails. This is:
>>>>>>> 
>>>>>>> s = max(cap->max_send_sge * sizeof (struct mlx4_wqe_data_seg),
>>>>>>>       cap->max_inline_data + sizeof (struct mlx4_wqe_inline_seg)) +
>>>>>>>       send_wqe_overhead(type, qp->flags);
>>>>>>> 
>>>>>>> if (s > dev->dev->caps.max_sq_desc_sz)
>>>>>>> 
>>>>>>> I don't know enough about this logic to suggest a fix.
>>>>>>> 
>>>>>> 
>>>>>> If you set ib_device_attr::max_send_sge to 61, does it then work?  Just
>>>>>> wonder if this is a 1-off error.
>>>>> 
>>>>> I made this change to reduce ib_device_attr::max_send_sge
>>>>> to 61:
>>>>> 
>>>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c
>>>>> b/drivers/net/ethernet/mellanox/mlx4/fw.c
>>>>> index babcfd9..81874f6 100644
>>>>> --- a/drivers/net/ethernet/mellanox/mlx4/fw.c
>>>>> +++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
>>>>> @@ -950,7 +950,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev,
>>>>> struct mlx4_dev_cap *dev_cap)
>>>>>      }
>>>>> 
>>>>>      MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_SG_SQ_OFFSET);
>>>>> -       dev_cap->max_sq_sg = field;
>>>>> +       dev_cap->max_sq_sg = field - 1;
>>>>>      MLX4_GET(size, outbox, QUERY_DEV_CAP_MAX_DESC_SZ_SQ_OFFSET);
>>>>>      dev_cap->max_sq_desc_sz = size;
>>>>> 
>>>>> 
>>>>> I added printk's to set_kernel_sq_size to capture the parameters
>>>>> of the failing sanity check.
>>>>> 
>>>>> Here's with max_send_sge == 62:
>>>>> 
>>>>> Aug 29 12:26:01 klimt kernel: <mlx4_ib> set_kernel_sq_size: s = 1056
>>>>> Aug 29 12:26:01 klimt kernel: <mlx4_ib> set_kernel_sq_size: dev->dev-
>>>>>> caps.max_sq_desc_sz = 1008
>>>>> 
>>>>> and with max_send_sge == 61:
>>>>> 
>>>>> Aug 29 13:59:40 klimt kernel: <mlx4_ib> set_kernel_sq_size: s = 1040
>>>>> Aug 29 13:59:40 klimt kernel: <mlx4_ib> set_kernel_sq_size: dev->dev-
>>>>>> caps.max_sq_desc_sz = 1008
>>>>> 
>>>> 
>>>> I guess mlx4_ib_query_device() could be enhanced to limit the max:
>>>> 
>>>> ib_device_attr::max_send_sge = min(fw-advertised-max, (max_sq_desc_space -
>>>> max_wqe_overhead) / sge_descriptor_size);
>>> 
>>> Probably. I also see that the NFS server has no reason to
>>> request so many send SGEs.
>> 
>> I've now tested a couple other cards/drivers. mlx4/CX-3 is the
>> only one with this problem I've found so far.
> 
> Thanks Chuck and Steve,
> 
> I checked it in our mlx4 HW spec and it seems that the check in
> set_kernel_sq_size() is the right one. We should take into account
> the following four maximum parameters: max_sg_rq, max_sg_sq,
> max_desc_sz_rq and max_desc_sz_sq.
> 
> However instead of bringing this complexity to query_device, which still
> won't be sufficient anyway (the calculations are dependent on QP type),
> the safer approach will be to restore old code, which will give you 32 SGEs.

I'm OK with the "restore legacy behavior" approach. I assume you
need to pass this through your testing regimen, so I will wait
for you to merge it.

I have a patch that limits the NFS server to just enough Send SGEs
rather than using the device maximum. I plan to submit that during
the v4.20 merge window, unless folks would like me to send it sooner.


> Thanks
> 
> diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
> index ca0f1ee26091..21e268247f4d 100644
> --- a/drivers/infiniband/hw/mlx4/main.c
> +++ b/drivers/infiniband/hw/mlx4/main.c
> @@ -517,8 +517,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
> 	props->page_size_cap	   = dev->dev->caps.page_size_cap;
> 	props->max_qp		   = dev->dev->quotas.qp;
> 	props->max_qp_wr	   = dev->dev->caps.max_wqes - MLX4_IB_SQ_MAX_SPARE;
> -	props->max_send_sge	   = dev->dev->caps.max_sq_sg;
> -	props->max_recv_sge	   = dev->dev->caps.max_rq_sg;
> +	props->max_send_sge	   = min(dev->dev->caps.max_sq_sg, dev->dev->caps.max_rq_sg);
> +	props->max_recv_sge	   = min(dev->dev->caps.max_sq_sg, dev->dev->caps.max_rq_sg);
> 	props->max_sge_rd	   = MLX4_MAX_SGE_RD;
> 	props->max_cq		   = dev->dev->quotas.cq;
> 	props->max_cqe		   = dev->dev->caps.max_cqes;
> 
>> 
>> 
>> --
>> Chuck Lever

--
Chuck Lever







[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux