On Wed, Aug 29, 2018 at 04:41:42PM -0400, Chuck Lever wrote:
> 
> 
> > On Aug 29, 2018, at 2:52 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> > 
> > 
> > 
> >> On Aug 29, 2018, at 2:17 PM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >> 
> >> 
> >> 
> >>> -----Original Message-----
> >>> From: Chuck Lever <chuck.lever@xxxxxxxxxx>
> >>> Sent: Wednesday, August 29, 2018 1:09 PM
> >>> To: Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx>
> >>> Cc: linux-rdma <linux-rdma@xxxxxxxxxxxxxxx>; Leon Romanovsky
> >>> <leon@xxxxxxxxxx>
> >>> Subject: Re: Are device drivers ready for max_send_sge ?
> >>> 
> >>> 
> >>> 
> >>>> On Aug 29, 2018, at 1:50 PM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx>
> >>> wrote:
> >>>> 
> >>>> Hey Chuck,
> >>>> 
> >>>>> -----Original Message-----
> >>>>> From: linux-rdma-owner@xxxxxxxxxxxxxxx <linux-rdma-
> >>>>> owner@xxxxxxxxxxxxxxx> On Behalf Of Chuck Lever
> >>>>> Sent: Wednesday, August 29, 2018 11:49 AM
> >>>>> To: linux-rdma <linux-rdma@xxxxxxxxxxxxxxx>
> >>>>> Cc: Leon Romanovsky <leon@xxxxxxxxxx>; Steve Wise
> >>>>> <swise@xxxxxxxxxxxxxxxxxxxxx>
> >>>>> Subject: Are device drivers ready for max_send_sge ?
> >>>>> 
> >>>>> In v4.19-rc1, the NFS/RDMA server has stopped working
> >>>>> for me.
> >>>>> 
> >>>>> The reason for this is that the mlx4 driver (with CX-3)
> >>>>> reports 62 in
> >>>>> 
> >>>>> ib_device_attr::max_send_sge
> >>>>> 
> >>>>> But when the NFS server tries to create a QP with a
> >>>>> qp_attr.cap.max_send_sge set to 62, rdma_create_qp
> >>>>> fails with -EINVAL. The check that fails is in
> >>>>> 
> >>>>> drivers/infiniband/hw/mlx4/qp.c :: set_kernel_sq_size
> >>>>> 
> >>>>> It's comparing the passed-in max_send_sge against the min()
> >>>>> of dev->dev->caps.max_sq_sg and dev->dev->caps.max_rq_sg,
> >>>>> and obviously that fails because max_rq_sg is smaller than
> >>>>> max_sq_sg.
> >>>>> 
> >>>>> set_rq_size() also has similar dependencies on max_sq_sg
> >>>>> that may no longer be appropriate.
> >>>>> 
> >>>> 
> >>>> These need fixing for sure.
> >>>> 
> >>>>> When I fix the first sanity check in set_kernel_sq_size
> >>>>> to ignore max_rq_sg, the third check in set_kernel_sq_size
> >>>>> fails. This is:
> >>>>> 
> >>>>> s = max(cap->max_send_sge * sizeof (struct mlx4_wqe_data_seg),
> >>>>>         cap->max_inline_data + sizeof (struct mlx4_wqe_inline_seg)) +
> >>>>>         send_wqe_overhead(type, qp->flags);
> >>>>> 
> >>>>> if (s > dev->dev->caps.max_sq_desc_sz)
> >>>>> 
> >>>>> I don't know enough about this logic to suggest a fix.
> >>>>> 
> >>>> 
> >>>> If you set ib_device_attr::max_send_sge to 61, does it then work? Just
> >>>> wonder if this is a 1-off error.
> >>> 
> >>> I made this change to reduce ib_device_attr::max_send_sge
> >>> to 61:
> >>> 
> >>> diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c
> >>> b/drivers/net/ethernet/mellanox/mlx4/fw.c
> >>> index babcfd9..81874f6 100644
> >>> --- a/drivers/net/ethernet/mellanox/mlx4/fw.c
> >>> +++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
> >>> @@ -950,7 +950,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev,
> >>> struct mlx4_dev_cap *dev_cap)
> >>>  	}
> >>> 
> >>>  	MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_SG_SQ_OFFSET);
> >>> -	dev_cap->max_sq_sg = field;
> >>> +	dev_cap->max_sq_sg = field - 1;
> >>>  	MLX4_GET(size, outbox, QUERY_DEV_CAP_MAX_DESC_SZ_SQ_OFFSET);
> >>>  	dev_cap->max_sq_desc_sz = size;
> >>> 
> >>> 
> >>> I added printk's to set_kernel_sq_size to capture the parameters
> >>> of the failing sanity check.
> >>> 
> >>> Here's with max_send_sge == 62:
> >>> 
> >>> Aug 29 12:26:01 klimt kernel: <mlx4_ib> set_kernel_sq_size: s = 1056
> >>> Aug 29 12:26:01 klimt kernel: <mlx4_ib> set_kernel_sq_size: dev->dev->caps.max_sq_desc_sz = 1008
> >>> 
> >>> and with max_send_sge == 61:
> >>> 
> >>> Aug 29 13:59:40 klimt kernel: <mlx4_ib> set_kernel_sq_size: s = 1040
> >>> Aug 29 13:59:40 klimt kernel: <mlx4_ib> set_kernel_sq_size: dev->dev->caps.max_sq_desc_sz = 1008
> >>> 
> >> 
> >> I guess mlx4_ib_query_device() could be enhanced to limit the max:
> >> 
> >> ib_device_attr::max_send_sge = min(fw-advertised-max,
> >>     (max_sq_desc_space - max_wqe_overhead) / sge_descriptor_size);
> > 
> > Probably. I also see that the NFS server has no reason to
> > request so many send SGEs.
> 
> I've now tested a couple of other cards/drivers. mlx4/CX-3 is the
> only one with this problem I've found so far.

Thanks Chuck and Steve,

I checked it in our mlx4 HW spec and it seems that the check in
set_kernel_sq_size() is the right one. We should take into account the
following four maximum parameters: max_sg_rq, max_sg_sq, max_desc_sz_rq
and max_desc_sz_sq.

However, instead of bringing this complexity into query_device, which
still won't be sufficient anyway (the calculations depend on QP type),
the safer approach is to restore the old code, which will give you
32 SGEs.

Thanks

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index ca0f1ee26091..21e268247f4d 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -517,8 +517,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 	props->page_size_cap = dev->dev->caps.page_size_cap;
 	props->max_qp = dev->dev->quotas.qp;
 	props->max_qp_wr = dev->dev->caps.max_wqes - MLX4_IB_SQ_MAX_SPARE;
-	props->max_send_sge = dev->dev->caps.max_sq_sg;
-	props->max_recv_sge = dev->dev->caps.max_rq_sg;
+	props->max_send_sge = min(dev->dev->caps.max_sq_sg, dev->dev->caps.max_rq_sg);
+	props->max_recv_sge = min(dev->dev->caps.max_sq_sg, dev->dev->caps.max_rq_sg);
 	props->max_sge_rd = MLX4_MAX_SGE_RD;
 	props->max_cq = dev->dev->quotas.cq;
 	props->max_cqe = dev->dev->caps.max_cqes;

> 
> 
> --
> Chuck Lever
> 
> 
> 
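[For reference, the failing check in set_kernel_sq_size() comes down to simple
arithmetic. The sketch below reproduces it using the values from the printk
output quoted above; the 16-byte SGE descriptor size and the 64-byte send-WQE
overhead are inferred from those numbers (1056 = 62 * 16 + 64), not taken from
the driver source, so treat them as assumptions for this particular QP type.]

/* Standalone sketch of the set_kernel_sq_size() send-SGE check.
 * Assumptions (inferred from the log, not from the driver):
 *   sizeof(struct mlx4_wqe_data_seg)  == 16
 *   send_wqe_overhead(type, qp->flags) == 64
 */
#include <stdio.h>

int main(void)
{
	const unsigned int sge_desc_sz    = 16;   /* per-SGE descriptor size, assumed */
	const unsigned int wqe_overhead   = 64;   /* send WQE overhead, inferred from the log */
	const unsigned int max_sq_desc_sz = 1008; /* dev->dev->caps.max_sq_desc_sz from the log */

	for (unsigned int sge = 59; sge <= 62; sge++) {
		/* Mirrors: s = max_send_sge * sizeof(mlx4_wqe_data_seg) + overhead,
		 * ignoring the max_inline_data branch of the max(). */
		unsigned int s = sge * sge_desc_sz + wqe_overhead;

		printf("max_send_sge=%u -> s=%u (%s)\n", sge, s,
		       s > max_sq_desc_sz ? "rejected" : "fits");
	}
	return 0;
}

[On this reading, a 1008-byte send WQE with 64 bytes of overhead holds at most
59 data segments, which is consistent with both 61 and 62 SGEs being rejected
and with Leon's preference for reporting a conservative value from
query_device rather than the raw firmware maximum.]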