> On Aug 29, 2018, at 2:52 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>
>> On Aug 29, 2018, at 2:17 PM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>>> -----Original Message-----
>>> From: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>> Sent: Wednesday, August 29, 2018 1:09 PM
>>> To: Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx>
>>> Cc: linux-rdma <linux-rdma@xxxxxxxxxxxxxxx>; Leon Romanovsky <leon@xxxxxxxxxx>
>>> Subject: Re: Are device drivers ready for max_send_sge ?
>>>
>>>> On Aug 29, 2018, at 1:50 PM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>>>
>>>> Hey Chuck,
>>>>
>>>>> -----Original Message-----
>>>>> From: linux-rdma-owner@xxxxxxxxxxxxxxx <linux-rdma-owner@xxxxxxxxxxxxxxx> On Behalf Of Chuck Lever
>>>>> Sent: Wednesday, August 29, 2018 11:49 AM
>>>>> To: linux-rdma <linux-rdma@xxxxxxxxxxxxxxx>
>>>>> Cc: Leon Romanovsky <leon@xxxxxxxxxx>; Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx>
>>>>> Subject: Are device drivers ready for max_send_sge ?
>>>>>
>>>>> In v4.19-rc1, the NFS/RDMA server has stopped working for me.
>>>>>
>>>>> The reason for this is that the mlx4 driver (with CX-3) reports 62 in
>>>>>
>>>>>     ib_device_attr::max_send_sge
>>>>>
>>>>> But when the NFS server tries to create a QP with qp_attr.cap.max_send_sge
>>>>> set to 62, rdma_create_qp fails with -EINVAL. The check that fails is in
>>>>>
>>>>>     drivers/infiniband/hw/mlx4/qp.c :: set_kernel_sq_size
>>>>>
>>>>> It's comparing the passed-in max_send_sge against the min() of
>>>>> dev->dev->caps.max_sq_sg and dev->dev->caps.max_rq_sg, and obviously
>>>>> that fails because max_rq_sg is smaller than max_sq_sg.
>>>>>
>>>>> set_rq_size() also has similar dependencies on max_sq_sg that may no
>>>>> longer be appropriate.
>>>>>
>>>>
>>>> These need fixing for sure.
>>>>
>>>>> When I fix the first sanity check in set_kernel_sq_size to ignore
>>>>> max_rq_sg, the third check in set_kernel_sq_size fails.
>>>>> This is:
>>>>>
>>>>>     s = max(cap->max_send_sge * sizeof (struct mlx4_wqe_data_seg),
>>>>>             cap->max_inline_data + sizeof (struct mlx4_wqe_inline_seg)) +
>>>>>         send_wqe_overhead(type, qp->flags);
>>>>>
>>>>>     if (s > dev->dev->caps.max_sq_desc_sz)
>>>>>
>>>>> I don't know enough about this logic to suggest a fix.
>>>>>
>>>>
>>>> If you set ib_device_attr::max_send_sge to 61, does it then work? Just
>>>> wondering if this is an off-by-one error.
>>>
>>> I made this change to reduce ib_device_attr::max_send_sge to 61:
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
>>> index babcfd9..81874f6 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx4/fw.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
>>> @@ -950,7 +950,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
>>>  	}
>>>
>>>  	MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_SG_SQ_OFFSET);
>>> -	dev_cap->max_sq_sg = field;
>>> +	dev_cap->max_sq_sg = field - 1;
>>>  	MLX4_GET(size, outbox, QUERY_DEV_CAP_MAX_DESC_SZ_SQ_OFFSET);
>>>  	dev_cap->max_sq_desc_sz = size;
>>>
>>> I added printk's to set_kernel_sq_size to capture the parameters of the
>>> failing sanity check.
>>>
>>> Here's with max_send_sge == 62:
>>>
>>> Aug 29 12:26:01 klimt kernel: <mlx4_ib> set_kernel_sq_size: s = 1056
>>> Aug 29 12:26:01 klimt kernel: <mlx4_ib> set_kernel_sq_size: dev->dev->caps.max_sq_desc_sz = 1008
>>>
>>> and with max_send_sge == 61:
>>>
>>> Aug 29 13:59:40 klimt kernel: <mlx4_ib> set_kernel_sq_size: s = 1040
>>> Aug 29 13:59:40 klimt kernel: <mlx4_ib> set_kernel_sq_size: dev->dev->caps.max_sq_desc_sz = 1008
>>>
>>
>> I guess mlx4_ib_query_device() could be enhanced to limit the max:
>>
>>     ib_device_attr::max_send_sge = min(fw-advertised-max,
>>         (max_sq_desc_space - max_wqe_overhead) / sge_descriptor_size);
>
> Probably. I also see that the NFS server has no reason to request so
> many send SGEs.

I've now tested a couple other cards/drivers.
mlx4/CX-3 is the only one with this problem I've found so far.

--
Chuck Lever