> On Aug 29, 2018, at 1:50 PM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hey Chuck,
>
>> -----Original Message-----
>> From: linux-rdma-owner@xxxxxxxxxxxxxxx <linux-rdma-
>> owner@xxxxxxxxxxxxxxx> On Behalf Of Chuck Lever
>> Sent: Wednesday, August 29, 2018 11:49 AM
>> To: linux-rdma <linux-rdma@xxxxxxxxxxxxxxx>
>> Cc: Leon Romanovsky <leon@xxxxxxxxxx>; Steve Wise
>> <swise@xxxxxxxxxxxxxxxxxxxxx>
>> Subject: Are device drivers ready for max_send_sge ?
>>
>> In v4.19-rc1, the NFS/RDMA server has stopped working
>> for me.
>>
>> The reason for this is that the mlx4 driver (with CX-3)
>> reports 62 in
>>
>>    ib_device_attr::max_send_sge
>>
>> But when the NFS server tries to create a QP with
>> qp_attr.cap.max_send_sge set to 62, rdma_create_qp
>> fails with -EINVAL. The check that fails is in
>>
>>    drivers/infiniband/hw/mlx4/qp.c :: set_kernel_sq_size
>>
>> It's comparing the passed-in max_send_sge against the min()
>> of dev->dev->caps.max_sq_sg and dev->dev->caps.max_rq_sg,
>> and obviously that fails because max_rq_sg is smaller than
>> max_sq_sg.
>>
>> set_rq_size() also has similar dependencies on max_sq_sg
>> that may no longer be appropriate.
>>
>
> These need fixing for sure.
>
>> When I fix the first sanity check in set_kernel_sq_size
>> to ignore max_rq_sg, the third check in set_kernel_sq_size
>> fails. This is:
>>
>>    s = max(cap->max_send_sge * sizeof (struct mlx4_wqe_data_seg),
>>            cap->max_inline_data + sizeof (struct mlx4_wqe_inline_seg)) +
>>        send_wqe_overhead(type, qp->flags);
>>
>>    if (s > dev->dev->caps.max_sq_desc_sz)
>>
>> I don't know enough about this logic to suggest a fix.
>>
>
> If you set ib_device_attr::max_send_sge to 61, does it then work? Just
> wondering if this is an off-by-one error.

I made this change to reduce ib_device_attr::max_send_sge to 61:

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index babcfd9..81874f6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -950,7 +950,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	}
 
 	MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_SG_SQ_OFFSET);
-	dev_cap->max_sq_sg = field;
+	dev_cap->max_sq_sg = field - 1;
 	MLX4_GET(size, outbox, QUERY_DEV_CAP_MAX_DESC_SZ_SQ_OFFSET);
 	dev_cap->max_sq_desc_sz = size;

I added printk's to set_kernel_sq_size to capture the parameters of the
failing sanity check. Here's the result with max_send_sge == 62:

Aug 29 12:26:01 klimt kernel: <mlx4_ib> set_kernel_sq_size: s = 1056
Aug 29 12:26:01 klimt kernel: <mlx4_ib> set_kernel_sq_size: dev->dev->caps.max_sq_desc_sz = 1008

and with max_send_sge == 61:

Aug 29 13:59:40 klimt kernel: <mlx4_ib> set_kernel_sq_size: s = 1040
Aug 29 13:59:40 klimt kernel: <mlx4_ib> set_kernel_sq_size: dev->dev->caps.max_sq_desc_sz = 1008

>> Is there a driver-level fix in the works, or should I
>> consider changing the NFS server to compute a smaller
>> qp_attr.cap.max_send_sge ?
>>
>> --
>> Chuck Lever

--
Chuck Lever
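
For reference, the first failing comparison described above can be
modeled in a few lines of userspace C. This is a standalone sketch,
not the driver code: the max_rq_sg value of 32 is made up, and only
the relationship max_rq_sg < max_sq_sg matters, since the check caps
the requested max_send_sge at the min() of the two device limits.

	/*
	 * Standalone model of the first set_kernel_sq_size() check.
	 * The cap values are hypothetical; the point is that any
	 * max_rq_sg below the advertised max_send_sge makes QP
	 * creation fail with -EINVAL.
	 */
	#include <stdio.h>

	static unsigned int min_u(unsigned int a, unsigned int b)
	{
		return a < b ? a : b;
	}

	int main(void)
	{
		unsigned int max_sq_sg = 62;   /* advertised max_send_sge */
		unsigned int max_rq_sg = 32;   /* hypothetical smaller cap */
		unsigned int requested = 62;   /* qp_attr.cap.max_send_sge */

		if (requested > min_u(max_sq_sg, max_rq_sg))
			printf("QP creation fails with -EINVAL\n");
		else
			printf("check passes\n");
		return 0;
	}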
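
The printk values above also pin down the arithmetic of the third
check. The value of s drops by 16 bytes per SGE between the two runs,
so sizeof(struct mlx4_wqe_data_seg) is evidently 16 here, and
1056 - 62 * 16 = 64 implies send_wqe_overhead(type, qp->flags)
returned 64 for this QP. Assuming those two inferred constants, and
that max_inline_data is small enough that the SGE term wins the max(),
the largest max_send_sge that fits in max_sq_desc_sz = 1008 is
(1008 - 64) / 16 = 59, i.e. three fewer than the device advertises,
not one. A standalone sketch of that computation:

	/*
	 * Reproduces the third set_kernel_sq_size() check with the
	 * constants inferred from the printk output above.  The two
	 * size #defines are assumptions derived from the logs, not
	 * values read out of the driver.
	 */
	#include <stdio.h>

	#define WQE_DATA_SEG_SZ   16    /* inferred: s shrinks by 16 per SGE */
	#define SEND_WQE_OVERHEAD 64    /* inferred: 1056 - 62 * 16 = 64 */
	#define MAX_SQ_DESC_SZ    1008  /* reported by the printk above */

	int main(void)
	{
		unsigned int sge;

		for (sge = 59; sge <= 62; sge++) {
			unsigned int s = sge * WQE_DATA_SEG_SZ + SEND_WQE_OVERHEAD;

			printf("max_send_sge = %u: s = %u -> %s\n",
			       sge, s, s > MAX_SQ_DESC_SZ ? "-EINVAL" : "ok");
		}
		return 0;
	}

Running it shows 59 is the first value that passes, which is consistent
with the observation above that reducing the cap to 61 still trips the
same check.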