Hey Chuck,

> -----Original Message-----
> From: linux-rdma-owner@xxxxxxxxxxxxxxx <linux-rdma-owner@xxxxxxxxxxxxxxx> On Behalf Of Chuck Lever
> Sent: Wednesday, August 29, 2018 11:49 AM
> To: linux-rdma <linux-rdma@xxxxxxxxxxxxxxx>
> Cc: Leon Romanovsky <leon@xxxxxxxxxx>; Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx>
> Subject: Are device drivers ready for max_send_sge ?
>
> In v4.19-rc1, the NFS/RDMA server has stopped working
> for me.
>
> The reason for this is that the mlx4 driver (with CX-3)
> reports 62 in
>
>   ib_device_attr::max_send_sge
>
> But when the NFS server tries to create a QP with a
> qp_attr.cap.max_send_sge set to 62, rdma_create_qp
> fails with -EINVAL. The check that fails is in
>
>   drivers/infiniband/hw/mlx4/qp.c :: set_kernel_sq_size
>
> It's comparing the passed-in max_send_sge against the min()
> of dev->dev->caps.max_sq_sg and dev->dev->caps.max_rq_sg,
> and obviously that fails because max_rq_sg is smaller than
> max_sq_sg.
>
> set_rq_size() also has similar dependencies on max_sq_sg
> that may no longer be appropriate.
>

These need fixing for sure.

> When I fix the first sanity check in set_kernel_sq_size
> to ignore max_rq_sg, the third check in set_kernel_sq_size
> fails. This is:
>
>   s = max(cap->max_send_sge * sizeof (struct mlx4_wqe_data_seg),
>           cap->max_inline_data + sizeof (struct mlx4_wqe_inline_seg)) +
>       send_wqe_overhead(type, qp->flags);
>
>   if (s > dev->dev->caps.max_sq_desc_sz)
>
> I don't know enough about this logic to suggest a fix.
>

If you set ib_device_attr::max_send_sge to 61, does it then work? Just wondering if this is an off-by-one error.

> Is there a driver-level fix in the works, or should I
> consider changing the NFS server to compute a smaller
> qp_attr.cap.max_send_sge ?
>
> --
> Chuck Lever
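
To make the off-by-one question concrete, here is a minimal standalone sketch of the size computation quoted above. It is not the driver code. The segment sizes are assumptions standing in for sizeof(struct mlx4_wqe_data_seg) and sizeof(struct mlx4_wqe_inline_seg), the overhead is passed in rather than taken from send_wqe_overhead(), and sq_wqe_size() and clamp_send_sge() are hypothetical helper names.

```c
#include <stddef.h>

/* Assumed sizes, for illustration only. In the driver these come from
 * sizeof(struct mlx4_wqe_data_seg) and sizeof(struct mlx4_wqe_inline_seg). */
#define DATA_SEG_SIZE   16u
#define INLINE_SEG_SIZE  4u

/* Hypothetical helper: same shape as the quoted check. Returns the send
 * WQE size needed for the requested SGE count and inline data size. */
static size_t sq_wqe_size(unsigned max_send_sge, unsigned max_inline_data,
                          size_t overhead)
{
    size_t sge_bytes = (size_t)max_send_sge * DATA_SEG_SIZE;
    size_t inline_bytes = (size_t)max_inline_data + INLINE_SEG_SIZE;

    return (sge_bytes > inline_bytes ? sge_bytes : inline_bytes) + overhead;
}

/* Hypothetical helper: largest SGE count, not above the advertised one,
 * whose WQE still fits in max_sq_desc_sz. Returns 0 if nothing fits
 * (for example when the inline data alone is too large). */
static unsigned clamp_send_sge(unsigned advertised, unsigned max_inline_data,
                               size_t overhead, size_t max_sq_desc_sz)
{
    unsigned sge = advertised;

    while (sge > 0 &&
           sq_wqe_size(sge, max_inline_data, overhead) > max_sq_desc_sz)
        sge--;
    return sge;
}
```

With made-up values (overhead of 32 bytes, max_sq_desc_sz of 1008), clamp_send_sge(62, 0, 32, 1008) returns 61, which is the kind of result an off-by-one in the advertised limit would produce. The real overhead and descriptor size for CX-3 would have to be read from the device to tell whether that is what is happening here.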