Hi Steve- > On Jan 11, 2017, at 12:04 PM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote: > > Hey Chuck, > >>> >>> Browsing the code of other drivers it can be seen that this ability is > either >> hardcoded or is >>> learnt by the driver from the device. >> >> In the latter case, there's no way for me to know what that >> capability is by looking at kernel code. There's also no way >> for me to know about out-of-tree drivers or pre-release devices. > > But shouldn't NFS always limit its sge depths based on ib_device_attr->max_sge? > I don't think it is reasonable to assume any minimum value supported by all > devices... The previous "minimum" requirement assumed 2 SGEs. RPC-over-RDMA just doesn't work with fewer than that. So you would rather have RPC-over-RDMA automatically reduce the inline threshold setting for each connection if the current system setting cannot be supported by the device? I'll consider that. Keeping logic in the ULPs to handle devices with tiny capabilities duplicates a lot of complexity and impedes the introduction of new features like larger inline thresholds. The one abstract feature ULPs might really want is the ability to send medium-sized data payloads in place. More than a handful of SGEs is needed for that capability (in the kernel, where such payloads are typically in the page cache). It might be cool to have an API similar to rdma_rw that allows ULPs to use a scatterlist for Send and Receive operations. It could hide the driver and device maximum SGE values. >> It's not feasible for me to stock my lab with more than a >> couple of devices anyway. >> >> For all these reasons, I rely on HCA vendors for smoke testing >> NFS/RDMA with their devices. >> >> [1] was posted for review on public mailing lists for weeks. I >> received no review comments or reports of testing successes or >> failures from any vendor, until Broadcom's report in late >> December, three months after [1] appeared in a kernel release >> candidate. >> >> This may sound like sour grapes, but this is a review and >> testing gap, and I think the community should have the ability >> to address it. >> >> HCA vendors, especially, have to focus on kernel release >> candidate testing if functional ULPs are a critical release >> criterion for them. >> > > You're absolutely right. I'm querying Chelsio to see how this might have > slipped through the cracks. Did this initial change land in linux-4.9? I believe so. > I have one nit though, your patch series are always very long and thus, to me, > tedious to review. It would be nice to see 5-8 patches submitted for review vs > 15+. I cap my patch series around 20 for just this reason. That seemed to be the average number being posted for other ULPs. The flip side is that sometimes it takes several quarters to get a full set of changes upstream. Splitting features across kernel releases means the feature can't be reviewed together, and is sometimes more difficult for distribution backports. Could also go with more smaller patches, where each patch is easier to review, or capping at 8 patches but each patch is more complex. It's also OK to suggest series reorganization whenever you feel the ennui. ;-) >>> If I'm not mistaken, this issue affects nes and >>> cxgb3/4 drivers, and perhaps others. >> >> ocrdma and Oracle's HCA. >> >> >>> E.g., for cxgb4: >>> >>> #define T4_MAX_RECV_SGE 4 >> >> Yet, without hard-coded max_sge values in kernel drivers, it's >> difficult to say whether 4 is truly the lower bound. >> >> >>> static int c4iw_query_device(struct ib_device *ibdev, struct > ib_device_attr >> *props, >>> struct ib_udata *uhw) >>> { >>> ... >>> props->max_sge = T4_MAX_RECV_SGE; >>> >>> *** > > FYI: cxgb4 supports 4 max for recv wrs, and 17 max for send wrs. Perhaps 17 > avoided any problems for cxgb4 with the original code? The original code needed only two SGEs for sending, and one for receiving. IIRC the RPC-over-RDMA receive path still needs just one SGE. > Note: the ib_device_attr only has a max_sge that pertains to both send and recv, > so cxgb4 sets it to the min value. We should probably add a max_recv_sge and > max_send_sge to ib_device_attr... I could go for that too. -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html