> Hi Steve-
>
>
> > On Jan 11, 2017, at 12:04 PM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Hey Chuck,
> >
> >>>
> >>> Browsing the code of other drivers it can be seen that this ability
> >>> is either hardcoded or is learnt by the driver from the device.
> >>
> >> In the latter case, there's no way for me to know what that
> >> capability is by looking at kernel code. There's also no way
> >> for me to know about out-of-tree drivers or pre-release devices.
> >
> > But shouldn't NFS always limit its sge depths based on ib_device_attr->max_sge?
> > I don't think it is reasonable to assume any minimum value supported by all
> > devices...
>
> The previous "minimum" requirement assumed 2 SGEs. RPC-over-RDMA just
> doesn't work with fewer than that.
>
> So you would rather have RPC-over-RDMA automatically reduce the inline
> threshold setting for each connection if the current system setting
> cannot be supported by the device? I'll consider that.
>

That seems reasonable.

> Keeping logic in the ULPs to handle devices with tiny capabilities
> duplicates a lot of complexity and impedes the introduction of new
> features like larger inline thresholds.
>
> The one abstract feature ULPs might really want is the ability to
> send medium-sized data payloads in place. More than a handful of
> SGEs is needed for that capability (in the kernel, where such payloads
> are typically in the page cache).
>
> It might be cool to have an API similar to rdma_rw that allows ULPs
> to use a scatterlist for Send and Receive operations. It could hide
> the driver and device maximum SGE values.
>

I'm not sure what you mean by "in place"? (Sorry for not being up to
speed on this whole issue.) But perhaps some API like this could be
added to rdma_rw...

> >> It's not feasible for me to stock my lab with more than a
> >> couple of devices anyway.
> >>
> >> For all these reasons, I rely on HCA vendors for smoke testing
> >> NFS/RDMA with their devices.
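As an aside, the per-connection clamping described above could look
something like the sketch below. This is only an illustration, not the
actual patch: the mock attribute struct, rpcrdma_clamp_inline(), and the
page-size and header-SGE constants are all invented here. The idea is
just that each Send SGE can carry at most one page of inline payload,
with one SGE reserved for the transport header, so a device's max_sge
bounds the usable inline threshold.

```c
/* Sketch only: reduce the RPC-over-RDMA inline threshold for a
 * connection when the device cannot support the SGE count that the
 * system-wide setting implies. The struct and names are hypothetical.
 */
#include <stddef.h>

/* Mock of the one field needed from struct ib_device_attr */
struct mock_dev_attr {
	int max_sge;
};

/* Assumed geometry: one SGE per page of inline payload, plus one
 * SGE reserved for the RPC-over-RDMA transport header itself. */
#define PAGE_SZ		4096u
#define HDR_SGES	1

static unsigned int
rpcrdma_clamp_inline(const struct mock_dev_attr *attr,
		     unsigned int requested_inline)
{
	unsigned int max_payload_sges;
	unsigned int max_inline;

	if (attr->max_sge <= HDR_SGES)
		return 0;	/* no room for any inline payload */

	/* Largest inline payload the device's SGE limit allows */
	max_payload_sges = (unsigned int)(attr->max_sge - HDR_SGES);
	max_inline = max_payload_sges * PAGE_SZ;

	return requested_inline < max_inline ? requested_inline : max_inline;
}
```

With a cxgb4-like max_sge of 4, a requested 16 KB inline threshold would
be clamped to 12 KB (three payload SGEs); a device reporting 17 SGEs
would leave the 16 KB setting untouched.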
> >>
> >> [1] was posted for review on public mailing lists for weeks. I
> >> received no review comments or reports of testing successes or
> >> failures from any vendor, until Broadcom's report in late
> >> December, three months after [1] appeared in a kernel release
> >> candidate.
> >>
> >> This may sound like sour grapes, but this is a review and
> >> testing gap, and I think the community should have the ability
> >> to address it.
> >>
> >> HCA vendors, especially, have to focus on kernel release
> >> candidate testing if functional ULPs are a critical release
> >> criterion for them.
> >>
> >
> > You're absolutely right. I'm querying Chelsio to see how this might have
> > slipped through the cracks. Did this initial change land in linux-4.9?
>
> I believe so.
>
> > I have one nit, though: your patch series are always very long and thus,
> > to me, tedious to review. It would be nice to see 5-8 patches submitted
> > for review vs 15+.
>
> I cap my patch series around 20 for just this reason. That
> seemed to be the average number being posted for other ULPs.
>
> The flip side is that sometimes it takes several quarters to
> get a full set of changes upstream. Splitting features across
> kernel releases means the feature can't be reviewed together,
> and is sometimes more difficult for distribution backports.
>
> We could also go with more, smaller patches, where each patch is
> easier to review; or cap a series at 8 patches, where each patch
> is more complex.
>
> It's also OK to suggest series reorganization whenever you
> feel the ennui. ;-)
>
> >>> If I'm not mistaken, this issue affects nes and
> >>> cxgb3/4 drivers, and perhaps others.
> >>
> >> ocrdma and Oracle's HCA.
> >>
> >>
> >>> E.g., for cxgb4:
> >>>
> >>> #define T4_MAX_RECV_SGE 4
> >>
> >> Yet, without hard-coded max_sge values in kernel drivers, it's
> >> difficult to say whether 4 is truly the lower bound.
> >>
> >>
> >>> static int c4iw_query_device(struct ib_device *ibdev,
> >>>                              struct ib_device_attr *props,
> >>>                              struct ib_udata *uhw)
> >>> {
> >>> ...
> >>>         props->max_sge = T4_MAX_RECV_SGE;
> >>>
> >>> ***
> >
> > FYI: cxgb4 supports 4 max for recv wrs, and 17 max for send wrs. Perhaps 17
> > avoided any problems for cxgb4 with the original code?
>
> The original code needed only two SGEs for sending, and one for
> receiving.
>
> IIRC the RPC-over-RDMA receive path still needs just one SGE.
>

No, I mean the code that bumps it up to 18. Would that cause an immediate
failure if cxgb4 supports 17 and only enforces it at post_send() time?
(I haven't looked in detail at your patches... sorry.) Our QA ran testing
on 4.9 and didn't see this issue, so that's why I'm asking. They have not
yet run NFS/RDMA testing on 4.9-rc. I've asked them to do a quick
regression test asap.

> > Note: the ib_device_attr only has a max_sge that pertains to both send
> > and recv, so cxgb4 sets it to the min value. We should probably add a
> > max_recv_sge and max_send_sge to ib_device_attr...
>
> I could go for that too.
>

I'm too swamped right now to add this, but the changes should be trivial...

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
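
To make the proposed split concrete, a rough sketch follows. This is not
real ib_verbs code: struct sge_limits stands in for the new
ib_device_attr fields, and clamp_sge() is an invented helper. The cxgb4
numbers (4 receive SGEs, 17 send SGEs) come from the thread above.

```c
/* Sketch of splitting a single max_sge into per-direction limits,
 * so a ULP can size each side of its QP independently instead of
 * clamping both against one conservative value. All names here are
 * hypothetical stand-ins for the proposed ib_device_attr fields.
 */
#include <stddef.h>

struct sge_limits {
	int max_send_sge;	/* e.g. 17 on cxgb4 */
	int max_recv_sge;	/* e.g. 4 on cxgb4 (T4_MAX_RECV_SGE) */
};

/* Clamp a ULP's desired SGE count against one direction's limit */
static int clamp_sge(int wanted, int device_max)
{
	return wanted < device_max ? wanted : device_max;
}
```

With today's single max_sge, cxgb4 must advertise 4 for both directions;
with the split, an 18-SGE send request would be clamped to 17 while the
receive side is still clamped to 4.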