Re: NFSoRDMA Fails for max_sge Less Than 18

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Steve-


> On Jan 11, 2017, at 12:04 PM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote:
> 
> Hey Chuck,
> 
>>> 
>>> Browsing the code of other drivers it can be seen that this ability is
> either
>> hardcoded or is
>>> learnt by the driver from the device.
>> 
>> In the latter case, there's no way for me to know what that
>> capability is by looking at kernel code. There's also no way
>> for me to know about out-of-tree drivers or pre-release devices.
> 
> But shouldn't NFS always limit its sge depths based on ib_device_attr->max_sge?
> I don't think it is reasonable to assume any minimum value supported by all
> devices...

The previous "minimum" requirement assumed 2 SGEs. RPC-over-RDMA just
doesn't work with fewer than that.

So you would rather have RPC-over-RDMA automatically reduce the inline
threshold setting for each connection if the current system setting
cannot be supported by the device? I'll consider that.

Keeping logic in the ULPs to handle devices with tiny capabilities
duplicates a lot of complexity and impedes the introduction of new
features like larger inline thresholds.

The one abstract feature ULPs might really want is the ability to
send medium-sized data payloads in place. More than a handful of
SGEs is needed for that capability (in the kernel, where such payloads
are typically in the page cache).

It might be cool to have an API similar to rdma_rw that allows ULPs
to use a scatterlist for Send and Receive operations. It could hide
the driver and device maximum SGE values.


>> It's not feasible for me to stock my lab with more than a
>> couple of devices anyway.
>> 
>> For all these reasons, I rely on HCA vendors for smoke testing
>> NFS/RDMA with their devices.
>> 
>> [1] was posted for review on public mailing lists for weeks. I
>> received no review comments or reports of testing successes or
>> failures from any vendor, until Broadcom's report in late
>> December, three months after [1] appeared in a kernel release
>> candidate.
>> 
>> This may sound like sour grapes, but this is a review and
>> testing gap, and I think the community should have the ability
>> to address it.
>> 
>> HCA vendors, especially, have to focus on kernel release
>> candidate testing if functional ULPs are a critical release
>> criterion for them.
>> 
> 
> You're absolutely right.  I'm querying Chelsio to see how this might have
> slipped through the cracks.  Did this initial change land in linux-4.9?

I believe so.


> I have one nit though, your patch series are always very long and thus, to me,
> tedious to review.  It would be nice to see 5-8 patches submitted for review vs
> 15+.

I cap my patch series around 20 for just this reason. That
seemed to be the average number being posted for other ULPs.

The flip side is that sometimes it takes several quarters to
get a full set of changes upstream. Splitting features across
kernel releases means the feature can't be reviewed together,
and is sometimes more difficult for distribution backports.

Could also go with more smaller patches, where each patch is
easier to review, or capping at 8 patches but each patch
is more complex.

It's also OK to suggest series reorganization whenever you
feel the ennui. ;-)


>>> If I'm not mistaken, this issue affects nes and
>>> cxgb3/4 drivers, and perhaps others.
>> 
>> ocrdma and Oracle's HCA.
>> 
>> 
>>> E.g., for cxgb4:
>>> 
>>>       #define T4_MAX_RECV_SGE 4
>> 
>> Yet, without hard-coded max_sge values in kernel drivers, it's
>> difficult to say whether 4 is truly the lower bound.
>> 
>> 
>>>       static int c4iw_query_device(struct ib_device *ibdev, struct
> ib_device_attr
>> *props,
>>>                                    struct ib_udata *uhw)
>>>       {
>>>               ...
>>>               props->max_sge = T4_MAX_RECV_SGE;
>>> 
>>> ***
> 
> FYI:  cxgb4 supports 4 max for recv wrs, and 17 max for send wrs.  Perhaps 17
> avoided any problems for cxgb4 with the original code?

The original code needed only two SGEs for sending, and one for
receiving.

IIRC the RPC-over-RDMA receive path still needs just one SGE.


> Note: the ib_device_attr only has a max_sge that pertains to both send and recv,
> so cxgb4 sets it to the min value.  We should probably add a max_recv_sge and
> max_send_sge to ib_device_attr...

I could go for that too.


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux