RE: [RFC PATCH] RDMA/siw: Experimental e2e negotiation of GSO usage.

-----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: -----

>To: "Bernard Metzler" <bmt@xxxxxxxxxxxxxx>, <faisal.latif@xxxxxxxxx>,
><shiraz.saleem@xxxxxxxxx>, <mkalderon@xxxxxxxxxxx>,
><aelior@xxxxxxxxxxx>
>From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx>
>Date: 04/15/2020 12:52PM
>Cc: dledford@xxxxxxxxxx, jgg@xxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx,
><bharat@xxxxxxxxxxx>, <nirranjan@xxxxxxxxxxx>
>Subject: [EXTERNAL] Re: [RFC PATCH] RDMA/siw: Experimental e2e
>negotiation of GSO usage.
>
>On Tuesday, April 04/14/20, 2020 at 16:48:22 +0200, Bernard Metzler
>wrote:
>> Disabling GSO usage lets siw create FPDUs fitting MTU size.
>> Enabling GSO usage lets siw form larger FPDUs fitting up to one
>> current GSO frame. As a software only iWarp implementation, for
>> large messages, siw bandwidth performance severely suffers from not
>> using GSO, reducing available single stream bandwidth on fast links
>> by more than 50%, while increasing CPU load.
>> 
>> Experimental GSO usage handshake is implemented by using one spare
>> bit of the MPA header, which is used to signal GSO framing at
>> initiator side and GSO framing acceptance at responder side.
>> Typical iWarp hardware implementations will not set or interpret
>> that header bit. Against such a peer, siw will adhere to forming
>> FPDUs fitting the MTU size. This assures interoperability with
>> peer iWarp implementations unable to process FPDUs larger than
>> MTU size.
>> 
>> Signed-off-by: Bernard Metzler <bmt@xxxxxxxxxxxxxx>
>> ---
>>  drivers/infiniband/sw/siw/siw_main.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
>> index 5cd40fb9e20c..a2dbdbcacf72 100644
>> --- a/drivers/infiniband/sw/siw/siw_main.c
>> +++ b/drivers/infiniband/sw/siw/siw_main.c
>> @@ -36,7 +36,7 @@ const bool zcopy_tx = true;
>>   * large packets. try_gso = true lets siw try to use local GSO,
>>   * if peer agrees.  Not using GSO severly limits siw maximum tx bandwidth.
>>   */
>> -const bool try_gso;
>> +const bool try_gso = true;
>>  
>>  /* Attach siw also with loopback devices */
>>  const bool loopback_enabled = true;
>> -- 
>> 2.20.1
>> 
>
>Hi Bernard,
>
>As per RFC 5044, the DDP layer should limit each record size to
>MULPDU (Maximum ULPDU: the current maximum size of the record that
>is acceptable for DDP to pass to MPA for transmission).
>E.g.: if the physical layer MTU is 1500, then the DDP record length
>should be ~1448 max. All hard iWARP devices default to this
>behaviour, I think.
>
>So if SoftiWARP constructs FPDUs based on this 64KB MSS, then the

Hi Krishna,

The proposed patch does not nail the FPDU size to 64K, but
dynamically adapts it to what the TCP socket currently advertises.
This may vary over time, up to a maximum of 64K, or stay at 1.5K
if advertised so. siw is not a ULP of Ethernet, but of TCP, and
takes into account what the current TCP socket says.
TCP advertises its current maximum segment count to ULPs via
tcp_sock *tp->gso_segs for good reasons.
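
To make that concrete, here is a minimal sketch of how the tx path can
derive the FPDU payload limit from the socket (the helper name is
hypothetical; mss_cache and gso_segs are fields of the kernel's
struct tcp_sock, and the value could be re-read before each new FPDU
so it tracks the connection over time):

#include <net/tcp.h>

/* Illustrative only: FPDU payload limit as currently advertised by TCP,
 * from a single MSS up to the full GSO size (~64K).
 */
static u32 example_fpdu_max_advertised(struct sock *sk)
{
	struct tcp_sock *tp = tcp_sk(sk);
	u16 segs = READ_ONCE(tp->gso_segs);

	return tp->mss_cache * (segs ? segs : 1);
}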

siw performance suffers badly with small FPDUs, since it always
must attach a trailing CRC after the data pages it pushes. Adding
a 4-byte trailer after each data page to be sent seems to put the
kernel network stack far out of its comfort zone.
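
For illustration, the per-FPDU send pattern looks roughly like this
(hypothetical helper, error and partial-send handling omitted; the
real tx loop in siw_qp_tx.c is more involved). With MTU-sized FPDUs
this tiny trailer send happens for every ~1.5K of payload:

#include <linux/net.h>
#include <linux/socket.h>
#include <linux/uio.h>

/* Illustrative only: push one FPDU = payload page fragment plus 4-byte
 * CRC trailer. MSG_MORE keeps the stream open until the trailer.
 */
static int example_push_fpdu(struct socket *s, struct page *page,
			     int offset, size_t payload, __be32 crc)
{
	struct msghdr msg = { .msg_flags = MSG_DONTWAIT };
	struct kvec iov = { .iov_base = &crc, .iov_len = sizeof(crc) };
	int rv;

	rv = kernel_sendpage(s, page, offset, payload,
			     MSG_DONTWAIT | MSG_MORE);
	if (rv < 0)
		return rv;

	/* the 4-byte CRC closes the FPDU */
	return kernel_sendmsg(s, &msg, &iov, 1, sizeof(crc));
}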

>hard iWARP peer (say a Chelsio adapter) should also understand the
>64KB FPDU.

This is why it is currently disabled.
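
For reference, the interoperability fallback from the commit message
boils down to roughly the following at MPA negotiation time (a hedged
sketch; MPA_RR_FLAG_GSO_EXP, try_gso and gso_seg_limit are names taken
from the siw sources, but the exact wiring here is illustrative):

#include "siw.h"
#include "iwarp.h"

/* Illustrative only: keep MTU-sized FPDUs unless the peer set the
 * experimental MPA spare bit, in which case full GSO framing is allowed.
 */
static void example_apply_gso_handshake(struct siw_qp *qp,
					__be16 peer_mpa_bits)
{
	qp->tx_ctx.gso_seg_limit = 1;	/* default: one segment per FPDU */

	if (try_gso && (peer_mpa_bits & MPA_RR_FLAG_GSO_EXP))
		qp->tx_ctx.gso_seg_limit = 0;	/* unbound, up to GSO size */
}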

>At present, the Chelsio T6 adapter can understand up to 16KB FPDU
>size, max.
>
That's interesting, and I guess it would already improve the performance
of siw <-> hardware iWARP substantially. Did you try that?

The siw code already has a hook to limit segment growth below
what TCP advertises (since I experimented with earlier Chelsio
hardware which was able to accept different max segment sizes,
some up to full GSO (64K)). See siw.h:
u8 gso_seg_limit; /* Maximum segments for GSO, 0 = unbound */

While it is currently set to 1 during connection establishment to
avoid using GSO, other values would be respected. A '2' with 9K MTU
would produce 16K FPDUs max.
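
A hedged sketch of how that hook could combine with the TCP-advertised
size (hypothetical helper name; capping the segment count is the only
addition compared to the earlier sketch):

#include <linux/kernel.h>
#include <net/tcp.h>

/* Illustrative only: cap the advertised segment count by gso_seg_limit.
 * 0 keeps the full GSO size, 1 yields MTU-sized FPDUs, 2 yields two
 * TCP segments per FPDU, and so on.
 */
static u32 example_fpdu_max_capped(struct sock *sk, u8 gso_seg_limit)
{
	struct tcp_sock *tp = tcp_sk(sk);
	u16 segs = READ_ONCE(tp->gso_segs);

	if (!segs)
		segs = 1;
	if (gso_seg_limit)
		segs = min_t(u16, segs, gso_seg_limit);

	return tp->mss_cache * segs;
}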
 

>So is there a way that the peer (typically hard iWARP) could negotiate
>its supported ingress FPDU size, instead of a fixed 64KB size?
>
Again, it would not be fixed to 64K, but set to whatever TCP tells siw.

In any case, thinking about that would make sense only if there
is interest from the hardware side. Is there interest? Probably yes,
since single-stream siw <-> siw on a 100Gb link with GSO would
have more than twice the throughput of siw <-> T6 w/o GSO ;)

>Adding other iWARP maintainers for a wider audience, to let them share
>their thoughts. And if their adapters also have such ingress FPDU size
>limitations, then I think allowing the peer to negotiate its supported
>ingress FPDU size would be more efficient than going with a fixed 64KB
>FPDU size.

I agree, it would be great to get input from Intel and Marvell
as well...

Thanks for reviewing!
Bernard.



