Re: Re: Re: Re: [RFC PATCH] RDMA/siw: Experimental e2e negotiation of GSO usage.

-----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: -----

>To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>
>From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx>
>Date: 05/14/2020 01:17PM
>Cc: faisal.latif@xxxxxxxxx, shiraz.saleem@xxxxxxxxx,
>mkalderon@xxxxxxxxxxx, aelior@xxxxxxxxxxx, dledford@xxxxxxxxxx,
>jgg@xxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx, bharat@xxxxxxxxxxx,
>nirranjan@xxxxxxxxxxx
>Subject: [EXTERNAL] Re: Re: Re: [RFC PATCH] RDMA/siw: Experimental
>e2e negotiation of GSO usage.
>
>On Wednesday, May 05/13/20, 2020 at 11:25:23 +0000, Bernard Metzler
>wrote:
>> -----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: -----
>> 
>> >To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>
>> >From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx>
>> >Date: 05/13/2020 05:50AM
>> >Cc: faisal.latif@xxxxxxxxx, shiraz.saleem@xxxxxxxxx,
>> >mkalderon@xxxxxxxxxxx, aelior@xxxxxxxxxxx, dledford@xxxxxxxxxx,
>> >jgg@xxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx, bharat@xxxxxxxxxxx,
>> >nirranjan@xxxxxxxxxxx
>> >Subject: [EXTERNAL] Re: Re: Re: [RFC PATCH] RDMA/siw: Experimental
>> >e2e negotiation of GSO usage.
>> >
>> >On Monday, May 05/11/20, 2020 at 15:28:47 +0000, Bernard Metzler
>> >wrote:
>> >> -----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote:
>-----
>> >> 
>> >> >To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>
>> >> >From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx>
>> >> >Date: 05/07/2020 01:07PM
>> >> >Cc: faisal.latif@xxxxxxxxx, shiraz.saleem@xxxxxxxxx,
>> >> >mkalderon@xxxxxxxxxxx, aelior@xxxxxxxxxxx, dledford@xxxxxxxxxx,
>> >> >jgg@xxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx, bharat@xxxxxxxxxxx,
>> >> >nirranjan@xxxxxxxxxxx
>> >> >Subject: [EXTERNAL] Re: Re: [RFC PATCH] RDMA/siw: Experimental
>> >> >e2e negotiation of GSO usage.
>> >> >
>> >> >Hi Bernard,
>> >> >Thanks for the review comments. Replied in line.
>> >> >
>> >> >On Tuesday, May 05/05/20, 2020 at 11:19:46 +0000, Bernard
>Metzler
>> >> >wrote:
>> >> >> 
>> >> >> -----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote:
>> >-----
>> >> >> 
>> >> >> >To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>
>> >> >> >From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx>
>> >> >> >Date: 04/28/2020 10:01PM
>> >> >> >Cc: faisal.latif@xxxxxxxxx, shiraz.saleem@xxxxxxxxx,
>> >> >> >mkalderon@xxxxxxxxxxx, aelior@xxxxxxxxxxx, dledford@xxxxxxxxxx,
>> >> >> >jgg@xxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx, bharat@xxxxxxxxxxx,
>> >> >> >nirranjan@xxxxxxxxxxx
>> >> >> >Subject: [EXTERNAL] Re: [RFC PATCH] RDMA/siw: Experimental
>> >> >> >e2e negotiation of GSO usage.
>> >> >> >
>> >> >> >On Wednesday, April 04/15/20, 2020 at 11:59:21 +0000, Bernard
>> >> >> >Metzler wrote:
>> >> >> >Hi Bernard,
>> >> >> >
>> >> >> >The attached patches enable the GSO negotiation code in SIW
>> >> >> >with a few modifications, and also allow hardware iWARP
>> >> >> >drivers to advertise the max length (in 16/32/64KB
>> >> >> >granularity) that they can accept. The logic is similar to
>> >> >> >how TCP MSS announcements work during the three-way
>> >> >> >handshake.
>> >> >> >
>> >> >> >Please see if this approach works better for the
>> >> >> >softiwarp <=> hardiwarp case.
>> >> >> >
>> >> >> >Thanks,
>> >> >> >Krishna. 
>> >> >> >
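For illustration, a minimal sketch of how such a 16/32/64KB
granularity announcement could be coded into spare bits of the MPA
request/reply frame, in the spirit of TCP MSS announcements. Names
and bit positions below are illustrative only, not taken from the
actual patch:

#define MPA_RR_GSO_SHIFT	9	/* two spare bits in the req/rep 'bits' word */

/* 0 = plain MTU, 1 = 16K, 2 = 32K, 3 = 64K */
static u32 mpa_gso_code_to_len(u16 code)
{
	return code ? 8192u << code : 0;
}

static u32 mpa_negotiated_fpdu_len(__be16 sent, __be16 rcvd)
{
	u16 s = (be16_to_cpu(sent) >> MPA_RR_GSO_SHIFT) & 0x3;
	u16 r = (be16_to_cpu(rcvd) >> MPA_RR_GSO_SHIFT) & 0x3;

	/* as with TCP MSS, the smaller announcement wins */
	return mpa_gso_code_to_len(min(s, r));
}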
>> >> >> Hi Krishna,
>> >> >> 
>> >> >> Thanks for providing this. I have a few comments:
>> >> >> 
>> >> >> It would be good if we could look at patches inlined in the
>> >> >> email body, as usual.
>> >> >Sure, will do that henceforth.
>> >> >> 
>> >> >> Before further discussing a complex solution as suggested
>> >> >> here, I would like to hear comments from other iWarp HW
>> >> >> vendors on their capabilities regarding GSO frame acceptance
>> >> >> and potential preferences. 
>> >> >> 
>> >> >> The extension proposed here goes beyond what I initially sent
>> >> >> as a proposed patch. From an siw point of view, it is
>> >> >> straightforward to select using GSO or not, depending on the
>> >> >> iWarp peer's ability to process large frames. What is proposed
>> >> >> here is an end-to-end negotiation of the actual frame size.
>> >> >> 
>> >> >> A comment in the patch you sent suggests adding a module
>> >> >> parameter. Module parameters are deprecated, and I removed any
>> >> >> of those from siw when it went upstream. I don't think we can
>> >> >> rely on that mechanism.
>> >> >> 
>> >> >> siw has a compile-time parameter (yes, that was a module
>> >> >> parameter) which can set the maximum tx frame size (in
>> >> >> multiples of the MTU size). Any static setup of siw <-> Chelsio
>> >> >> could make use of that as a workaround.
>> >> >> 
>> >> >> I wonder if it would be a better idea to look into an
>> >> >> extension of the rdma netlink protocol, which would allow
>> >> >> setting driver-specific parameters per port, or even per QP.
>> >> >> I assume there are more potential use cases for driver-private
>> >> >> extensions of the rdma netlink interface?
>> >> >
>I think the only problem with "configuring FPDU length via rdma
>netlink" is that the end user might not feel comfortable finding
>out which adapter is installed at the remote endpoint and what
>length it supports. Any thoughts on simplifying this?
>> >> 
>> >> Nope. This would be 'out of band' information.
>> >> 
>> >> So we seem to have 3 possible solutions to the problem:
>> >> 
>> >> (1) detect if the peer accepts FPDUs up to the current GSO size,
>> >> which is what I initially proposed; (2) negotiate a max FPDU
>> >> size with the peer, which is what you are proposing; or (3)
>> >> explicitly set that max FPDU size via an extended user interface.
>> >> 
>> >> My problem with (2) is the rather significant proprietary
>> >> extension of MPA, since spare bits would encode a max-value
>> >> negotiation.
>> >> 
>> >> I proposed (1) for its simplicity - just a single bit flag,
>> >> which de-/selects GSO size for FPDUs on TX. Since Chelsio
>> >> can handle _some_ larger sizes (up to 16k, you said), (1)
>> >> might have to be extended to cap at a hard-coded max size.
>> >> Again, it would be good to know what other vendors' limits
>> >> are.
>> >> 
>> >> Does 16k for siw <-> Chelsio already yield a decent
>> >> performance win?
>> >Yes, a 3x performance gain with just 16K GSO, compared to the
>> >GSO-disabled case, where the MTU size is 1500.
>> >
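To make (1) plus a cap concrete: a minimal sketch along the lines of
siw's segment size update on the TX path. 'peer_fpdu_max' is a
hypothetical per-connection value that the one-bit flag plus a
hard-coded vendor cap (or any other mechanism) would fill in:

#define SIW_NO_FPDU_CAP	0	/* peer takes full GSO-sized FPDUs */

static void siw_update_tcpseg_capped(struct siw_iwarp_tx *c_tx,
				     struct socket *s, u32 peer_fpdu_max)
{
	struct tcp_sock *tp = tcp_sk(s->sk);

	/* with GSO, one FPDU may span multiple MSS-sized frames */
	if (tp->gso_segs)
		c_tx->tcp_seglen = tp->mss_cache * tp->gso_segs;
	else
		c_tx->tcp_seglen = tp->mss_cache;

	if (peer_fpdu_max != SIW_NO_FPDU_CAP)
		c_tx->tcp_seglen = min_t(u32, c_tx->tcp_seglen,
					 peer_fpdu_max);
}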
>> 
>> That is a lot. On the other hand, I would suggest always
>> increasing the MTU size to the max (9k) for adapters siw attaches
>> to. With a page size of 4k, anything below a 4k MTU size hurts,
>> while 9k already packs two consecutive pages into one frame,
>> if aligned.
>> 
>> Would 16k still gain a significant performance win if we
>> set the max MTU size for the interface?
>> 
>Regarding the rdma netlink approach that you are suggesting, should
>it be something like below(?):
>> >
>> >rdma link set iwp3s0f4/1 max_fpdu_len 102.1.1.6:16384,
>> >102.5.5.6:32768
>> >
>> >
>> >rdma link show iwp3s0f4/1 max_fpdu_len
>> >        102.1.1.6:16384
>> >        102.5.5.6:32768
>> >
>> >where "102.1.1.6" is the destination IP address(such that the same
>> >max
>> >fpdu length is taken for all the connections to this
>> >address/adapter).
>> >And "16384" is max fdpu length.
>> >
>> Yes, that would be one way of doing it. Unfortunately, we would
>> end up maintaining additional permanent in-kernel state per peer
>> we ever configured.
>> 
>> So, would it make sense to combine it with the iwpmd,
>> which then may cache peers, while setting max_fpdu per
>> new connection? This would probably include extending the
>> proprietary port mapper protocol to exchange local
>> preferences with the peer. Local capabilities might
>> be queried from the device (extending enum ib_mtu to
>> more than 4k, and using ibv_query_port()). And the
>> iw_cm_id would have to be extended to carry that extra
>> parameter down to the driver... Sounds complicated.
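For the query part, a rough userspace sketch of what iwpmd could do
with ibv_query_port(). The FPDU mapping below is purely illustrative;
as said, enum ibv_mtu would first need values beyond 4k (or a new
attribute) to express 16K/32K/64K limits:

#include <infiniband/verbs.h>

/* Map the port's active MTU to a conservative local FPDU limit. */
static int query_local_max_fpdu(struct ibv_context *ctx, uint8_t port)
{
	struct ibv_port_attr attr;

	if (ibv_query_port(ctx, port, &attr))
		return -1;

	switch (attr.active_mtu) {
	case IBV_MTU_4096: return 4096;
	case IBV_MTU_2048: return 2048;
	case IBV_MTU_1024: return 1024;
	case IBV_MTU_512:  return 512;
	default:           return 256;
	}
}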
>If I understand you right, the client/server advertises its max FPDU
>length in the Res field of the PMReq/PMAccept frames:
>
>typedef struct iwpm_wire_msg {
>        __u8    magic;
>        __u8    pmtime;
>        __be16  reserved;
>        /* remaining fields omitted */
>} iwpm_wire_msg_t;
>
>Then, after the port mapper negotiation, the FPDU length is
>propagated from the userspace iwpmd to the SIW QP structure.
>		
>If we weigh up the pros and cons of using the port mapper Res field
>vs. the MPA Res field, then using MPA looks less complicated,
>considering the lines of change and the modules involved. Not sure
>if my analysis is right here?
>
One important difference IMHO is that one approach would touch an
established IETF communication protocol (MPA), the other a
proprietary application (iwpmd).


>By the way, it looks like the existing SIW GSO code needs logic to
>limit "c_tx->tcp_seglen" to 64K-1, as the MPA length field is only
>16 bits. Say, in the future, to best utilize 400G Ethernet, the
>Linux TCP stack increases GSO_MAX_SIZE to 128K; then SIW would cast
>an 18-bit value into the 16-bit MPA length.
>
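A minimal sketch of such a clamp, applied where the GSO-derived
segment size is taken over (MPA_MAX_ULPDU is an illustrative name
for the 16-bit limit, not an existing define):

#define MPA_MAX_ULPDU	0xffffu	/* MPA length field is 16 bits wide */

static inline u32 siw_clamp_seglen(u32 gso_seglen)
{
	return min_t(u32, gso_seglen, MPA_MAX_ULPDU);
}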
Isn't GSO bound to IP fragmentation?

Thanks,
Bernard



