On Thursday, May 05/14/20, 2020 at 13:07:33 +0000, Bernard Metzler wrote:
> -----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: -----
>
> >To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>
> >From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx>
> >Date: 05/14/2020 01:17PM
> >Cc: faisal.latif@xxxxxxxxx, shiraz.saleem@xxxxxxxxx,
> >mkalderon@xxxxxxxxxxx, aelior@xxxxxxxxxxx, dledford@xxxxxxxxxx,
> >jgg@xxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx, bharat@xxxxxxxxxxx,
> >nirranjan@xxxxxxxxxxx
> >Subject: [EXTERNAL] Re: Re: Re: [RFC PATCH] RDMA/siw: Experimental
> >e2e negotiation of GSO usage.
> >
> >On Wednesday, May 05/13/20, 2020 at 11:25:23 +0000, Bernard Metzler
> >wrote:
> >> -----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: -----
> >>
> >> >To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>
> >> >From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx>
> >> >Date: 05/13/2020 05:50AM
> >> >Cc: faisal.latif@xxxxxxxxx, shiraz.saleem@xxxxxxxxx,
> >> >mkalderon@xxxxxxxxxxx, aelior@xxxxxxxxxxx, dledford@xxxxxxxxxx,
> >> >jgg@xxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx, bharat@xxxxxxxxxxx,
> >> >nirranjan@xxxxxxxxxxx
> >> >Subject: [EXTERNAL] Re: Re: Re: [RFC PATCH] RDMA/siw: Experimental
> >> >e2e negotiation of GSO usage.
> >> >
> >> >On Monday, May 05/11/20, 2020 at 15:28:47 +0000, Bernard Metzler
> >> >wrote:
> >> >> -----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: -----
> >> >>
> >> >> >To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>
> >> >> >From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx>
> >> >> >Date: 05/07/2020 01:07PM
> >> >> >Cc: faisal.latif@xxxxxxxxx, shiraz.saleem@xxxxxxxxx,
> >> >> >mkalderon@xxxxxxxxxxx, aelior@xxxxxxxxxxx, dledford@xxxxxxxxxx,
> >> >> >jgg@xxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx, bharat@xxxxxxxxxxx,
> >> >> >nirranjan@xxxxxxxxxxx
> >> >> >Subject: [EXTERNAL] Re: Re: [RFC PATCH] RDMA/siw: Experimental e2e
> >> >> >negotiation of GSO usage.
> >> >> >
> >> >> >Hi Bernard,
> >> >> >Thanks for the review comments. Replied in line.
> >> >> >
> >> >> >On Tuesday, May 05/05/20, 2020 at 11:19:46 +0000, Bernard Metzler
> >> >> >wrote:
> >> >> >>
> >> >> >> -----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: -----
> >> >> >>
> >> >> >> >To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>
> >> >> >> >From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx>
> >> >> >> >Date: 04/28/2020 10:01PM
> >> >> >> >Cc: faisal.latif@xxxxxxxxx, shiraz.saleem@xxxxxxxxx,
> >> >> >> >mkalderon@xxxxxxxxxxx, aelior@xxxxxxxxxxx, dledford@xxxxxxxxxx,
> >> >> >> >jgg@xxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx, bharat@xxxxxxxxxxx,
> >> >> >> >nirranjan@xxxxxxxxxxx
> >> >> >> >Subject: [EXTERNAL] Re: [RFC PATCH] RDMA/siw: Experimental e2e
> >> >> >> >negotiation of GSO usage.
> >> >> >> >
> >> >> >> >On Wednesday, April 04/15/20, 2020 at 11:59:21 +0000, Bernard Metzler
> >> >> >> >wrote:
> >> >> >> >Hi Bernard,
> >> >> >> >
> >> >> >> >The attached patches enable the GSO negotiation code in siw with a
> >> >> >> >few modifications, and also allow hardware iWARP drivers to
> >> >> >> >advertise the max length (in 16/32/64KB granularity) that they can
> >> >> >> >accept. The logic is similar to how TCP SYN MSS announcement works
> >> >> >> >during the 3-way handshake.
> >> >> >> >
> >> >> >> >Please see if this approach works better for the softiwarp <=>
> >> >> >> >hardiwarp case.
> >> >> >> >
> >> >> >> >Thanks,
> >> >> >> >Krishna.
> >> >> >> >
> >> >> >> Hi Krishna,
> >> >> >>
> >> >> >> Thanks for providing this.
> >> >> >> I have a few comments:
> >> >> >>
> >> >> >> It would be good if we could look at patches inlined in the
> >> >> >> email body, as usual.
> >> >> >Sure, will do that henceforth.
> >> >> >>
> >> >> >> Before further discussing a complex solution as suggested
> >> >> >> here, I would like to hear comments from other iWARP HW
> >> >> >> vendors on their capabilities regarding GSO frame acceptance
> >> >> >> and potential preferences.
> >> >> >>
> >> >> >> The extension proposed here goes beyond what I initially sent
> >> >> >> as a proposed patch. From an siw point of view, it is
> >> >> >> straightforward to select using GSO or not, depending on the
> >> >> >> iWARP peer's ability to process large frames. What is proposed
> >> >> >> here is an end-to-end negotiation of the actual frame size.
> >> >> >>
> >> >> >> A comment in the patch you sent suggests adding a module
> >> >> >> parameter. Module parameters are deprecated, and I removed any
> >> >> >> of those from siw when it went upstream. I don't think we can
> >> >> >> rely on that mechanism.
> >> >> >>
> >> >> >> siw has a compile time parameter (yes, that was a module
> >> >> >> parameter) which can set the maximum tx frame size (in multiples
> >> >> >> of MTU size). Any static setup of siw <-> Chelsio could make
> >> >> >> use of that as a workaround.
> >> >> >>
> >> >> >> I wonder if it would be a better idea to look into an extension
> >> >> >> of the rdma netlink protocol, which would allow setting driver
> >> >> >> specific parameters per port, or even per QP.
> >> >> >> I assume there are more potential use cases for driver private
> >> >> >> extensions of the rdma netlink interface?
> >> >> >
> >> >> >I think the only problem with "configuring FPDU length via rdma
> >> >> >netlink" is that the end user might not feel comfortable finding out
> >> >> >which adapter is installed at the remote endpoint and what length it
> >> >> >supports. Any thoughts on simplifying this?
> >> >>
> >> >> Nope. This would be 'out of band' information.
> >> >>
> >> >> So we seem to have 3 possible solutions to the problem:
> >> >>
> >> >> (1) detect if the peer accepts FPDUs up to current GSO size,
> >> >> this is what I initially proposed. (2) negotiate a max FPDU
> >> >> size with the peer, this is what you are proposing, or (3)
> >> >> explicitly set that max FPDU size per extended user interface.
> >> >>
> >> >> My problem with (2) is the rather significant proprietary
> >> >> extension of MPA, since spare bits code a max value negotiation.
> >> >>
> >> >> I proposed (1) for its simplicity - just a single bit flag,
> >> >> which de-/selects GSO size for FPDUs on TX. Since Chelsio
> >> >> can handle _some_ larger (up to 16k, you said) sizes, (1)
> >> >> might have to be extended to cap at a hard coded max size.
> >> >> Again, it would be good to know what other vendors' limits
> >> >> are.
> >> >>
> >> >> Does 16k for siw <-> Chelsio already yield a decent
> >> >> performance win?
> >> >Yes, a 3x performance gain with just 16K GSO, compared to the GSO
> >> >disabled case, where MTU size is 1500.
> >> >
> >>
> >> That is a lot. On the other hand, I would suggest to always
> >> increase MTU size to max (9k) for adapters siw attaches to.
> >> With a page size of 4k, anything below 4k MTU size hurts,
> >> while 9k already packs two consecutive pages into one frame,
> >> if aligned.
> >>
> >> Would 16k still gain a significant performance win if we have
> >> set max MTU size for the interface?
Unfortunately, no difference in throughput when MTU is 9K, for a 16K FPDU.
It looks like the TCP stack constructs the GSO/TSO buffer in multiples of the
HW MSS (tp->mss_cache). So, as a 16K FPDU buffer is not a multiple of 9K, the
TCP stack slices the 16K buffer into 9K and 7K buffers before passing it to
the NIC driver. Thus there is no difference in performance, as each tx packet
to the NIC cannot go beyond 9K when the FPDU length is 16K.
> >>
> >> >Regarding the rdma netlink approach that you are suggesting, should
> >> >it be similar to below(?):
> >> >
> >> >rdma link set iwp3s0f4/1 max_fpdu_len 102.1.1.6:16384, 102.5.5.6:32768
> >> >
> >> >rdma link show iwp3s0f4/1 max_fpdu_len
> >> >    102.1.1.6:16384
> >> >    102.5.5.6:32768
> >> >
> >> >where "102.1.1.6" is the destination IP address (such that the same
> >> >max fpdu length is taken for all the connections to this
> >> >address/adapter). And "16384" is the max fpdu length.
> >> >
> >> Yes, that would be one way of doing it. Unfortunately we
> >> would end up with maintaining additional permanent in kernel
> >> state per peer we ever configured.
> >>
> >> So, would it make sense to combine it with the iwpmd,
> >> which then may cache peers, while setting max_fpdu per
> >> new connection? This would probably include extending the
> >> proprietary port mapper protocol, to exchange local
> >> preferences with the peer. Local capabilities might
> >> be queried from the device (extending enum ib_mtu to
> >> more than 4k, and using ibv_query_port()). And the
> >> iw_cm_id to be extended to carry that extra parameter
> >> down to the driver... Sounds complicated.
> >If I understand you right, the client/server advertises its max FPDU
> >length in the Res field of the PMReq/PMAccept frames:
> >typedef struct iwpm_wire_msg {
> >        __u8 magic;
> >        __u8 pmtime;
> >        __be16 reserved;
> >Then, after the port mapper negotiation, the FPDU length is propagated
> >to the siw qp structure from the userspace iwpmd.
> >
> >If we weigh up the pros and cons of using the port mapper Res field vs
> >the MPA Res field, it looks like using MPA is less complicated,
> >considering the lines of change and the modules involved. Not sure my
> >analysis is right here?
> >
> One important difference IMHO is that one approach would touch an
> established IETF communication protocol (MPA), the other a
> proprietary application (iwpmd).

Ok, will explore the iwpmd approach more; maybe prototyping it would help.

> >
> >By the way, it looks like the existing siw GSO code needs logic to limit
> >"c_tx->tcp_seglen" to 64K-1, as the MPA length field is only 16 bits.
> >Say, in the future, to best utilize 400G Ethernet, the Linux TCP stack
> >increases GSO_MAX_SIZE to 128K; then siw would cast an 18-bit value into
> >the 16-bit MPA length.
> >
> Isn't GSO bound to IP fragmentation?

Not sure. But I would say it's better to limit "c_tx->tcp_seglen" somewhere
to 64K-1 to avoid future risks.

>
> Thanks,
> Bernard
>
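
P.S.: to make the 64K-1 capping concrete, below is a rough, untested sketch of
the kind of clamp I have in mind. The helper name and the SIW_MAX_FPDU_LEN
macro are only illustrative (not existing siw code); the real change would sit
wherever siw derives c_tx->tcp_seglen from the socket's GSO state:

/* Illustrative sketch only; assumes siw.h for struct siw_iwarp_tx and
 * linux/minmax.h for min_t().
 */
#define SIW_MAX_FPDU_LEN	0xffff	/* MPA length field is 16 bit */

/* Clamp the tx segment size so a single FPDU never exceeds what the
 * 16-bit MPA length field can describe, even if the stack ever hands
 * us GSO buffers larger than 64K-1.
 */
static void siw_cap_tcp_seglen(struct siw_iwarp_tx *c_tx,
			       unsigned int gso_size)
{
	c_tx->tcp_seglen = min_t(unsigned int, gso_size, SIW_MAX_FPDU_LEN);
}

That way, even if GSO_MAX_SIZE grows beyond 64K in the future, siw would keep
emitting valid MPA frames.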