-----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: ----- >To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx> >From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> >Date: 05/14/2020 01:17PM >Cc: faisal.latif@xxxxxxxxx, shiraz.saleem@xxxxxxxxx, >mkalderon@xxxxxxxxxxx, aelior@xxxxxxxxxxx, dledford@xxxxxxxxxx, >jgg@xxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx, bharat@xxxxxxxxxxx, >nirranjan@xxxxxxxxxxx >Subject: [EXTERNAL] Re: Re: Re: [RFC PATCH] RDMA/siw: Experimental >e2e negotiation of GSO usage. > >On Wednesday, May 05/13/20, 2020 at 11:25:23 +0000, Bernard Metzler >wrote: >> -----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: ----- >> >> >To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx> >> >From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> >> >Date: 05/13/2020 05:50AM >> >Cc: faisal.latif@xxxxxxxxx, shiraz.saleem@xxxxxxxxx, >> >mkalderon@xxxxxxxxxxx, aelior@xxxxxxxxxxx, dledford@xxxxxxxxxx, >> >jgg@xxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx, bharat@xxxxxxxxxxx, >> >nirranjan@xxxxxxxxxxx >> >Subject: [EXTERNAL] Re: Re: Re: [RFC PATCH] RDMA/siw: Experimental >> >e2e negotiation of GSO usage. >> > >> >On Monday, May 05/11/20, 2020 at 15:28:47 +0000, Bernard Metzler >> >wrote: >> >> -----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: >----- >> >> >> >> >To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx> >> >> >From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> >> >> >Date: 05/07/2020 01:07PM >> >> >Cc: faisal.latif@xxxxxxxxx, shiraz.saleem@xxxxxxxxx, >> >> >mkalderon@xxxxxxxxxxx, aelior@xxxxxxxxxxx, dledford@xxxxxxxxxx, >> >> >jgg@xxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx, bharat@xxxxxxxxxxx, >> >> >nirranjan@xxxxxxxxxxx >> >> >Subject: [EXTERNAL] Re: Re: [RFC PATCH] RDMA/siw: Experimental >e2e >> >> >negotiation of GSO usage. >> >> > >> >> >Hi Bernard, >> >> >Thanks for the review comments. Replied in line. >> >> > >> >> >On Tuesday, May 05/05/20, 2020 at 11:19:46 +0000, Bernard >Metzler >> >> >wrote: >> >> >> >> >> >> -----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: >> >----- >> >> >> >> >> >> >To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx> >> >> >> >From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> >> >> >> >Date: 04/28/2020 10:01PM >> >> >> >Cc: faisal.latif@xxxxxxxxx, shiraz.saleem@xxxxxxxxx, >> >> >> >mkalderon@xxxxxxxxxxx, aelior@xxxxxxxxxxx, >dledford@xxxxxxxxxx, >> >> >> >jgg@xxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx, >bharat@xxxxxxxxxxx, >> >> >> >nirranjan@xxxxxxxxxxx >> >> >> >Subject: [EXTERNAL] Re: [RFC PATCH] RDMA/siw: Experimental >e2e >> >> >> >negotiation of GSO usage. >> >> >> > >> >> >> >On Wednesday, April 04/15/20, 2020 at 11:59:21 +0000, >Bernard >> >> >Metzler >> >> >> >wrote: >> >> >> >Hi Bernard, >> >> >> > >> >> >> >The attached patches enables the GSO negotiation code in SIW >> >with >> >> >> >few modifications, and also allows hardware iwarp drivers to >> >> >> >advertise >> >> >> >their max length(in 16/32/64KB granularity) that they can >> >accept. >> >> >> >The logic is almost similar to how TCP SYN MSS announcements >> >works >> >> >> >while >> >> >> >3-way handshake. >> >> >> > >> >> >> >Please see if this approach works better for softiwarp <=> >> >> >hardiwarp >> >> >> >case. >> >> >> > >> >> >> >Thanks, >> >> >> >Krishna. >> >> >> > >> >> >> Hi Krishna, >> >> >> >> >> >> Thanks for providing this. I have a few comments: >> >> >> >> >> >> It would be good if we can look at patches inlined in the >> >> >> email body, as usual. >> >> >Sure, will do that henceforth. 
>> >> >>
>> >> >> Before further discussing a complex solution as suggested here,
>> >> >> I would like to hear comments from other iWarp HW vendors on
>> >> >> their capabilities regarding GSO frame acceptance and potential
>> >> >> preferences.
>> >> >>
>> >> >> The extension proposed here goes beyond what I initially sent
>> >> >> as a proposed patch. From an siw point of view, it is
>> >> >> straightforward to select using GSO or not, depending on the
>> >> >> iWarp peer's ability to process large frames. What is proposed
>> >> >> here is an end-to-end negotiation of the actual frame size.
>> >> >>
>> >> >> A comment in the patch you sent suggests adding a module
>> >> >> parameter. Module parameters are deprecated, and I removed any
>> >> >> of those from siw when it went upstream. I don't think we can
>> >> >> rely on that mechanism.
>> >> >>
>> >> >> siw has a compile time parameter (yes, that was a module
>> >> >> parameter) which can set the maximum tx frame size (in multiples
>> >> >> of MTU size). Any static setup of siw <-> Chelsio could make use
>> >> >> of that as a workaround.
>> >> >>
>> >> >> I wonder if it would be a better idea to look into an extension
>> >> >> of the rdma netlink protocol, which would allow setting driver
>> >> >> specific parameters per port, or even per QP.
>> >> >> I assume there are more potential use cases for driver private
>> >> >> extensions of the rdma netlink interface?
>> >> >
>> >> >I think the only problem with "configuring FPDU length via rdma
>> >> >netlink" is that the end user might not feel comfortable finding
>> >> >out what adapter is installed at the remote endpoint and what
>> >> >length it supports. Any thoughts on simplifying this?
>> >>
>> >> Nope. This would be 'out of band' information.
>> >>
>> >> So we seem to have 3 possible solutions to the problem:
>> >>
>> >> (1) detect if the peer accepts FPDUs up to the current GSO size,
>> >> which is what I initially proposed, (2) negotiate a max FPDU size
>> >> with the peer, which is what you are proposing, or (3) explicitly
>> >> set that max FPDU size via an extended user interface.
>> >>
>> >> My problem with (2) is the rather significant proprietary
>> >> extension of MPA, since spare bits would encode a max value
>> >> negotiation.
>> >>
>> >> I proposed (1) for its simplicity - just a single bit flag, which
>> >> de-/selects GSO size for FPDUs on TX. Since Chelsio can handle
>> >> _some_ larger (up to 16k, you said) sizes, (1) might have to be
>> >> extended to cap at a hard coded max size. Again, it would be good
>> >> to know what other vendors' limits are.
>> >>
>> >> Does 16k for siw <-> Chelsio already yield a decent performance
>> >> win?
>> >Yes, a 3x performance gain with just 16K GSO, compared to the GSO
>> >disabled case, where the MTU size is 1500.
>> >
>>
>> That is a lot. On the other hand, I would suggest to always increase
>> the MTU size to the max (9k) for adapters siw attaches to. With a
>> page size of 4k, anything below a 4k MTU size hurts, while 9k
>> already packs two consecutive pages into one frame, if aligned.
>>
>> Would 16k still gain a significant performance win if we have set
>> the max MTU size for the interface?
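To make option (1)-with-a-cap a bit more concrete, the TX side choice
would boil down to something like the toy sketch below (plain
user-space C; the 16k cap, the peer_accepts_gso flag and the function
name are invented for illustration and are not the actual siw code):

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical hard cap until vendors report their real limits. */
#define SIW_GSO_FPDU_CAP (16 * 1024)

/* Pick the FPDU payload budget for the next TX burst:
 * - peer did not signal GSO acceptance: stay within one TCP segment (MSS),
 * - peer did signal acceptance: use the GSO size, but never above the cap.
 */
static unsigned fpdu_tx_budget(bool peer_accepts_gso, unsigned tcp_mss,
			       unsigned gso_size)
{
	if (!peer_accepts_gso)
		return tcp_mss;
	return gso_size < SIW_GSO_FPDU_CAP ? gso_size : SIW_GSO_FPDU_CAP;
}

int main(void)
{
	printf("no GSO peer: %u bytes\n", fpdu_tx_budget(false, 1460, 65536));
	printf("GSO peer:    %u bytes\n", fpdu_tx_budget(true, 1460, 65536));
	return 0;
}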
>>
>> >Regarding the rdma netlink approach that you are suggesting, should
>> >it be something like below(?):
>> >
>> >rdma link set iwp3s0f4/1 max_fpdu_len 102.1.1.6:16384,
>> >102.5.5.6:32768
>> >
>> >
>> >rdma link show iwp3s0f4/1 max_fpdu_len
>> >    102.1.1.6:16384
>> >    102.5.5.6:32768
>> >
>> >where "102.1.1.6" is the destination IP address (such that the same
>> >max fpdu length is taken for all the connections to this
>> >address/adapter), and "16384" is the max fpdu length.
>> >
>> Yes, that would be one way of doing it. Unfortunately we would end
>> up maintaining additional permanent in-kernel state per peer we ever
>> configured.
>>
>> So, would it make sense to combine it with the iwpmd, which then may
>> cache peers, while setting max_fpdu per new connection? This would
>> probably include extending the proprietary port mapper protocol, to
>> exchange local preferences with the peer. Local capabilities might
>> be queried from the device (extending enum ib_mtu to more than 4k,
>> and using ibv_query_port()). And the iw_cm_id would have to be
>> extended to carry that extra parameter down to the driver... Sounds
>> complicated.
>If I understand you right, the client/server advertise their max FPDU
>len in the Res field of the PMReq/PMAccept frames:
>
>typedef struct iwpm_wire_msg {
>        __u8 magic;
>        __u8 pmtime;
>        __be16 reserved;
>
>Then, after port mapper negotiation, the fpdu len is propagated to the
>SIW qp structure from the userspace iwpmd.
>
>If we weigh up the pros and cons of using the port mapper Res field vs
>the MPA Res field, it looks like using MPA is less complicated,
>considering the lines of change and the modules involved. Not sure if
>my analysis is right here?
>
One important difference IMHO is that one approach would touch an
established IETF communication protocol (MPA), the other a proprietary
application (iwpmd).

>By the way, it looks like the existing SIW GSO code needs logic to
>limit "c_tx->tcp_seglen" to 64K-1, as the MPA length field is only
>16 bits. Say, in the future, to best utilize 400G Ethernet, the Linux
>TCP stack increases GSO_MAX_SIZE to 128K; then SIW would cast an
>18-bit value into the 16-bit MPA length.
>
Isn't GSO bound to IP fragmentation?

Thanks,
Bernard
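P.S.: on the 16-bit MPA length concern above, the guard you describe
would amount to something like this toy sketch (constant and function
names invented for illustration, not taken from the driver):

#include <stdio.h>

/* The MPA ULPDU length field is 16 bit, so one FPDU can never carry more. */
#define MPA_MAX_ULPDU_LEN ((64 * 1024) - 1)

/* Clamp whatever segment size the TCP stack offers (GSO_MAX_SIZE may grow
 * beyond 64K one day) to what the MPA framing can express.
 */
static unsigned clamp_fpdu_len(unsigned tcp_seglen)
{
	return tcp_seglen > MPA_MAX_ULPDU_LEN ? MPA_MAX_ULPDU_LEN : tcp_seglen;
}

int main(void)
{
	printf("128K GSO -> %u byte FPDU\n", clamp_fpdu_len(128 * 1024));
	printf("16K GSO  -> %u byte FPDU\n", clamp_fpdu_len(16 * 1024));
	return 0;
}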