-----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: -----

>To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>
>From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx>
>Date: 05/13/2020 05:50AM
>Cc: faisal.latif@xxxxxxxxx, shiraz.saleem@xxxxxxxxx,
>mkalderon@xxxxxxxxxxx, aelior@xxxxxxxxxxx, dledford@xxxxxxxxxx,
>jgg@xxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx, bharat@xxxxxxxxxxx,
>nirranjan@xxxxxxxxxxx
>Subject: [EXTERNAL] Re: Re: Re: [RFC PATCH] RDMA/siw: Experimental
>e2e negotiation of GSO usage.
>
>On Monday, May 05/11/20, 2020 at 15:28:47 +0000, Bernard Metzler
>wrote:
>> -----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: -----
>>
>> >To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>
>> >From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx>
>> >Date: 05/07/2020 01:07PM
>> >Cc: faisal.latif@xxxxxxxxx, shiraz.saleem@xxxxxxxxx,
>> >mkalderon@xxxxxxxxxxx, aelior@xxxxxxxxxxx, dledford@xxxxxxxxxx,
>> >jgg@xxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx, bharat@xxxxxxxxxxx,
>> >nirranjan@xxxxxxxxxxx
>> >Subject: [EXTERNAL] Re: Re: [RFC PATCH] RDMA/siw: Experimental e2e
>> >negotiation of GSO usage.
>> >
>> >Hi Bernard,
>> >Thanks for the review comments. Replied in line.
>> >
>> >On Tuesday, May 05/05/20, 2020 at 11:19:46 +0000, Bernard Metzler
>> >wrote:
>> >>
>> >> -----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: -----
>> >>
>> >> >To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>
>> >> >From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx>
>> >> >Date: 04/28/2020 10:01PM
>> >> >Cc: faisal.latif@xxxxxxxxx, shiraz.saleem@xxxxxxxxx,
>> >> >mkalderon@xxxxxxxxxxx, aelior@xxxxxxxxxxx, dledford@xxxxxxxxxx,
>> >> >jgg@xxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx, bharat@xxxxxxxxxxx,
>> >> >nirranjan@xxxxxxxxxxx
>> >> >Subject: [EXTERNAL] Re: [RFC PATCH] RDMA/siw: Experimental e2e
>> >> >negotiation of GSO usage.
>> >> >
>> >> >On Wednesday, April 04/15/20, 2020 at 11:59:21 +0000, Bernard
>> >> >Metzler wrote:
>> >> >Hi Bernard,
>> >> >
>> >> >The attached patches enable the GSO negotiation code in siw with
>> >> >a few modifications, and also allow hardware iWARP drivers to
>> >> >advertise the maximum FPDU length (in 16/32/64KB granularity)
>> >> >they can accept. The logic is similar to how TCP announces its
>> >> >MSS in the SYN segments of the 3-way handshake.
>> >> >
>> >> >Please see if this approach works better for the softiwarp <=>
>> >> >hardiwarp case.
>> >> >
>> >> >Thanks,
>> >> >Krishna.
>> >> >
>> >> Hi Krishna,
>> >>
>> >> Thanks for providing this. I have a few comments:
>> >>
>> >> It would be good if we could look at patches inlined in the
>> >> email body, as usual.
>> >Sure, will do that henceforth.
>> >>
>> >> Before further discussing a complex solution as suggested
>> >> here, I would like to hear comments from other iWarp HW
>> >> vendors on their capabilities regarding GSO frame acceptance
>> >> and potential preferences.
>> >>
>> >> The extension proposed here goes beyond what I initially sent
>> >> as a proposed patch. From an siw point of view, it is
>> >> straightforward to select whether or not to use GSO, depending
>> >> on the iWarp peer's ability to process large frames. What is
>> >> proposed here is an end-to-end negotiation of the actual frame
>> >> size.
>> >>
>> >> A comment in the patch you sent suggests adding a module
>> >> parameter. Module parameters are deprecated, and I removed all
>> >> of those from siw when it went upstream. I don't think we can
>> >> rely on that mechanism.
>> >>
>> >> siw has a compile time parameter (yes, that was a module
>> >> parameter) which can set the maximum tx frame size (in multiples
>> >> of MTU size). Any static setup of siw <-> Chelsio could make
>> >> use of that as a workaround.
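Krishna's proposal follows the TCP MSS pattern: each side advertises the largest FPDU it can receive, and each sender caps its TX FPDUs at what the peer advertised and at its own GSO size. The C sketch below illustrates only that "announce, then take the minimum" rule. The 2-bit `enum fpdu_cap` encoding, the helper names, and the fallback to one MSS-sized FPDU are hypothetical; they are not the encoding used by the RFC patch and do not correspond to any defined MPA field.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical 2-bit code for the largest FPDU a receiver accepts.
 * Not the encoding of the RFC patch; for illustration only.
 */
enum fpdu_cap {
	FPDU_CAP_MSS = 0, /* only FPDUs that fit one TCP segment */
	FPDU_CAP_16K = 1,
	FPDU_CAP_32K = 2,
	FPDU_CAP_64K = 3,
};

/* Byte limit for an advertised code; 0 means "no larger than one MSS". */
static uint32_t fpdu_cap_to_bytes(enum fpdu_cap cap)
{
	assert((unsigned int)cap <= FPDU_CAP_64K);
	return cap == FPDU_CAP_MSS ? 0 : 8192u << cap;
}

/* Receiver side: round the local limit down to the largest code it meets. */
static enum fpdu_cap fpdu_bytes_to_cap(uint32_t max_rx)
{
	if (max_rx >= 65536u)
		return FPDU_CAP_64K;
	if (max_rx >= 32768u)
		return FPDU_CAP_32K;
	if (max_rx >= 16384u)
		return FPDU_CAP_16K;
	return FPDU_CAP_MSS;
}

/*
 * Sender side, as with TCP MSS: never exceed what the peer announced,
 * never exceed the local GSO size, and never go below one MSS.
 */
static uint32_t tx_fpdu_limit(enum fpdu_cap peer_cap, uint32_t gso_size,
			      uint32_t mss)
{
	uint32_t peer_max = fpdu_cap_to_bytes(peer_cap);
	uint32_t limit;

	if (peer_max == 0)
		return mss;
	limit = peer_max < gso_size ? peer_max : gso_size;
	return limit > mss ? limit : mss;
}
```

For example, a peer advertising `FPDU_CAP_16K` to a sender with a 64k GSO size and a 1448-byte MSS yields 16384-byte FPDUs. The coarse granularity keeps the advertised value small enough to fit into a few spare bits, which is the MPA extension Bernard objects to further down the thread.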
>> >>
>> >> I wonder if it would be a better idea to look into an extension
>> >> of the rdma netlink protocol, which would allow setting
>> >> driver-specific parameters per port, or even per QP.
>> >> I assume there are more potential use cases for driver-private
>> >> extensions of the rdma netlink interface?
>> >
>> >I think the only problem with "configuring FPDU length via rdma
>> >netlink" is that the end user might not find it easy to figure out
>> >which adapter is installed at the remote endpoint and what length
>> >it supports. Any thoughts on simplifying this?
>>
>> Nope. This would be 'out of band' information.
>>
>> So we seem to have 3 possible solutions to the problem:
>>
>> (1) detect if the peer accepts FPDUs up to the current GSO size;
>>     this is what I initially proposed,
>> (2) negotiate a max FPDU size with the peer; this is what you are
>>     proposing, or
>> (3) explicitly set that max FPDU size via an extended user
>>     interface.
>>
>> My problem with (2) is the rather significant proprietary
>> extension of MPA, since spare bits would encode a max value
>> negotiation.
>>
>> I proposed (1) for its simplicity: just a single bit flag, which
>> de-/selects GSO size for FPDUs on TX. Since Chelsio can handle
>> _some_ larger sizes (up to 16k, you said), (1) might have to be
>> extended to cap at a hard-coded max size. Again, it would be good
>> to know what other vendors' limits are.
>>
>> Does 16k for siw <-> Chelsio already yield a decent
>> performance win?
>Yes, a 3x performance gain with just 16K GSO, compared to the
>GSO-disabled case, where the MTU size is 1500.
>
That is a lot. On the other hand, I would suggest always increasing
the MTU size to the max (9k) for adapters siw attaches to. With a
page size of 4k, anything below 4k MTU size hurts, while 9k already
packs two consecutive pages into one frame, if aligned.

Would 16k still yield a significant performance win if we have set
the max MTU size for the interface?
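Option (1) reduces to a per-connection boolean plus an optional cap. A minimal sketch, assuming a hypothetical `HYP_MAX_LARGE_FPDU` cap of 16k (the size Chelsio was said to accept) and a fallback to one MSS-sized FPDU when the peer does not set the flag; `tx_fpdu_size()` and the constant are illustrative names, not siw code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hard-coded cap for large FPDUs (16k, as reported for Chelsio). */
#define HYP_MAX_LARGE_FPDU 16384u

/*
 * Option (1): the peer signals with a single flag whether it accepts FPDUs
 * up to the sender's GSO size. Without the flag, every FPDU fits one MSS;
 * with it, FPDUs grow to the GSO size, capped at HYP_MAX_LARGE_FPDU.
 */
static uint32_t tx_fpdu_size(bool peer_accepts_large, uint32_t gso_size,
			     uint32_t mss)
{
	uint32_t size;

	assert(mss > 0);
	if (!peer_accepts_large)
		return mss;
	size = gso_size < HYP_MAX_LARGE_FPDU ? gso_size : HYP_MAX_LARGE_FPDU;
	/* Never shrink below a single MSS-sized FPDU. */
	return size > mss ? size : mss;
}
```

Only one bit crosses the wire here; the cap remains a purely local decision, which is why knowing other vendors' limits matters for choosing its value.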
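The MTU argument can be checked with rough arithmetic. This sketch assumes an RDMA Write (tagged) FPDU over IPv4 with TCP timestamps enabled, no MPA markers, and worst-case MPA padding; the header sizes are those assumptions, not values read from siw. Under them, a 9000-byte MTU carries 8925 payload bytes per frame (enough for two 4k pages), while a 1500-byte MTU carries 1425, so a single 4k page already needs three frames.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed per-frame overheads (bytes) for a tagged FPDU over IPv4. */
#define IPV4_HDR    20u
#define TCP_HDR_TS  32u /* 20-byte TCP header + 12-byte timestamp option */
#define MPA_LEN     2u  /* MPA ULPDU_Length field */
#define DDP_TAGGED  14u /* DDP/RDMAP tagged header */
#define MPA_CRC     4u
#define MPA_MAX_PAD 3u  /* worst-case padding to a 4-byte boundary */

/* Payload bytes that fit into one MTU-sized frame, assuming no MPA markers. */
static uint32_t payload_per_frame(uint32_t mtu)
{
	uint32_t overhead = IPV4_HDR + TCP_HDR_TS + MPA_LEN + DDP_TAGGED +
			    MPA_CRC + MPA_MAX_PAD;

	assert(mtu > overhead);
	return mtu - overhead;
}

/* Frames needed to carry len payload bytes when every FPDU is MTU-sized. */
static uint32_t frames_for(uint32_t len, uint32_t mtu)
{
	uint32_t per = payload_per_frame(mtu);

	return (len + per - 1) / per;
}
```

With these numbers, two consecutive 4k pages (8192 bytes) need one frame at MTU 9000 and six frames at MTU 1500.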
>Regarding the rdma netlink approach that you are suggesting, should
>it be similar to the following?
>
>  rdma link set iwp3s0f4/1 max_fpdu_len 102.1.1.6:16384, 102.5.5.6:32768
>
>  rdma link show iwp3s0f4/1 max_fpdu_len
>    102.1.1.6:16384
>    102.5.5.6:32768
>
>where "102.1.1.6" is the destination IP address (so that the same max
>FPDU length is used for all connections to this address/adapter),
>and "16384" is the max FPDU length.
>
Yes, that would be one way of doing it. Unfortunately, we would end up
maintaining additional permanent in-kernel state for every peer we
ever configured. So, would it make sense to combine it with iwpmd,
which could then cache peers while setting max_fpdu per new
connection? This would probably include extending the proprietary
port mapper protocol to exchange local preferences with the peer.
Local capabilities might be queried from the device (extending enum
ib_mtu beyond 4k, and using ibv_query_port()). And iw_cm_id would
have to be extended to carry that extra parameter down to the
driver... Sounds complicated.

Thanks,
Bernard.
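To make the per-peer state concrete, here is a user-space sketch of the table implied by the proposed `max_fpdu_len IP:len` syntax: parse each token, then look up the destination at connection setup, falling back to a default for unknown peers. `max_fpdu_len` is only the syntax proposed in this thread, not an existing `rdma` tool option; `struct peer_fpdu_table`, `peer_fpdu_add()` and `peer_fpdu_lookup()` are hypothetical, only IPv4 is handled, and tokens are assumed to be already split on the commas.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

#define PEER_TABLE_MAX 16 /* hypothetical fixed capacity */

/* One "destination IPv4 : max FPDU length" entry of the proposed syntax. */
struct peer_fpdu {
	struct in_addr daddr;
	uint32_t max_fpdu_len;
};

/* Hypothetical per-device table; entries live until explicitly removed. */
struct peer_fpdu_table {
	struct peer_fpdu entry[PEER_TABLE_MAX];
	unsigned int count;
};

/* Parse one "a.b.c.d:len" token and append it; returns 0 or -1. */
static int peer_fpdu_add(struct peer_fpdu_table *t, const char *token)
{
	char ip[INET_ADDRSTRLEN];
	const char *colon = strchr(token, ':');
	struct in_addr addr;
	unsigned long len;
	size_t iplen;
	char *end;

	assert(t != NULL);
	if (!colon || t->count >= PEER_TABLE_MAX)
		return -1;
	iplen = (size_t)(colon - token);
	if (iplen == 0 || iplen >= sizeof(ip))
		return -1;
	memcpy(ip, token, iplen);
	ip[iplen] = '\0';
	if (inet_pton(AF_INET, ip, &addr) != 1)
		return -1;
	/* Reject signs and empty lengths before handing off to strtoul(). */
	if (colon[1] < '0' || colon[1] > '9')
		return -1;
	len = strtoul(colon + 1, &end, 10);
	if (*end != '\0' || len == 0 || len > UINT32_MAX)
		return -1;
	t->entry[t->count].daddr = addr;
	t->entry[t->count].max_fpdu_len = (uint32_t)len;
	t->count++;
	return 0;
}

/* Look up the destination at connection setup; unknown peers get dflt. */
static uint32_t peer_fpdu_lookup(const struct peer_fpdu_table *t,
				 const char *dst, uint32_t dflt)
{
	struct in_addr addr;
	unsigned int i;

	assert(t != NULL);
	if (inet_pton(AF_INET, dst, &addr) != 1)
		return dflt;
	for (i = 0; i < t->count; i++)
		if (t->entry[i].daddr.s_addr == addr.s_addr)
			return t->entry[i].max_fpdu_len;
	return dflt;
}
```

Every entry stays in the table until it is removed explicitly, which is the permanent per-peer state Bernard objects to; a daemon-side cache such as the one suggested for iwpmd would hold it in user space instead.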