-----"Sagi Grimberg" <sagi@xxxxxxxxxxx> wrote: ----- >To: "Jason Gunthorpe" <jgg@xxxxxxxx>, "Olga Kornievskaia" ><aglo@xxxxxxxxx> >From: "Sagi Grimberg" <sagi@xxxxxxxxxxx> >Date: 04/25/2019 09:07AM >Cc: "Bernard Metzler" <bmt@xxxxxxxxxxxxxx>, "linux-rdma" ><linux-rdma@xxxxxxxxxxxxxxx> >Subject: Re: [PATCH v7 00/12] SIW: Request for Comments > >>> Hi Jason, >>> >>> I'd like to provide my feedback about testing this code and >running >>> NFS over RDMA over the software iWarp. With much appreciated help >from >>> Bernard, I setup 2 CentOS 7.6 VMs and his v7 kernel branch. I >>> successfully, ran NFS connectathon test suite, xfstests, and ran >"make >>> -j" compile of the linux kernel. Current code is useful for >NFSoRDMA >>> functional testing. From a very limited comparison timing study in >all >>> virtual environment, it is lacking a bit in performance compared >to >>> non-RDMA mount (but it's better than software RoCE). >> >> Excellent feed back, thank you. >> >> Lets hear from NVMeof too please > >I actually took a stab and gave this a test drive with nvme/rdma >and iser (thanks Steve for making our lives better with rdma tool add > >link support), think it was v6 though... > >There were some strange debug messages overlooked IIRC, and there >were some error messages, but things worked so don't know what >to make of it. > >Pretty much the same feedback here, very limited testing on my VMs >shows: >- functionally works >- faster than rxe >- slower than non-rdma (which sorta makes sense I assume) > > Hi Sagi, Many thanks for the feedback! Performance was not my main concern since re-trying for acceptance for upstream. I will look into perf tuning once we have it accepted. One penalty we pay is - for HW interoperability - disabling segmentation offloading awareness at sender side. While we could build up to 64k frames in one shot (having it segmented on the wire by the NIC), and process them same way in one shot at target side, we don't do so, since some target iWarp hardware cannot handle MPA frames larger than real MTU size. For siw - siw testing, we may switch back on GSO awareness. These days, this is a compile time selection only (since we abandoned all module parameters). Proposing another extension of the netlink stuff for passing those driver private parameters is on my todo list, but definitely not at the current stage. In general, sitting on top of kernel TCP socket, adding some protocol overhead, and even a 4 byte trailer checksum _after_ the data buffers comes with a penalty, if the kernel application would otherwise use the plain kernel TCP socket itself... The performance story might be different for user level applications, which potentially benefit more from the asynchronous verbs interface. I learned Chelsio was doing some perf testing of NVMeF via siw against iWarp HW themselves. They report line speed in a 100Gbs setup if siw is on 2 clients side, talking to a T6 RNIC: https://www.prnewswire.com/news-releases/chelsio-demonstrated-soft-iwarp-at-nvme-developer-days-300815249.html and https://www.chelsio.com/wp-content/uploads/resources/t6-100g-siw-nvmeof.pdf Thanks, Bernard.