> -----Original Message-----
> From: Yuval Shaia <yuval.shaia@xxxxxxxxxx>
> Sent: Tuesday, March 5, 2019 4:55 AM
> To: Parav Pandit <parav@xxxxxxxxxxxx>
> Cc: Ira Weiny <ira.weiny@xxxxxxxxx>; Leon Romanovsky <leon@xxxxxxxxxx>;
> Dennis Dalessandro <dennis.dalessandro@xxxxxxxxx>; bvanassche@xxxxxxx;
> linux-rdma@xxxxxxxxxxxxxxx; Marcel Apfelbaum <marcel.apfelbaum@xxxxxxxxx>;
> Kamal Heib <kheib@xxxxxxxxxx>
> Subject: Re: [EXPERIMENTAL v1 0/4] RDMA loopback device
>
> On Mon, Mar 04, 2019 at 05:10:35PM +0000, Parav Pandit wrote:
> >
> > > -----Original Message-----
> > > From: Yuval Shaia <yuval.shaia@xxxxxxxxxx>
> > > Sent: Monday, March 4, 2019 10:57 AM
> > > To: Parav Pandit <parav@xxxxxxxxxxxx>
> > > Cc: Ira Weiny <ira.weiny@xxxxxxxxx>; Leon Romanovsky <leon@xxxxxxxxxx>;
> > > Dennis Dalessandro <dennis.dalessandro@xxxxxxxxx>; bvanassche@xxxxxxx;
> > > linux-rdma@xxxxxxxxxxxxxxx; Marcel Apfelbaum <marcel.apfelbaum@xxxxxxxxx>;
> > > Kamal Heib <kheib@xxxxxxxxxx>
> > > Subject: Re: [EXPERIMENTAL v1 0/4] RDMA loopback device
> > >
> > > On Mon, Mar 04, 2019 at 02:47:43PM +0000, Parav Pandit wrote:
> > > >
> > > > > -----Original Message-----
> > > > > From: Yuval Shaia <yuval.shaia@xxxxxxxxxx>
> > > > > Sent: Monday, March 4, 2019 1:56 AM
> > > > > To: Parav Pandit <parav@xxxxxxxxxxxx>
> > > > > Cc: Ira Weiny <ira.weiny@xxxxxxxxx>; Leon Romanovsky <leon@xxxxxxxxxx>;
> > > > > Dennis Dalessandro <dennis.dalessandro@xxxxxxxxx>; bvanassche@xxxxxxx;
> > > > > linux-rdma@xxxxxxxxxxxxxxx
> > > > > Subject: Re: [EXPERIMENTAL v1 0/4] RDMA loopback device
> > > > >
> > > > > On Fri, Mar 01, 2019 at 06:27:34AM +0000, Parav Pandit wrote:
> > > > > >
> > > > > > > What is the real use case for this?
> > > > > >
> > > > > > The use case is fairly simple. A group of developers and users
> > > > > > run VMs on laptops and in the cloud, without hardware HCAs, for
> > > > > > development purposes and for automated unit and Jenkins tests
> > > > > > of their applications.
> > > > > > Such tests run a few hundred QPs and intend to exchange millions
> > > > > > of messages.
> > > > > > These are mainly RoCE users.
> > > > > > rxe does not fit these needs in its current state.
> > > > >
> > > > > To run an RDMA device in a VM even when no real HCA is installed
> > > > > on the host, I can suggest QEMU's pvrdma device, which is on its
> > > > > way to becoming a real product.
> > > > >
> > > > Can you please share the github repo or another open source link
> > > > which provides the host side of the emulation code?
> > >
> > > The official QEMU repo is here:
> > > https://github.com/qemu/qemu
> > >
> > > > I am interested to know how it works on a Linux host without
> > > > modifying the kernel, if it is close to a production line.
> > >
> > > On a host with no real HCA the backend device is RXE.
> > >
> > The host and ibverbs are not aware of the IP stack of the VM.
> >
> > > > The Linux host side RoCE code doesn't support pvrdma's needs. It
> > > > requires ABI changes... different discussion.
> > >
> > > No, no ABI changes are needed; pvrdma is an ibverbs client.
> > >
> > > > And after doing all of it, such a VM still requires an enhanced
> > > > host. This approach doesn't have any of those limitations.
> > >
> > > Can you elaborate on that? Why an enhanced host?
> > >
> > The host is not aware of the IP stack of the VM.
> > How do you resolve the mac address for the destination IP used in the
> > guest VM on the host, and how do you program the right source mac and
> > destination mac of the QP on the host using an ibverbs client?
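
To make the question concrete: with a plain ibverbs client on RoCE, the
application only hands the kernel a pair of GIDs when it moves the QP to
RTR; the source and destination macs are resolved by the host kernel from
its own neighbour tables during modify_qp(). A rough sketch of that step
is below (the helper name and its parameters are only for illustration,
and the remote QPN/GID are assumed to come from the application's own
out-of-band exchange):

/* Rough illustration only -- not code from the patchset or from QEMU. */
#include <string.h>
#include <infiniband/verbs.h>

static int qp_to_rtr(struct ibv_qp *qp, uint32_t remote_qpn,
		     const union ibv_gid *remote_gid, uint8_t sgid_idx)
{
	struct ibv_qp_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.qp_state = IBV_QPS_RTR;
	attr.path_mtu = IBV_MTU_1024;
	attr.dest_qp_num = remote_qpn;
	attr.rq_psn = 0;
	attr.max_dest_rd_atomic = 1;
	attr.min_rnr_timer = 12;

	/* RoCE: the address is expressed purely as GIDs (i.e. IPs). */
	attr.ah_attr.is_global = 1;
	attr.ah_attr.port_num = 1;
	attr.ah_attr.grh.dgid = *remote_gid;
	attr.ah_attr.grh.sgid_index = sgid_idx;
	attr.ah_attr.grh.hop_limit = 64;

	/* No smac/dmac anywhere: the host kernel derives them from the
	 * sgid/dgid, so it must be able to resolve that (guest) IP itself. */
	return ibv_modify_qp(qp, &attr,
			     IBV_QP_STATE | IBV_QP_AV | IBV_QP_PATH_MTU |
			     IBV_QP_DEST_QPN | IBV_QP_RQ_PSN |
			     IBV_QP_MAX_DEST_RD_ATOMIC | IBV_QP_MIN_RNR_TIMER);
}
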
> > I was asking Aviad in Mellanox to use the devx interface to have
> > passthrough programming.
> > So I want to understand how you are doing this in QEMU, as it is close
> > to production now?
>
> Not sure I fully understand your question, e.g. why the host needs to
> know the guest IP.
> Anyway, the flow is like this:
> - In the guest, ib_core calls the driver's add_gid hook when a gid entry
>   is created
> - The driver in the guest passes the binding info (sgid and gid) to the
>   qemu device
> - The qemu device adds this gid to the host gid table (via netlink or
>   QMP); note that the sgid in the host is probably different. Please also
>   note that at this stage we have this gid defined twice in the fabric,
>   but since one is in the guest, which is hidden, we are ok. (1098)
> - When the guest creates a QP it passes the guest sgid to the qemu
>   device, which replaces it with the host sgid. (#436)
>
> Since the gid is defined in the host, all the routing is done as usual.
>
> Full details of the above are available here:
> https://github.com/qemu/qemu/commit/2b05705dc8ad80c09a3aa9cc70c14fb8323b0fd3
>
> Hope this answers your question.
>
I took a cursory look. I am almost sure that if the guest VM's IP address
is not added to the host, modify_qp() in the kernel is going to fail,
unless you use a different GID in the host and build some smart way to
figure out which one to use.

> > > > > > Bart was looking to run on loopback (similar to another user
> > > > > > request I received) here [1].
> > > > > >
> > > > > > So I am looking at something insanely simple but extendible,
> > > > > > which can find more uses as we go forward.
> > > > >
> > > > > Suggestion: To enhance 'loopback' performance, can you consider
> > > > > using shared memory or any other IPC instead of going through the
> > > > > network stack?
> > > > >
> > > > The loopback driver in this patchset doesn't use the network stack.
> > > > It is just 2000 lines of wrapper around memcpy() to enable
> > > > applications to use rdma.
> > >
> > > gr8!!
> > > I had plans to patch RXE to support it but didn't find the time.
> > >
> > > So actually why not do it in RXE?
> > >
> > Can you please publish fio results with nfs-rdma, nvme-fabrics, rds and
> > perftest using the --infinite option, running for an hour or so with
> > rxe?
>
> It's been a while since I did that. Last time, when I tried with
> 5.0.0-rc5, perftest crashed the kernel on MR registration.
>
> For now the device is limited to the IBV_WR_SEND and IBV_WR_RECV opcodes,
> so anything with IBV_WR_RDMA_* is not yet supported.
>
The user is running Oracle VirtualBox on a Windows laptop, where the
pvrdma backend is not available. The same goes for running a VM in the
cloud, where the pvrdma backend is not available either.
RDMA is already hard to do, and now we would need to ask users to run a VM
inside a VM, with both having an up-to-date kernel...
Additionally, it doesn't even meet the users' basic criteria of running
nvme-fabrics, perftest and QP1...
So pvrdma + qemu is not a good starting point for this particular use
case...
And loopback is a perfect driver for VM suspend/resume or migration cases,
with no dependency on the host. But I don't think anyone would care for
this anyway.
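
For reference, "a wrapper around memcpy()" means the data path is
conceptually no more than the sketch below (a simplified illustration
only, not the actual patch code; all structures and names are made up):

/* Simplified illustration only -- not the code from the patchset.
 * It shows the idea: an IBV_WR_SEND on one loopback QP is completed by
 * copying straight into the buffer of the receive WR posted on the peer
 * QP. */
#include <stdint.h>
#include <string.h>

#define LB_RQ_DEPTH 64

struct lb_buf {
	void *addr;
	uint32_t length;
};

struct lb_qp {
	struct lb_qp *peer;		/* both QPs live on the same host */
	struct lb_buf rq[LB_RQ_DEPTH];	/* posted receive buffers */
	unsigned int rq_prod;		/* advanced by post_recv (not shown) */
	unsigned int rq_cons;		/* advanced when a send consumes a WR */
};

/* Handle an IBV_WR_SEND-style work request: no NIC, no network stack,
 * just a copy into the peer's next posted receive buffer. */
static int lb_post_send(struct lb_qp *qp, const void *data, uint32_t len)
{
	struct lb_qp *dst = qp->peer;
	struct lb_buf *rbuf;

	if (dst->rq_cons == dst->rq_prod)
		return -1;		/* RNR: no receive posted */

	rbuf = &dst->rq[dst->rq_cons % LB_RQ_DEPTH];
	if (len > rbuf->length)
		return -1;		/* receive buffer too small */

	memcpy(rbuf->addr, data, len);	/* the whole "transport" */
	dst->rq_cons++;

	/* A real driver would generate a send CQE here and a recv CQE on
	 * the peer's CQ. */
	return 0;
}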