> -----Original Message-----
> From: Yuval Shaia <yuval.shaia@xxxxxxxxxx>
> Sent: Sunday, March 10, 2019 3:58 AM
> To: Parav Pandit <parav@xxxxxxxxxxxx>
> Cc: Bart Van Assche <bvanassche@xxxxxxx>; Ira Weiny <ira.weiny@xxxxxxxxx>;
> Leon Romanovsky <leon@xxxxxxxxxx>; Dennis Dalessandro
> <dennis.dalessandro@xxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx; Marcel
> Apfelbaum <marcel.apfelbaum@xxxxxxxxx>; Kamal Heib <kheib@xxxxxxxxxx>
> Subject: Re: [EXPERIMENTAL v1 0/4] RDMA loopback device
>
> On Thu, Mar 07, 2019 at 01:41:10AM +0000, Parav Pandit wrote:
> > Hi Yuval,
> >
> > > -----Original Message-----
> > > From: Bart Van Assche
> > > Sent: Wednesday, March 6, 2019 2:39 PM
> > > To: Yuval Shaia; Parav Pandit
> > > Cc: Ira Weiny; Leon Romanovsky; Dennis Dalessandro;
> > > linux-rdma@xxxxxxxxxxxxxxx; Marcel Apfelbaum; Kamal Heib
> > > Subject: Re: [EXPERIMENTAL v1 0/4] RDMA loopback device
> > >
> > > On Wed, 2019-03-06 at 22:14 +0200, Yuval Shaia wrote:
> > > > > > > > > > Suggestion: To enhance 'loopback' performance, can you
> > > > > > > > > > consider using shared memory or any other IPC instead
> > > > > > > > > > of going through the network stack?
> > > > > > > > >
> > > > > > > > > The loopback driver in this patchset doesn't use the
> > > > > > > > > network stack. It is just 2000 lines of wrapper around
> > > > > > > > > memcpy() that enable applications to use rdma.
> > > > > > > >
> > > > > > > > Having a dedicated driver just for loopback will force the
> > > > > > > > user to do a smart select, i.e. to use the lo device for
> > > > > > > > local traffic and rxe for non-local.
> > > > > > >
> > > > > > > No. When an application is written using rdmacm, everything
> > > > > > > works based on the ip address: it will pick the rdma device
> > > > > > > that matches this ip, which would be 'lo' when connections
> > > > > > > are on 127.0.0.1. An application such as MPI will anyway have
> > > > > > > to specify which rdma device in the system it wants to use.
> > > > > >
> > > > > > But what if one wants to stay at the verbs level and not use
> > > > > > the rdmacm API?
> > > > >
> > > > > Sure. He can stay at the verbs level, where he anyway has to give
> > > > > the device name explicitly.
> > > >
> > > > And that is exactly the problem!
> > > >
> > > > With qemu, the ibdev is given on the command line of the virtual
> > > > machine, so if two guests start on the same host it is ok to give
> > > > them the lo device as backend. But what will happen when one of the
> > > > VMs migrates to another host? The traffic will break, since the lo
> > > > device cannot go outside.
> > >
> > > Hi Yuval,
> > >
> > > I think what you are describing falls outside the use cases Parav has
> > > in mind. I think that optimizing RDMA over loopback, even if that
> > > loopback only works inside a single VM, is useful.
> >
> > The lo rdma device is born inside the VM, migrates as pure memory to
> > the other host, just like any lo netdev, and dies in the VM.
> > There is no need to give the lo device to the guest VM from outside.
>
> I already answered this in my reply to Bart's email, so I do not want to
> repeat it here.
>
> I see your point; I just do not want to turn one use case into the
> generic one. I.e. the same 'pure memcpy enhancement' requirement applies
> to a broader scope than your use case: enhancing rxe will address both,
> while having yet another sw device will cover only your use case.

I do not know how to enhance rxe to the same level of correctness and
efficiency as the proposed loopback driver. This driver uses the core
infrastructure for all the ibv commands, which keeps the user-space and
kernel handling very light. I did some perf analysis; the remaining
overheads are in kmalloc()/kfree(), and I want to improve those
incrementally, without changing the user interface.
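To make the 'wrapper around memcpy()' point above concrete, here is a
much-simplified, self-contained user-space model of the data path (the
lb_* names are hypothetical, for illustration only; this is not the patch
code): a send on one QP completes by copying straight into the buffer
posted on the peer QP's receive queue, with no netdev, headers or
checksums involved.

/*
 * Much-simplified model of the loopback data path: a "send" is just a
 * memcpy() into the buffer posted on the peer QP's receive queue.
 * Hypothetical lb_* names, illustration only.
 */
#include <stdio.h>
#include <string.h>

struct lb_recv_wr {
	void *buf;
	size_t len;
};

struct lb_qp {
	struct lb_qp *peer;		/* loopback: the peer QP is local */
	struct lb_recv_wr rq[16];	/* posted receive buffers */
	unsigned int rq_head, rq_tail;
};

static int lb_post_recv(struct lb_qp *qp, void *buf, size_t len)
{
	struct lb_recv_wr *wr = &qp->rq[qp->rq_tail++ % 16];

	wr->buf = buf;
	wr->len = len;
	return 0;
}

/* The whole "wire": copy payload straight into the peer's posted buffer. */
static int lb_post_send(struct lb_qp *qp, const void *buf, size_t len)
{
	struct lb_qp *rqp = qp->peer;
	struct lb_recv_wr *wr;

	if (rqp->rq_head == rqp->rq_tail)
		return -1;		/* no receive buffer posted (RNR) */
	wr = &rqp->rq[rqp->rq_head++ % 16];
	if (len > wr->len)
		return -1;		/* receive buffer too small */
	memcpy(wr->buf, buf, len);	/* no netdev, headers or checksums */
	return 0;
}

int main(void)
{
	struct lb_qp a = { 0 }, b = { 0 };
	char rxbuf[64] = "";

	a.peer = &b;
	b.peer = &a;
	lb_post_recv(&b, rxbuf, sizeof(rxbuf));
	lb_post_send(&a, "hello over memcpy", sizeof("hello over memcpy"));
	printf("%s\n", rxbuf);
	return 0;
}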
I haven't figured out the QP0 details for IB yet, but I intend to support
the IB link layer too. Loopback also requires a slightly different locking
scheme, since it takes references on both QPs at the same time (a rough
sketch of the ordering idea is at the end of this mail).

One good way to make the case for the rxe enhancement you have in mind is
to publish RFC patches showing that it is doable, efficient, performant
and stable. That would strengthen your argument that rxe is the right
choice. (Hint: as a starting point, please provide a fix for the crash in
memory registration in rxe. :-))

And once you succeed in that, please also propose to Bernard that the siw
driver use rxe to reuse its resource allocation, its resource (cq, qp, mr,
srq) state machines, the user ABI for sq, rq and cq handling, netlink,
etc., instead of creating a new driver. That would show that rxe is the
right choice for any software driver.
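Regarding the two-QP locking, a minimal sketch of the ordering idea,
assuming ordinary pthread mutexes purely for illustration (the driver
itself would use kernel locking, and the lb_* names are again
hypothetical): acquiring the two locks in a fixed (address) order avoids
an A/B vs. B/A deadlock when loopback traffic flows in both directions at
once.

/*
 * Sketch of the two-QP locking concern: a loopback send touches the send
 * side of one QP and the receive side of its peer at the same time.
 * Acquiring both locks in a fixed (address) order prevents deadlock when
 * sends run in both directions concurrently.  Illustration only.
 */
#include <pthread.h>
#include <stdint.h>

struct lb_qp {
	pthread_mutex_t lock;
	struct lb_qp *peer;
};

static void lb_lock_pair(struct lb_qp *sqp, struct lb_qp *rqp)
{
	if ((uintptr_t)sqp < (uintptr_t)rqp) {
		pthread_mutex_lock(&sqp->lock);
		pthread_mutex_lock(&rqp->lock);
	} else {
		pthread_mutex_lock(&rqp->lock);
		pthread_mutex_lock(&sqp->lock);
	}
}

static void lb_unlock_pair(struct lb_qp *sqp, struct lb_qp *rqp)
{
	pthread_mutex_unlock(&sqp->lock);
	pthread_mutex_unlock(&rqp->lock);
}

int main(void)
{
	struct lb_qp a, b;

	pthread_mutex_init(&a.lock, NULL);
	pthread_mutex_init(&b.lock, NULL);
	a.peer = &b;
	b.peer = &a;

	/* Both directions end up taking the locks in the same order. */
	lb_lock_pair(&a, &b);
	lb_unlock_pair(&a, &b);
	lb_lock_pair(&b, &a);
	lb_unlock_pair(&b, &a);
	return 0;
}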