Re: [EXPERIMENTAL v1 0/4] RDMA loopback device

On Mon, Mar 04, 2019 at 05:10:35PM +0000, Parav Pandit wrote:
> 
> 
> > -----Original Message-----
> > From: Yuval Shaia <yuval.shaia@xxxxxxxxxx>
> > Sent: Monday, March 4, 2019 10:57 AM
> > To: Parav Pandit <parav@xxxxxxxxxxxx>
> > Cc: Ira Weiny <ira.weiny@xxxxxxxxx>; Leon Romanovsky <leon@xxxxxxxxxx>;
> > Dennis Dalessandro <dennis.dalessandro@xxxxxxxxx>; bvanassche@xxxxxxx;
> > linux-rdma@xxxxxxxxxxxxxxx; Marcel Apfelbaum
> > <marcel.apfelbaum@xxxxxxxxx>; Kamal Heib <kheib@xxxxxxxxxx>
> > Subject: Re: [EXPERIMENTAL v1 0/4] RDMA loopback device
> > 
> > On Mon, Mar 04, 2019 at 02:47:43PM +0000, Parav Pandit wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Yuval Shaia <yuval.shaia@xxxxxxxxxx>
> > > > Sent: Monday, March 4, 2019 1:56 AM
> > > > To: Parav Pandit <parav@xxxxxxxxxxxx>
> > > > Cc: Ira Weiny <ira.weiny@xxxxxxxxx>; Leon Romanovsky
> > > > <leon@xxxxxxxxxx>; Dennis Dalessandro
> > > > <dennis.dalessandro@xxxxxxxxx>; bvanassche@xxxxxxx;
> > > > linux-rdma@xxxxxxxxxxxxxxx
> > > > Subject: Re: [EXPERIMENTAL v1 0/4] RDMA loopback device
> > > >
> > > > On Fri, Mar 01, 2019 at 06:27:34AM +0000, Parav Pandit wrote:
> > > > >
> > > > > > What is the real use case for this?
> > > > > > Use case is fairly simple. A group of developers and users are
> > > > > > running VMs on laptops and in the cloud without hw HCAs for devel
> > > > > > purposes and automated unit and Jenkins tests of their application.
> > > > > > Such tests run for a few hundred QPs and intend to exchange a
> > > > > > million messages.
> > > > > > This is mainly RoCE users.
> > > > > > rxe is not fitting these needs in its current state.
> > > >
> > > > To run an RDMA device in a VM even when no real HCA is installed on
> > > > the host, I can suggest QEMU's pvrdma device, which is on its way to
> > > > becoming a real product.
> > > >
> > > Can you please share the GitHub repo or another open source link that
> > > provides the host side of the emulation code?
> > 
> > Official QEMU repo is here:
> > https://github.com/qemu/qemu
> > 
> > > I am interested to know how it works on a Linux host without modifying
> > > the kernel, if it is close to production.
> > 
> > On a host with no real HCA the backend device is RXE.
> > 
> Host and ibverb is not aware of the IP stack of the VM.
> 
> > > Linux host side RoCE code doesn't support pvrdma's needs. It requires
> > > ABI changes... different discussion.
> > 
> > No, no ABI changes are needed; pvrdma is an ibverbs client.
> > 
> > >
> > > And after doing all of it, such a VM still requires an enhanced host. This
> > > approach doesn't have any of those limitations.
> > 
> > Can you elaborate on that? why enhanced host?
> > 
> Host is not aware of the IP stack of the VM.
> How do you resolve the MAC address for the destination IP used in the guest
> VM on the host, and how do you program the right source MAC and destination
> MAC of the QP on the host using an ibverbs client?
> I was asking Aviad in Mellanox to use the devx interface to have passthrough
> programming.
> So I want to understand how you are doing this in QEMU, as it's close to
> production now.

Not sure I fully understand your question, e.g. why the host needs to know
the guest IP.
Anyway, the flow is like this:
- In the guest, ib_core calls the driver's add_gid hook when a gid entry is
  created.
- The driver in the guest passes the binding info to the qemu device (sgid
  and gid).
- The qemu device adds this gid to the host gid table (via netlink or QMP).
  Note that the sgid in the host is probably different. Please also note
  that at this stage we have this gid defined twice in the fabric, but since
  one is in the guest, which is hidden, we are ok.
- When the guest creates a QP it passes the guest sgid to the qemu device,
  which replaces it with the host sgid.

Since the gid is defined in the host, all the routing is done as usual.

Full details of the above are available here:
https://github.com/qemu/qemu/commit/2b05705dc8ad80c09a3aa9cc70c14fb8323b0fd3
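
To make the sgid replacement step in the flow a bit more concrete, here is a
rough sketch of the bookkeeping it implies. This is not the QEMU code. The
struct and function names (gid_map, gid_map_add, gid_map_to_host) and the
table size are made up for illustration, and it assumes "sgid" means an index
into a GID table. How the gid actually reaches the host table (netlink or
QMP) is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GID_LEN        16  /* a GID is 128 bits */
#define MAX_GUEST_GIDS 8   /* hypothetical table size for this sketch */

/* One guest GID slot and the host slot the same GID was registered under. */
struct gid_map_entry {
    uint8_t gid[GID_LEN];
    int     host_sgid_idx;  /* index in the host GID table */
    int     in_use;
};

struct gid_map {
    struct gid_map_entry slot[MAX_GUEST_GIDS];
};

/*
 * Record a GID reported by the guest driver (the add_gid step).
 * host_sgid_idx stands in for the index the host ended up using once the
 * GID was added to the host table; obtaining it is outside this sketch.
 * Returns 0 on success, -1 if the guest index is out of range.
 */
int gid_map_add(struct gid_map *m, int guest_sgid_idx,
                const uint8_t gid[GID_LEN], int host_sgid_idx)
{
    assert(m != NULL && gid != NULL);
    if (guest_sgid_idx < 0 || guest_sgid_idx >= MAX_GUEST_GIDS)
        return -1;
    memcpy(m->slot[guest_sgid_idx].gid, gid, GID_LEN);
    m->slot[guest_sgid_idx].host_sgid_idx = host_sgid_idx;
    m->slot[guest_sgid_idx].in_use = 1;
    return 0;
}

/*
 * Translate the guest's sgid index into the host's (the QP step).
 * Returns -1 if the guest never registered that slot.
 */
int gid_map_to_host(const struct gid_map *m, int guest_sgid_idx)
{
    assert(m != NULL);
    if (guest_sgid_idx < 0 || guest_sgid_idx >= MAX_GUEST_GIDS ||
        !m->slot[guest_sgid_idx].in_use)
        return -1;
    return m->slot[guest_sgid_idx].host_sgid_idx;
}
```

With a zeroed gid_map, registering guest slot 0 against host slot 3 and then
translating slot 0 gives back 3, while a slot the guest never registered
gives -1, so a QP request with an unknown sgid can be rejected up front.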

Hope this answers your question.

> 
> > >
> > > > >
> > > > > > Bart was looking to run on loopback (similar to other user
> > > > > > requests I received) here [1].
> > > > >
> > > > > > So I am looking at something insanely simple but extensible that
> > > > > > can find more uses as we go forward.
> > > >
> > > > Suggestion: To enhance 'loopback' performance, can you consider
> > > > using shared memory or any other IPC instead of going through the
> > > > network stack?
> > > >
> > > The loopback driver in this patchset doesn't use the network stack.
> > > It is just 2000 lines of wrapper around memcpy() that enables
> > > applications to use rdma.
> > 
> > gr8!!
> > I had plans to patch RXE to support it but didn't find the time.
> > 
> > So actually why not do it in RXE?
> > 
> Can you please publish fio results with nfs-rdma, nvme-fabrics, rds and
> perftest using the --infinite option, running it for one hour or so with rxe?
> It's been a while since I did that. The last time I tried, with 5.0.0-rc5,
> perftest crashed the kernel on MR registration.

For now the device is limited to IBV_WR_SEND and IBV_WR_RECV opcodes so
anything with IBV_WR_RDMA_* is not yet supported.

But I ran ibv_rc_pingpong (which also does reg_mr) for a lot more than an
hour with rxe as the backend and had no issues. Can you share more details?


