Re: rfc: vhost user enhancements for vm2vm communication

On Tue, Oct 06, 2015 at 02:42:34PM -0700, Nakajima, Jun wrote:
> Hi Michael,
> 
> Looks like the discussions tapered off, but do you have a plan to
> implement this if people are eventually fine with it? We want to
> extend this to support multiple VMs.

Absolutely. We are just back from holidays, and started looking at who
does what. If anyone wants to help, that'd also be nice.


> On Mon, Aug 31, 2015 at 11:35 AM, Nakajima, Jun <jun.nakajima@xxxxxxxxx> wrote:
> > On Mon, Aug 31, 2015 at 7:11 AM, Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
> >> Hello!
> >> During the KVM forum, we discussed supporting virtio on top
> >> of ivshmem. I have considered it, and came up with an alternative
> >> that has several advantages over that - please see below.
> >> Comments welcome.
> >
> > Hi Michael,
> >
> > I like this, and it should be able to achieve what I presented at KVM
> > Forum (vhost-user-shmem).
> > Comments below.
> >
> >>
> >> -----
> >>
> >> Existing solutions to userspace switching between VMs on the
> >> same host are vhost-user and ivshmem.
> >>
> >> vhost-user works by mapping memory of all VMs being bridged into the
> >> switch memory space.
> >>
> >> By comparison, ivshmem works by exposing a shared region of memory to all VMs.
> >> VMs are required to use this region to store packets. The switch only
> >> needs access to this region.
> >>
> >> Another difference between vhost-user and ivshmem surfaces when polling
> >> is used. With vhost-user, the switch is required to handle
> >> data movement between VMs; if polling is used, this means that one host
> >> CPU needs to be sacrificed for this task.
> >>
> >> This is easiest to understand when one of the VMs is
> >> used with VF pass-through. This can be schematically shown below:
> >>
> >> +-- VM1 --------------+            +---VM2-----------+
> >> | virtio-pci          +-vhost-user-+ virtio-pci -- VF | -- VFIO -- IOMMU -- NIC
> >> +---------------------+            +-----------------+
> >>
> >>
> >> With ivshmem, in theory, communication can happen directly, with two VMs
> >> polling the shared memory region.
> >>
> >>
> >> I won't spend time listing advantages of vhost-user over ivshmem.
> >> Instead, having identified two advantages of ivshmem over vhost-user,
> >> below is a proposal to extend vhost-user to gain the advantages
> >> of ivshmem.
> >>
> >>
> >> 1. virtio in guest can be extended to allow support
> >> for IOMMUs. This provides guest with full flexibility
> >> about which memory is readable or writable by each device.
> >
> > I assume that you meant VFIO only for virtio by "use of VFIO".  To get
> > VFIO working for general direct-I/O (including VFs) in guests, as you
> > know, we need to virtualize IOMMU (e.g. VT-d) and the interrupt
> > remapping table on x86 (i.e. nested VT-d).
> >
> >> By setting up a virtio device for each other VM we need to
> >> communicate with, guest gets full control of its security, from
> >> mapping all memory (like with current vhost-user) to only
> >> mapping buffers used for networking (like ivshmem) to
> >> transient mappings for the duration of data transfer only.
> >
> > And I think that we can use VMFUNC to have such transient mappings.
> >
> >> This also allows use of VFIO within guests, for improved
> >> security.
> >>
> >> vhost user would need to be extended to send the
> >> mappings programmed by guest IOMMU.
> >
> > Right. We need to think about cases where other VMs (VM3, etc.) join
> > the group or some existing VM leaves.
> > PCI hot-plug should work there (as you point out at "Advantages over
> > ivshmem" below).
> >
> >>
> >> 2. qemu can be extended to serve as a vhost-user client:
> >> receive remote VM mappings over the vhost-user protocol, and
> >> map them into another VM's memory.
> >> This mapping can take, for example, the form of
> >> a BAR of a pci device, which I'll call here vhost-pci -
> >> with bus addresses allowed by VM1's IOMMU mappings being
> >> translated into offsets within this BAR within VM2's
> >> physical memory space.
> >
> > I think it's sensible.
> >
> >>
> >> Since the translation can be a simple one, VM2
> >> can perform it within its vhost-pci device driver.
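To make the "simple translation" concrete, here is a minimal sketch of how a vhost-pci driver in VM2 might turn a VM1 bus address into an offset within the BAR. Nothing here comes from an existing implementation: the `Mapping` record, its field names, and `bus_to_bar_offset` are hypothetical, and the sketch assumes the mappings sent over vhost-user are a small list of non-overlapping regions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mapping:
    """One region VM1 exposed through its IOMMU (hypothetical layout)."""
    bus_addr: int    # start of the region in VM1's bus address space
    size: int        # length of the region in bytes
    bar_offset: int  # where the region appears inside the vhost-pci BAR


def bus_to_bar_offset(mappings: List[Mapping], bus_addr: int,
                      length: int) -> Optional[int]:
    """Return the BAR offset for [bus_addr, bus_addr + length).

    Returns None when the range is not fully covered by a single
    mapping, i.e. VM1 did not grant access to it.
    """
    for m in mappings:
        if m.bus_addr <= bus_addr and bus_addr + length <= m.bus_addr + m.size:
            return m.bar_offset + (bus_addr - m.bus_addr)
    return None


# Example table: two regions packed back to back inside the BAR.
mappings = [
    Mapping(bus_addr=0x10000, size=0x4000, bar_offset=0x0),
    Mapping(bus_addr=0x80000, size=0x1000, bar_offset=0x4000),
]
```

A descriptor address read from VM1's ring would be passed through `bus_to_bar_offset` before VM2 touches the data; a `None` result means the buffer lies outside what VM1 mapped.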
> >>
> >> While this setup would be the most useful with polling,
> >> VM1's ioeventfd can also be mapped to
> >> VM2's irqfd, and vice versa, such that VMs
> >> can trigger interrupts to each other without need
> >> for a helper thread on the host.
> >>
> >>
> >> The resulting channel might look something like the following:
> >>
> >> +-- VM1 --------------+  +---VM2-----------+
> >> | virtio-pci -- iommu +--+ vhost-pci -- VF | -- VFIO -- IOMMU -- NIC
> >> +---------------------+  +-----------------+
> >>
> >> Comparing the two diagrams, a vhost-user thread on the host is
> >> no longer required, reducing the host CPU utilization when
> >> polling is active.  At the same time, VM2 cannot access all of VM1's
> >> memory - it is limited by the iommu configuration set up by VM1.
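As a rough illustration of how VM1 stays in control of what VM2 can reach, the toy model below tracks the regions VM1 has mapped for one peer and answers access checks against them. It is only a sketch under stated assumptions: `GuestIommu` and its methods are hypothetical names, regions are assumed not to overlap, and a real guest IOMMU works on page tables rather than a dictionary. The `transient` helper models the "mapping for the duration of data transfer only" end of the spectrum.

```python
from contextlib import contextmanager


class GuestIommu:
    """Toy model of the mappings VM1 programs for one peer device."""

    def __init__(self):
        # bus_addr -> (size, writable); regions are assumed not to overlap
        self._maps = {}

    def map(self, bus_addr, size, writable):
        self._maps[bus_addr] = (size, writable)

    def unmap(self, bus_addr):
        self._maps.pop(bus_addr, None)

    def allows(self, bus_addr, length, write):
        """True if the peer may access [bus_addr, bus_addr + length)."""
        for start, (size, writable) in self._maps.items():
            if start <= bus_addr and bus_addr + length <= start + size:
                # Reads are always fine inside a mapping; writes need
                # the mapping to be writable.
                return writable or not write
        return False

    @contextmanager
    def transient(self, bus_addr, size, writable):
        """Map a buffer only for the duration of one transfer."""
        self.map(bus_addr, size, writable)
        try:
            yield
        finally:
            self.unmap(bus_addr)
```

Mapping all of guest memory once reproduces the current vhost-user behaviour, mapping only a packet pool resembles ivshmem, and wrapping each transfer in `transient` gives the strictest policy.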
> >>
> >>
> >> Advantages over ivshmem:
> >>
> >> - more flexibility: endpoint VMs do not have to place data at any
> >>   specific locations to use the device. In practice this likely
> >>   means fewer data copies.
> >> - better standardization/code reuse
> >>   virtio changes within guests would be fairly easy to implement
> >>   and would also benefit other backends, besides vhost-user.
> >>   Standard hotplug interfaces can be used to add and remove these
> >>   channels as VMs are added or removed.
> >> - migration support
> >>   It's easy to implement since ownership of memory is well defined.
> >>   For example, during migration VM2 can notify hypervisor of VM1
> >>   by updating dirty bitmap each time it writes into VM1 memory.
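The dirty-bitmap bookkeeping mentioned for migration could look roughly like the sketch below: one bit per page of VM1 memory, set whenever VM2 writes into that page. The `DirtyBitmap` class, its method names, and the 4 KiB page size are assumptions made for illustration, not part of the proposal, and how the bitmap would be shared with VM1's hypervisor is left open.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


class DirtyBitmap:
    """One bit per page of VM1 memory, set when VM2 writes to the page."""

    def __init__(self, mem_size):
        self.npages = (mem_size + (1 << PAGE_SHIFT) - 1) >> PAGE_SHIFT
        self.bits = bytearray((self.npages + 7) // 8)

    def mark(self, addr, length):
        """Mark every page touched by a write of `length` bytes at `addr`."""
        if length <= 0:
            return
        first = addr >> PAGE_SHIFT
        last = (addr + length - 1) >> PAGE_SHIFT
        if last >= self.npages:
            raise ValueError("write extends past the tracked memory")
        for page in range(first, last + 1):
            self.bits[page // 8] |= 1 << (page % 8)

    def dirty_pages(self):
        """Return the list of dirty page numbers in ascending order."""
        return [p for p in range(self.npages)
                if self.bits[p // 8] & (1 << (p % 8))]
```

A write that straddles a page boundary marks both pages, which is the case that is easy to get wrong when the bitmap is updated by hand.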
> >
> > Also, the ivshmem functionality could be implemented by this proposal:
> > - vswitch (or some VM) allocates memory regions in its address space, and
> > - it sets up the IOMMU mappings on the VMs to be translated into those regions
> >
> >>
> >> Thanks,
> >>
> >> --
> >> MST
> 
> 
> 
> -- 
> Jun
> Intel Open Source Technology Center
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization