On Tue, Oct 06, 2015 at 02:42:34PM -0700, Nakajima, Jun wrote:
> Hi Michael,
>
> Looks like the discussions tapered off, but do you have a plan to
> implement this if people are eventually fine with it? We want to
> extend this to support multiple VMs.

Absolutely. We are just back from holidays, and started looking at who
does what. If anyone wants to help, that'd also be nice.

> On Mon, Aug 31, 2015 at 11:35 AM, Nakajima, Jun <jun.nakajima@xxxxxxxxx> wrote:
> > On Mon, Aug 31, 2015 at 7:11 AM, Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
> >> Hello!
> >> During the KVM forum, we discussed supporting virtio on top
> >> of ivshmem. I have considered it, and came up with an alternative
> >> that has several advantages over that - please see below.
> >> Comments welcome.
> >
> > Hi Michael,
> >
> > I like this, and it should be able to achieve what I presented at KVM
> > Forum (vhost-user-shmem).
> > Comments below.
> >
> >>
> >> -----
> >>
> >> Existing solutions to userspace switching between VMs on the
> >> same host are vhost-user and ivshmem.
> >>
> >> vhost-user works by mapping memory of all VMs being bridged into the
> >> switch memory space.
> >>
> >> By comparison, ivshmem works by exposing a shared region of memory to all VMs.
> >> VMs are required to use this region to store packets. The switch only
> >> needs access to this region.
> >>
> >> Another difference between vhost-user and ivshmem surfaces when polling
> >> is used. With vhost-user, the switch is required to handle
> >> data movement between VMs; if using polling, this means that 1 host CPU
> >> needs to be sacrificed for this task.
> >>
> >> This is easiest to understand when one of the VMs is
> >> used with VF pass-through.
> >> This can be schematically shown below:
> >>
> >> +-- VM1 --------------+            +---VM2-----------+
> >> | virtio-pci          +-vhost-user-+ virtio-pci -- VF | -- VFIO -- IOMMU -- NIC
> >> +---------------------+            +-----------------+
> >>
> >>
> >> With ivshmem, in theory, communication can happen directly, with two VMs
> >> polling the shared memory region.
> >>
> >>
> >> I won't spend time listing advantages of vhost-user over ivshmem.
> >> Instead, having identified two advantages of ivshmem over vhost-user,
> >> below is a proposal to extend vhost-user to gain the advantages
> >> of ivshmem.
> >>
> >>
> >> 1. virtio in guest can be extended to allow support
> >>    for IOMMUs. This provides guest with full flexibility
> >>    about memory which is readable or writable by each device.
> >
> > I assume that you meant VFIO only for virtio by "use of VFIO". To get
> > VFIO working for general direct-I/O (including VFs) in guests, as you
> > know, we need to virtualize IOMMU (e.g. VT-d) and the interrupt
> > remapping table on x86 (i.e. nested VT-d).
> >
> >>    By setting up a virtio device for each other VM we need to
> >>    communicate to, guest gets full control of its security, from
> >>    mapping all memory (like with current vhost-user) to only
> >>    mapping buffers used for networking (like ivshmem) to
> >>    transient mappings for the duration of data transfer only.
> >
> > And I think that we can use VMFUNC to have such transient mappings.
> >
> >>    This also allows use of VFIO within guests, for improved
> >>    security.
> >>
> >>    vhost-user would need to be extended to send the
> >>    mappings programmed by guest IOMMU.
> >
> > Right. We need to think about cases where other VMs (VM3, etc.) join
> > the group or some existing VM leaves.
> > PCI hot-plug should work there (as you point out at "Advantages over
> > ivshmem" below).
> >
> >>
> >> 2.
> >>    qemu can be extended to serve as a vhost-user client:
> >>    receive remote VM mappings over the vhost-user protocol, and
> >>    map them into another VM's memory.
> >>    This mapping can take, for example, the form of
> >>    a BAR of a pci device, which I'll call here vhost-pci -
> >>    with bus address allowed
> >>    by VM1's IOMMU mappings being translated into
> >>    offsets within this BAR within VM2's physical
> >>    memory space.
> >
> > I think it's sensible.
> >
> >>
> >> Since the translation can be a simple one, VM2
> >> can perform it within its vhost-pci device driver.
> >>
> >> While this setup would be the most useful with polling,
> >> VM1's ioeventfd can also be mapped to
> >> VM2's irqfd, and vice versa, such that VMs
> >> can trigger interrupts to each other without need
> >> for a helper thread on the host.
> >>
> >>
> >> The resulting channel might look something like the following:
> >>
> >> +-- VM1 --------------+  +---VM2-----------+
> >> | virtio-pci -- iommu +--+ vhost-pci -- VF | -- VFIO -- IOMMU -- NIC
> >> +---------------------+  +-----------------+
> >>
> >> Comparing the two diagrams, a vhost-user thread on the host is
> >> no longer required, reducing the host CPU utilization when
> >> polling is active. At the same time, VM2 cannot access all of VM1's
> >> memory - it is limited by the iommu configuration set up by VM1.
> >>
> >>
> >> Advantages over ivshmem:
> >>
> >> - more flexibility, endpoint VMs do not have to place data at any
> >>   specific locations to use the device, in practice this likely
> >>   means fewer data copies.
> >> - better standardization/code reuse
> >>   virtio changes within guests would be fairly easy to implement
> >>   and would also benefit other backends, besides vhost-user.
> >>   Standard hotplug interfaces can be used to add and remove these
> >>   channels as VMs are added or removed.
> >> - migration support
> >>   It's easy to implement since ownership of memory is well defined.
> >>   For example, during migration VM2 can notify hypervisor of VM1
> >>   by updating dirty bitmap each time it writes into VM1 memory.
> >
> > Also, the ivshmem functionality could be implemented by this proposal:
> > - vswitch (or some VM) allocates memory regions in its address space, and
> > - it sets up the IOMMU mappings on the VMs so that they are translated
> >   into these regions
> >
> >>
> >> Thanks,
> >>
> >> --
> >> MST
> >
>
> --
> Jun
> Intel Open Source Technology Center
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
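
The bus-address-to-BAR-offset translation from point 2 of the proposal, and the dirty-bitmap update mentioned under migration support, could be sketched roughly as below. This is an illustration only and not part of the proposal: the Mapping layout, the BarTranslator class, mark_dirty, the 4 KiB page size and the bitmap indexing are all assumptions made for the example. A real vhost-pci driver would do this in the guest kernel, against whatever mapping format the extended vhost-user protocol ends up sending.

```python
from bisect import bisect_right
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumption: 4 KiB pages


@dataclass(frozen=True)
class Mapping:
    """One range that VM1 exposed through its IOMMU (hypothetical layout)."""
    iova: int        # bus address as programmed by VM1
    length: int      # size of the range in bytes
    bar_offset: int  # where the range starts inside VM2's vhost-pci BAR


class BarTranslator:
    """Translate VM1 bus addresses into offsets within VM2's BAR.

    Assumes the mappings do not overlap.
    """

    def __init__(self, mappings):
        self._maps = sorted(mappings, key=lambda m: m.iova)
        self._starts = [m.iova for m in self._maps]

    def translate(self, bus_addr, size=1):
        """Return the BAR offset of [bus_addr, bus_addr + size).

        Returns None when the access is not fully inside a single
        mapping, i.e. VM1's IOMMU configuration does not allow it.
        """
        if size <= 0:
            raise ValueError("size must be positive")
        # Find the last mapping that starts at or below bus_addr.
        i = bisect_right(self._starts, bus_addr) - 1
        if i < 0:
            return None
        m = self._maps[i]
        if bus_addr + size > m.iova + m.length:
            return None
        return m.bar_offset + (bus_addr - m.iova)


def mark_dirty(bitmap, addr, size):
    """Set one bit per page touched by a write of `size` bytes at `addr`.

    Assumption: the bitmap is indexed by addr // PAGE_SIZE; a real
    implementation would index by VM1 guest-physical page.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    first = addr // PAGE_SIZE
    last = (addr + size - 1) // PAGE_SIZE
    for page in range(first, last + 1):
        bitmap[page // 8] |= 1 << (page % 8)
```

With this sketch, VM2 would call translate() before touching a buffer and treat a None result as an access VM1 has not permitted. After each write into VM1 memory during migration it would call mark_dirty() so that the hypervisor of VM1 can resend the affected pages.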