Re: [virtio-dev] Zerocopy VM-to-VM networking using virtio-net

On Mon, Apr 27, 2015 at 04:30:35PM +0200, Jan Kiszka wrote:
> On 2015-04-27 at 15:01, Stefan Hajnoczi wrote:
> > On Mon, Apr 27, 2015 at 1:55 PM, Jan Kiszka <jan.kiszka@xxxxxxxxxxx> wrote:
> >> On 2015-04-27 at 14:35, Jan Kiszka wrote:
> >>> On 2015-04-27 at 12:17, Stefan Hajnoczi wrote:
> >>>> On Sun, Apr 26, 2015 at 2:24 PM, Luke Gorrie <luke@xxxxxxxx> wrote:
> >>>>> On 24 April 2015 at 15:22, Stefan Hajnoczi <stefanha@xxxxxxxxx> wrote:
> >>>>>>
> >>>>>> The motivation for making VM-to-VM fast is that while software
> >>>>>> switches on the host are efficient today (thanks to vhost-user), there
> >>>>>> is no efficient solution if the software switch is a VM.
> >>>>>
> >>>>>
> >>>>> I see. This sounds like a noble goal indeed. I would love to run the
> >>>>> software switch as just another VM in the long term. It would make it much
> >>>>> easier for the various software switches to coexist in the world.
> >>>>>
> >>>>> The main technical risk I see in this proposal is that eliminating the
> >>>>> memory copies might not have the desired effect. I might be tempted to keep
> >>>>> the copies but prevent the kernel from having to inspect the vrings (more
> >>>>> like vhost-user). But that is just a hunch and I suppose the first step
> >>>>> would be a prototype to check the performance anyway.
> >>>>>
> >>>>> For what it is worth here is my view of networking performance on x86 in the
> >>>>> Haswell+ era:
> >>>>> https://groups.google.com/forum/#!topic/snabb-devel/aez4pEnd4ow
> >>>>
> >>>> Thanks.
> >>>>
> >>>> I've been thinking about how to eliminate the VM <-> host <-> VM
> >>>> switching and instead achieve just VM <-> VM.
> >>>>
> >>>> The holy grail of VM-to-VM networking is an exitless I/O path.  In
> >>>> other words, packets can be transferred between VMs without any
> >>>> vmexits (this requires a polling driver).
> >>>>
> >>>> Here is how it works.  QEMU gets "-device vhost-user" so that a VM can
> >>>> act as the vhost-user server:
> >>>>
> >>>> VM1 (virtio-net guest driver) <-> VM2 (vhost-user device)
> >>>>
> >>>> VM1 has a regular virtio-net PCI device.  VM2 has a vhost-user device
> >>>> and plays the host role instead of the normal virtio-net guest driver
> >>>> role.
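For concreteness: the VM1 half of this picture can already be written down
with today's vhost-user netdev syntax, while the VM2 half is the new piece.
A rough sketch of the two invocations; the VM2 device line is hypothetical
(only the idea of "-device vhost-user" comes from the proposal above, the
property names are made up):

  # VM1: ordinary virtio-net guest whose backend is a vhost-user socket.
  # share=on is what lets the peer map all of VM1's RAM today.
  qemu-system-x86_64 -enable-kvm -m 1024 \
      -object memory-backend-file,id=mem,size=1024M,mem-path=/dev/shm,share=on \
      -numa node,memdev=mem \
      -chardev socket,id=chr0,path=/tmp/vm1-vm2.sock \
      -netdev type=vhost-user,id=net0,chardev=chr0 \
      -device virtio-net-pci,netdev=net0 ...

  # VM2: the proposed "-device vhost-user" playing the host role
  # (hypothetical syntax, this device does not exist yet)
  qemu-system-x86_64 -enable-kvm -m 1024 \
      -chardev socket,id=chr0,path=/tmp/vm1-vm2.sock,server \
      -device vhost-user,chardev=chr0 ...

The share=on mapping of all of VM1's RAM is exactly the part the next
paragraph complains about and the Shared Virtqueues BAR is meant to replace.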
> >>>>
> >>>> The ugly thing about this is that VM2 needs to map all of VM1's guest
> >>>> RAM so it can access the vrings and packet data.  The solution to this
> >>>> is something like the Shared Buffers BAR, but this time it contains not
> >>>> just the packet data but also the vring; let's call it the Shared
> >>>> Virtqueues BAR.
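To make that concrete, here is a rough sketch of what such a BAR could
contain.  The ring structures follow the standard split-ring layout
(cf. include/uapi/linux/virtio_ring.h); the wrapper struct, the sizes and
the convention that desc.addr is an offset into the BAR are illustrative
only, and the alignment padding between the rings is omitted:

  /* Hypothetical "Shared Virtqueues BAR": rings and packet buffers live in
   * memory both VMs map, so neither side needs the other's full guest RAM.
   */
  #include <stdint.h>

  #define SHVQ_QUEUE_SIZE 256            /* descriptors per ring (example) */
  #define SHVQ_BUF_SIZE   2048           /* one packet buffer per descriptor */

  struct vring_desc {                    /* standard virtio descriptor */
      uint64_t addr;                     /* offset into the BAR, not a GPA */
      uint32_t len;
      uint16_t flags;
      uint16_t next;
  };

  struct vring_avail {
      uint16_t flags;
      uint16_t idx;
      uint16_t ring[SHVQ_QUEUE_SIZE];
  };

  struct vring_used_elem {
      uint32_t id;
      uint32_t len;
  };

  struct vring_used {
      uint16_t flags;
      uint16_t idx;
      struct vring_used_elem ring[SHVQ_QUEUE_SIZE];
  };

  struct shared_virtqueue {              /* one direction (rx or tx) */
      struct vring_desc  desc[SHVQ_QUEUE_SIZE];
      struct vring_avail avail;          /* used-ring alignment pad omitted */
      struct vring_used  used;
      uint8_t buf[SHVQ_QUEUE_SIZE][SHVQ_BUF_SIZE];   /* packet data */
  };

  struct shared_virtqueues_bar {         /* what the PCI BAR would expose */
      struct shared_virtqueue rx;
      struct shared_virtqueue tx;
  };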
> >>>>
> >>>> The Shared Virtqueues BAR eliminates the need for vhost-net on the
> >>>> host because VM1 and VM2 communicate directly using virtqueue notify
> >>>> or polling vring memory.  Virtqueue notify works by connecting an
> >>>> eventfd as an ioeventfd in VM1 and an irqfd in VM2.  VM2 would also
> >>>> have an ioeventfd that acts as an irqfd for VM1, to signal completions.
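The host-side plumbing for one direction of that is just the existing
KVM_IOEVENTFD/KVM_IRQFD interface.  A minimal sketch, assuming an MMIO
notify register (a PIO notify would additionally need
KVM_IOEVENTFD_FLAG_PIO); the address, GSI and error handling are
placeholders:

  #include <stdint.h>
  #include <string.h>
  #include <sys/eventfd.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* One eventfd: ioeventfd in VM1 (a write to the queue notify address
   * signals it without a heavyweight exit) and irqfd in VM2 (it injects
   * an interrupt there).  vm1_fd/vm2_fd are KVM VM file descriptors.
   */
  static int wire_vm1_kick_to_vm2_irq(int vm1_fd, int vm2_fd,
                                      uint64_t notify_addr, uint32_t vm2_gsi)
  {
      int efd = eventfd(0, 0);
      if (efd < 0)
          return -1;

      struct kvm_ioeventfd ioev;
      memset(&ioev, 0, sizeof(ioev));
      ioev.addr = notify_addr;        /* VM1's queue notify address */
      ioev.len  = 2;                  /* queue notify is a 16-bit write */
      ioev.fd   = efd;
      if (ioctl(vm1_fd, KVM_IOEVENTFD, &ioev) < 0)
          return -1;

      struct kvm_irqfd irq;
      memset(&irq, 0, sizeof(irq));
      irq.fd  = efd;
      irq.gsi = vm2_gsi;              /* interrupt to raise in VM2 */
      if (ioctl(vm2_fd, KVM_IRQFD, &irq) < 0)
          return -1;

      return efd;
  }

The mirror image (VM2's completion kick -> VM1 interrupt) is the same call
with the roles swapped.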
> >>>
> >>> We had such a discussion before:
> >>> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/123014/focus=279658
> >>>
> >>> Would be great to get this ball rolling again.
> >>>
> >>> Jan
> >>>
> >>
> >> But one challenge would remain even then (unless both sides only poll):
> >> exit-free inter-VM signaling, no? But that's a hardware issue first of all.
> > 
> > To start with ioeventfd<->irqfd can be used.  It incurs a light-weight
> > exit in VM1 and interrupt injection in VM2.
> > 
> > For networking the cost is mitigated by NAPI drivers which switch
> > between interrupts and polling.  During notification-heavy periods the
> > guests would use polling anyway.
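At the ring level that switch is already expressible with the standard
suppression flags; a rough sketch, using the names from
linux/virtio_ring.h (memory barriers and the re-check needed to close the
race after clearing a flag are omitted):

  #include <linux/virtio_ring.h>

  /* Driver side (VM1): while busy-polling the used ring, ask the device
   * not to signal its irqfd. */
  static void rx_poll_start(struct vring *vq)
  {
      vq->avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
  }

  static void rx_poll_stop(struct vring *vq)
  {
      vq->avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
  }

  /* Device side (VM2): same idea for suppressing the guest's kicks
   * (ioeventfd writes) while it is polling the avail ring. */
  static void tx_poll_start(struct vring *vq)
  {
      vq->used->flags |= VRING_USED_F_NO_NOTIFY;
  }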
> > 
> > A hardware solution would be some kind of inter-guest interrupt
> > injection.  I don't know VMX well enough to know whether that is
> > possible on Intel CPUs.
> 
> Today, we have posted interrupts to avoid the vm-exit on the target CPU,
> but there is nothing yet (to my best knowledge) to avoid the exit on the
> sender side (unless we ignore security). That's the same problem with
> intra-guest IPIs, BTW.
> 
> For throughput and given NAPI patterns, that's probably not an issue as
> you noted. It may be for latency, though, when almost every cycle counts.
> 
> Jan

If you are counting cycles, you likely can't afford the
interrupt latency under Linux, so you have to poll
memory.
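I.e. on the receive side you just spin on the used index in the shared
mapping.  A rough sketch (x86-only pause, memory barriers omitted):

  #include <stdint.h>
  #include <linux/virtio_ring.h>

  static inline void cpu_relax(void)
  {
      __asm__ __volatile__("pause" ::: "memory");
  }

  /* Spin until the other side publishes new used entries; no exits,
   * no interrupts.  last_seen is the driver's private copy of used->idx. */
  static uint16_t poll_used(volatile struct vring_used *used,
                            uint16_t last_seen)
  {
      while (used->idx == last_seen)
          cpu_relax();
      /* an rmb() belongs here before reading the used elements */
      return used->idx;
  }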

> -- 
> Siemens AG, Corporate Technology, CT RTC ITP SES-DE
> Corporate Competence Center Embedded Linux
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization



