On Mon, Apr 27, 2015 at 04:30:35PM +0200, Jan Kiszka wrote:
> On 2015-04-27 at 15:01, Stefan Hajnoczi wrote:
> > On Mon, Apr 27, 2015 at 1:55 PM, Jan Kiszka <jan.kiszka@xxxxxxxxxxx> wrote:
> >> On 2015-04-27 at 14:35, Jan Kiszka wrote:
> >>> On 2015-04-27 at 12:17, Stefan Hajnoczi wrote:
> >>>> On Sun, Apr 26, 2015 at 2:24 PM, Luke Gorrie <luke@xxxxxxxx> wrote:
> >>>>> On 24 April 2015 at 15:22, Stefan Hajnoczi <stefanha@xxxxxxxxx> wrote:
> >>>>>>
> >>>>>> The motivation for making VM-to-VM fast is that while software
> >>>>>> switches on the host are efficient today (thanks to vhost-user),
> >>>>>> there is no efficient solution if the software switch is a VM.
> >>>>>
> >>>>> I see. This sounds like a noble goal indeed. I would love to run
> >>>>> the software switch as just another VM in the long term. It would
> >>>>> make it much easier for the various software switches to coexist
> >>>>> in the world.
> >>>>>
> >>>>> The main technical risk I see in this proposal is that eliminating
> >>>>> the memory copies might not have the desired effect. I might be
> >>>>> tempted to keep the copies but prevent the kernel from having to
> >>>>> inspect the vrings (more like vhost-user). But that is just a
> >>>>> hunch, and I suppose the first step would be a prototype to check
> >>>>> the performance anyway.
> >>>>>
> >>>>> For what it is worth, here is my view of networking performance on
> >>>>> x86 in the Haswell+ era:
> >>>>> https://groups.google.com/forum/#!topic/snabb-devel/aez4pEnd4ow
> >>>>
> >>>> Thanks.
> >>>>
> >>>> I've been thinking about how to eliminate the VM <-> host <-> VM
> >>>> switching and instead achieve just VM <-> VM.
> >>>>
> >>>> The holy grail of VM-to-VM networking is an exitless I/O path. In
> >>>> other words, packets can be transferred between VMs without any
> >>>> vmexits (this requires a polling driver).
> >>>>
> >>>> Here is how it works. QEMU gets "-device vhost-user" so that a VM
> >>>> can act as the vhost-user server:
> >>>>
> >>>> VM1 (virtio-net guest driver) <-> VM2 (vhost-user device)
> >>>>
> >>>> VM1 has a regular virtio-net PCI device. VM2 has a vhost-user
> >>>> device and plays the host role instead of the normal virtio-net
> >>>> guest driver role.
> >>>>
> >>>> The ugly thing about this is that VM2 needs to map all of VM1's
> >>>> guest RAM so it can access the vrings and packet data. The solution
> >>>> to this is something like the Shared Buffers BAR, but this time it
> >>>> contains not just the packet data but also the vring; let's call it
> >>>> the Shared Virtqueues BAR.
> >>>>
> >>>> The Shared Virtqueues BAR eliminates the need for vhost-net on the
> >>>> host because VM1 and VM2 communicate directly using virtqueue
> >>>> notify or by polling vring memory. Virtqueue notify works by
> >>>> connecting an eventfd as ioeventfd in VM1 and irqfd in VM2. And VM2
> >>>> would also have an ioeventfd that is an irqfd for VM1, to signal
> >>>> completions.
> >>>
> >>> We had such a discussion before:
> >>> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/123014/focus=279658
> >>>
> >>> Would be great to get this ball rolling again.
> >>>
> >>> Jan
> >>
> >> But one challenge would remain even then (unless both sides only
> >> poll): exit-free inter-VM signaling, no? But that's a hardware issue
> >> first of all.
> >
> > To start with, ioeventfd<->irqfd can be used. It incurs a
> > light-weight exit in VM1 and interrupt injection into VM2.
> >
> > For networking the cost is mitigated by NAPI drivers, which switch
> > between interrupts and polling. During notification-heavy periods the
> > guests would use polling anyway.
> >
> > A hardware solution would be some kind of inter-guest interrupt
> > injection. I don't know VMX well enough to know whether that is
> > possible on Intel CPUs.
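At the KVM API level, the ioeventfd<->irqfd coupling above would be
wired up roughly as in the sketch below. This is only to make the data
flow concrete: the VM fds, the doorbell address and the GSI are
placeholders, not taken from any existing code.

/* Sketch: one eventfd doubles as VM1's doorbell ioeventfd and VM2's
 * irqfd, so a virtqueue kick in VM1 becomes an interrupt in VM2 without
 * either QEMU process waking up.  Doorbell GPA and GSI are illustrative.
 */
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int wire_vm1_kick_to_vm2_irq(int vm1_fd, int vm2_fd,
                                    uint64_t doorbell_gpa, uint32_t gsi)
{
    int efd = eventfd(0, 0);
    if (efd < 0)
        return -1;

    /* VM1 side: the MMIO write to the queue notify register is consumed
     * in kvm.ko (a light-weight exit) and just signals efd. */
    struct kvm_ioeventfd ioev = {
        .addr = doorbell_gpa,
        .len  = 2,              /* virtio queue notify is a 16-bit write */
        .fd   = efd,
    };
    if (ioctl(vm1_fd, KVM_IOEVENTFD, &ioev) < 0)
        goto err;

    /* VM2 side: every signal on efd is injected as the interrupt that
     * the in-guest vhost-user device sleeps on (or ignores while it
     * polls). */
    struct kvm_irqfd irqfd = {
        .fd  = efd,
        .gsi = gsi,
    };
    if (ioctl(vm2_fd, KVM_IRQFD, &irqfd) < 0)
        goto err;

    return 0;
err:
    close(efd);
    return -1;
}

The completion path from VM2 back to VM1 is the mirror image: a second
eventfd registered as ioeventfd in VM2 and irqfd in VM1.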
>
> Today, we have posted interrupts to avoid the vm-exit on the target
> CPU, but there is nothing yet (to the best of my knowledge) to avoid
> the exit on the sender side (unless we ignore security). That's the
> same problem with intra-guest IPIs, BTW.
>
> For throughput, and given NAPI patterns, that's probably not an issue,
> as you noted. It may be for latency, though, when almost every cycle
> counts.
>
> Jan

If you are counting cycles, you likely can't afford the interrupt
latency under Linux, so you have to poll memory.

> --
> Siemens AG, Corporate Technology, CT RTC ITP SES-DE
> Corporate Competence Center Embedded Linux
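To make "poll memory" concrete in the Shared Virtqueues picture above:
the vhost-user guest would spin on the avail index of the shared ring
instead of sleeping on its irqfd. A minimal sketch, assuming x86 and the
standard virtio split-ring layout; how the BAR gets mapped to obtain the
'avail' pointer is left out.

/* Sketch: exitless RX path in the vhost-user guest (VM2).  It spins on
 * the avail ring of the virtqueue shared through the Shared Virtqueues
 * BAR instead of waiting for an interrupt.
 */
#include <stdint.h>
#include <emmintrin.h>          /* _mm_pause() */

struct vring_avail {
    uint16_t flags;
    uint16_t idx;
    uint16_t ring[];            /* available descriptor heads follow */
};

/* Busy-wait until the virtio-net driver in VM1 publishes new buffers. */
static inline uint16_t wait_for_avail(volatile struct vring_avail *avail,
                                      uint16_t last_seen_idx)
{
    while (avail->idx == last_seen_idx)
        _mm_pause();            /* relax the CPU while spinning */

    /* Pairs with the write barrier the driver issues before bumping idx,
     * so the descriptors read after this are the ones just published. */
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
    return avail->idx;
}

The completion direction is symmetric: VM1 can poll the used ring the
same way, which is what makes the path exitless in both directions, at
the price of burning a core on each side.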