Re: VirtIO vs Emulation Netperf benchmark results

> -----Original Message-----
> From: kvmarm-bounces@xxxxxxxxxxxxxxxxxxxxx [mailto:kvmarm-
> bounces@xxxxxxxxxxxxxxxxxxxxx] On Behalf Of Christoffer Dall
> Sent: Thursday, October 11, 2012 8:14 PM
> To: Antonios Motakis
> Cc: kvmarm@xxxxxxxxxxxxxxxxxxxxx
> Subject: Re:  VirtIO vs Emulation Netperf benchmark results
> 
> On Thu, Oct 11, 2012 at 6:12 AM, Antonios Motakis
> <a.motakis@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> > Sorry for the repost, I pressed reply instead of reply-all.
> >
> > On Thu, Oct 11, 2012 at 11:55 AM, Alexander Graf <agraf@xxxxxxx> wrote:
> >>
> >>
> >>
> >> On 11.10.2012, at 11:46, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
> >>
> >> > On 10/10/12 19:58, Alexander Graf wrote:
> >> >>
> >> >>
> >> >> On 10.10.2012, at 20:52, Christoffer Dall
> >> >> <c.dall@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >> >>
> >> >>> On Wed, Oct 10, 2012 at 2:50 PM, Alexander Graf <agraf@xxxxxxx> wrote:
> >> >>>>
> >> >>>>
> >> >>>> On 10.10.2012, at 20:39, Alexander Spyridakis
> >> >>>> <a.spyridakis@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >> >>>>
> >> >>>> For your information, with the latest developments related to
> >> >>>> VirtIO, I ran netperf a couple of times to see where network
> >> >>>> performance in the guests currently stands.
> >> >>>>
> >> >>>> The test was to run netperf -H "ip of LAN node", which measures TCP
> >> >>>> throughput for 10 seconds.
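> >> >>>>
> >> >>>> Concretely, that is (with no test name given, netperf defaults to a
> >> >>>> 10-second TCP_STREAM run; the address below is just a placeholder):
> >> >>>>
> >> >>>>   # 192.168.1.10 stands in for the LAN node's actual address
> >> >>>>   netperf -H 192.168.1.10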
> >> >>>>
> >> >>>> x86 - x86:        ~96 Mbps - reference between two different computers
> >> >>>> ARM Host  - x86:  ~80 Mbps
> >> >>>> ARM Guest - x86:  ~ 2 Mbps - emulation
> >> >>>> ARM Guest - x86:  ~74 Mbps - VirtIO
> >> >>>>
> >> >>>> From these we conclude that:
> >> >>>>
> >> >>>> As expected, x86 to x86 communication can reach the limit of the
> >> >>>> 100 Mbps LAN.
> >> >>>> The ARM board itself does not seem able to saturate the LAN.
> >> >>>> Network emulation in QEMU is more than just slow (expected).
> >> >>>>
> >> >>>>
> >> >>>> Why is this expected? This performance drop is quite terrifying.
> >> >>>>
> >> >>>
> >> >>> I think he means expected as in, we already know we have this
> >> >>> terrifying problem. I'm looking into this right now, and I
> >> >>> believe Marc is also on this.
> >> >>
> >> >> Ah, good :). Since you are on a dual-core machine with lots of
> >> >> traffic, you should get almost no vmexits for virtio queue processing.
> >> >>
> >> >> Since we know that this is a fast case, the big difference from
> >> >> emulated devices is the exits. So I'd search there :).
> >> >
> >> > There's a number of things we're aware of:
> >> >
> >> > - The emulated device is pure PIO. Using this kind of device is
> >> > always going to suck, and even more so on KVM. We could use a "less
> >> > braindead" model (some DMA-capable device), but since that already
> >> > departs from the real VE board, I'd rather go virtio all the way.
> >>
> >> Well, you should try to get comparable performance numbers. If that
> >> means exposing that braindead device on an x86 vm and turning off
> >> coalesced mmio, so be it.
> >>
> >> The alternative is to expose PCI into the guest, even when it's only
> >> half-working. It's not meant for production, but to get performance
> >> comparison data that you can sanity check against x86 to see if (and
> >> what) you're doing wrong.
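> >>
> >> For a rough x86 reference point, even just booting the same x86 guest
> >> once with a fully emulated NIC and once with virtio-net and repeating
> >> the same netperf run would give you emulated-vs-virtio numbers to
> >> compare against. A sketch (disk/netdev names are only examples, and
> >> e1000 is DMA-capable, so it is not a perfect stand-in for the PIO-only
> >> device):
> >>
> >>   # run 1: emulated NIC
> >>   qemu-system-x86_64 -enable-kvm -m 512 guest.img \
> >>       -netdev tap,id=n0 -device e1000,netdev=n0
> >>   # run 2: same guest, virtio NIC
> >>   qemu-system-x86_64 -enable-kvm -m 512 guest.img \
> >>       -netdev tap,id=n0 -device virtio-net-pci,netdev=n0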
> >>
> >> >
> >> > - Our exit path is painfully long. We could maybe make it more
> >> > efficient by being more lazy, and delay the switch of some of the
> >> > state until we get preempted (VFP state, for example). Not sure how
> >> > much of an improvement this would make, though.
> >>
> >> Lazy FP switching bought me quite a significant speedup on ppc. It won't
> >> help you here though: user space exits need to restore that state
> >> regardless. Unless the guest hasn't used FP at all - then you can save
> >> yourself both directions of the FP state switch.
> >>
> >
> > VFP switches are already done lazily, however: we only switch the VFP
> > state in when the guest actually uses FP or Advanced SIMD instructions,
> > not on every entry. In fact, when we lazily switch the VFP registers we
> > return directly from the Hyp mode exception context to the guest,
> > without really giving the host a chance to do much; we do not go all
> > the way back to the ioctl loop.
> >
> > However, on the next vm exit we will switch back to the host state
> > regardless of whether the host is going to use VFP or not, but I don't
> > think optimizing that would offer any big benefits, especially for I/O.
> >
> > Of course things could always be improved; for example, we could try
> > handling the VFP/NEON control registers separately and emulating them,
> > instead of doing a complete switch every time the guest does something
> > simple, e.g. only checking whether VFP is enabled. But we would need
> > some numbers to know whether this makes things better or worse, since
> > it implies another exit.
> >
> > Best regards,
> > Antonios
> >
> >>
> >> >
> >> > - Our memory bandwidth is reduced by the TLB entries we waste by
> >> > mapping guest memory with 4kB pages instead of section mappings.
> >> > Running hackbench in a guest shows quite a slowdown that should
> >> > mostly go away if/when userspace switches to huge pages as the
> >> > backing store. I expect virtio to suffer from the same problem.
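> >> >
> >> > For reference, the hugetlbfs route on the host side would look roughly
> >> > like this (sizes and paths are only examples, and the exact QEMU
> >> > option may differ by version):
> >> >
> >> >   # carve out some huge pages on the host and mount hugetlbfs
> >> >   echo 512 > /proc/sys/vm/nr_hugepages
> >> >   mkdir -p /dev/hugepages
> >> >   mount -t hugetlbfs none /dev/hugepages
> >> >   # then have QEMU back guest RAM from it, e.g. via -mem-path
> >> >   qemu-system-arm ... -mem-path /dev/hugepages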
> >>
> >> That one should be in a completely different ballpark. I'd be very
> >> surprised if you get more than a 10% slowdown in TLB-miss-intensive
> >> workloads. Definitely not as low-hanging a fruit as what we see here.
> >>
> >> Alex
> >>
> >> >
> >> > Once we've addressed these points, I expect the IO performance to
> >> > become better. At least by some margin.
> >> >
> 
> I ran perf a bit yesterday and it seems we spend approx. 5% of the vcpu
> thread's time on vgic save/restore.  I don't know if this can be optimized at all
> though.
> 
Hi Chris,

Can you tell me how you ran perf to get this level of detail? When I run it, I only get a very high-level summary that is not very useful.

Thanks
Senthil

> -Christoffer

_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm

