Re: VirtIO vs Emulation Netperf benchmark results

On 10/10/12 19:58, Alexander Graf wrote:
> 
> 
> On 10.10.2012, at 20:52, Christoffer Dall <c.dall@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> 
>> On Wed, Oct 10, 2012 at 2:50 PM, Alexander Graf <agraf@xxxxxxx> wrote:
>>>
>>>
>>> On 10.10.2012, at 20:39, Alexander Spyridakis
>>> <a.spyridakis@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> For your information, with the latest developments related to VirtIO, I ran
>>> netperf a couple of times to see where network performance on the guests
>>> currently stands.
>>>
>>> The test was to run netperf -H "ip of LAN node", which measures TCP
>>> throughput for 10 seconds.
>>>
>>> x86 - x86:  ~96 Mbps - reference between two different computers
>>> ARM Host  - x86:  ~80 Mbps
>>> ARM Guest - x86:  ~ 2 Mbps - emulation
>>> ARM Guest - x86:  ~74 Mbps - VirtIO
>>>
>>> From these we conclude that:
>>>
>>> As expected, x86-to-x86 communication can reach the limit of the 100 Mbps
>>> LAN.
>>> The ARM board cannot saturate the LAN even natively, without virtualization.
>>> Network emulation in QEMU is more than just slow (expected).
>>>
>>>
>>> Why is this expected? This performance drop is quite terrifying.
>>>
>>
>> I think he means expected as in, we already know we have this
>> terrifying problem. I'm looking into this right now, and I believe
>> Marc is also on this.
> 
> Ah, good :). Since you are on a dual-core machine with lots of traffic, you should get almost no vmexits for virtio queue processing.
> 
> Since we know that this is a fast case, the big difference from emulated devices is the exits. So I'd search there :).

There are a number of things we're aware of:

- The emulated device is pure PIO. Using this kind of device is always
going to suck, and even more so under KVM. We could use a "less braindead"
model (some DMA-capable device), but since that would already depart from
the real VE board, I'd rather go virtio all the way (see the toy model
below).
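
To make that concrete, here is a toy model (nothing to do with the actual
QEMU/KVM code, and the per-packet register count is invented) of why a
register-banged device hurts so much: every emulated register access is a
trap, while a virtio-style shared ring only traps once per batch of buffers.

/* Toy model, not QEMU/KVM code: count the traps a guest takes to send
 * N packets through a register-banged PIO device versus a virtio-style
 * shared ring. Register names/counts are made up for illustration. */
#include <stdio.h>

#define NPACKETS         1000
#define PIO_REGS_PER_PKT 6      /* assumed: addr lo/hi, len, cmd, status, ack */
#define RING_SIZE        256

static unsigned long exits;     /* every emulated register access traps */

static void emulated_reg_access(void) { exits++; }

static void pio_send(int npackets)
{
    int i, r;

    for (i = 0; i < npackets; i++)
        for (r = 0; r < PIO_REGS_PER_PKT; r++)
            emulated_reg_access();      /* one trap per register access */
}

static void virtio_send(int npackets)
{
    int i, in_flight = 0;

    for (i = 0; i < npackets; i++) {
        /* the descriptor is written to guest memory: no trap at all */
        if (++in_flight == RING_SIZE || i == npackets - 1) {
            emulated_reg_access();      /* one "kick" for the whole batch */
            in_flight = 0;
        }
    }
}

int main(void)
{
    exits = 0; pio_send(NPACKETS);
    printf("PIO model:    %lu exits\n", exits);

    exits = 0; virtio_send(NPACKETS);
    printf("virtio model: %lu exits\n", exits);
    return 0;
}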

- Our exit path is painfully long. We could maybe make it more efficient
by being lazier and delaying the switch of some of the state until we get
preempted (VFP state, for example); the rough idea is sketched below. Not
sure how much of an improvement this would make, though.
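
Roughly, this is what "lazy" means here (hand-wavy sketch with made-up
helper names, not the actual world-switch code): instead of saving and
restoring the guest VFP registers on every exit, leave them loaded and
only pay the cost when the vcpu thread is preempted or the host actually
touches VFP.

/* Sketch of the lazy-switch idea; helper names are invented. */
struct vcpu {
    int guest_vfp_loaded;        /* guest VFP state still in the hardware? */
    /* ... saved register state would live here ... */
};

/* stubs standing in for the expensive register save/restore sequences */
static void save_guest_vfp(struct vcpu *v)   { (void)v; }
static void restore_host_vfp(struct vcpu *v) { (void)v; }

/* Eager: paid on every single exit, even if the host never touches VFP. */
static void exit_path_eager(struct vcpu *v)
{
    save_guest_vfp(v);
    restore_host_vfp(v);
}

/* Lazy: the exit path only remembers that the guest still owns the VFP. */
static void exit_path_lazy(struct vcpu *v)
{
    v->guest_vfp_loaded = 1;
}

/* The expensive part moves to the rare events that actually need it:
 * the vcpu thread being preempted, or the host using VFP (which can be
 * made to trap while the guest state is loaded). */
static void on_preempt_or_host_vfp_use(struct vcpu *v)
{
    if (v->guest_vfp_loaded) {
        save_guest_vfp(v);
        restore_host_vfp(v);
        v->guest_vfp_loaded = 0;
    }
}

int main(void)
{
    struct vcpu v = { 0 };

    exit_path_eager(&v);             /* what we do today, on every exit */
    exit_path_lazy(&v);              /* cheap, on every exit */
    exit_path_lazy(&v);
    on_preempt_or_host_vfp_use(&v);  /* expensive, only when needed */
    return 0;
}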

- Our memory bandwidth is reduced by the TLB entries we waste by using
4kB pages instead of section mappings. Running hackbench in a guest shows
quite a slowdown that should mostly go away if/when userspace switches to
huge pages as the backing store (something along the lines of the sketch
below). I expect virtio to suffer from the same problem.
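
For reference, this is roughly what "huge pages as backing store" looks
like on the userspace side (illustrative only, not QEMU code; MAP_HUGETLB
needs huge pages reserved by the admin, and MADV_HUGEPAGE needs THP
enabled in the kernel):

/* Back the guest RAM region with huge pages so that far fewer TLB
 * entries are needed to cover the same amount of guest memory. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define GUEST_RAM_SIZE  (256UL << 20)   /* 256 MB, arbitrary for the example */

static void *alloc_guest_ram(size_t size)
{
    /* Explicit huge pages from the hugetlb pool. */
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (mem != MAP_FAILED)
        return mem;

    /* Fallback: a normal 4kB-backed mapping, with a hint to the kernel
     * to promote it to transparent huge pages. */
    mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    madvise(mem, size, MADV_HUGEPAGE);
    return mem;
}

int main(void)
{
    void *ram = alloc_guest_ram(GUEST_RAM_SIZE);
    if (!ram) {
        perror("mmap");
        return 1;
    }
    memset(ram, 0, GUEST_RAM_SIZE);     /* touch it so it's actually backed */
    printf("guest RAM at %p\n", ram);
    munmap(ram, GUEST_RAM_SIZE);
    return 0;
}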

Once we've addressed these points, I expect the I/O performance to get
better, at least by some margin.

	M.
-- 
Jazz is not dead. It just smells funny...


_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm

