Re: Network performance with small packets

On Tue, Jan 25, 2011 at 03:09:34PM -0600, Steve Dobbelstein wrote:
> 
> I am working on a KVM network performance issue found in our lab running
> the DayTrader benchmark.  The benchmark throughput takes a significant hit
> when running the application server in a KVM guest versus on bare metal.
> We have dug into the problem and found that DayTrader's use of small
> packets exposes KVM's overhead of handling network packets.  I have been
> able to reproduce the performance hit with a simpler setup using the
> netperf benchmark with the TCP_RR test and the request and response sizes
> set to 256 bytes.  I run the benchmark between two physical systems, each
> using a 1 Gb link.  In order to get the maximum throughput for the system I
> have to run 100 instances of netperf.  When I run the netserver processes
> in a guest, I see a maximum throughput that is 51% of what I get if I run
> the netserver processes directly on the host.  The CPU utilization in the
> guest is only 85% at maximum throughput, whereas it is 100% on bare metal.
> 
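Just to make sure I understand the setup: I assume the above boils down
to something like this (guest-ip is a placeholder and the 60-second run
length is arbitrary), with netserver already running on the target:

    for i in $(seq 1 100); do
        netperf -H guest-ip -t TCP_RR -l 60 -- -r 256,256 &
    done
    wait
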
> The KVM host has 16 CPUs.  The KVM guest is configured with 2 VCPUs.  When
> I run netperf on the host I boot the host with maxcpus=2 on the kernel
> command line.  The host is running the current KVM upstream kernel along
> with the current upstream qemu.  Here is the qemu command used to launch
> the guest:
> /build/qemu-kvm/x86_64-softmmu/qemu-system-x86_64 -name glasgow-RH60 -m 32768
> -drive file=/build/guest-data/glasgow-RH60.img,if=virtio,index=0,boot=on
> -drive file=/dev/virt/WAS,if=virtio,index=1
> -net nic,model=virtio,vlan=3,macaddr=00:1A:64:E5:00:63,netdev=nic0
> -netdev tap,id=nic0,vhost=on -smp 2 -vnc :1
> -monitor telnet::4499,server,nowait -serial telnet::8899,server,nowait
> --mem-path /libhugetlbfs -daemonize
> 
> We have tried various proposed fixes, each with varying degrees of success.
> One such fix was to add code to the vhost thread such that when it found
> the work queue empty it wouldn't just exit the thread but rather would
> delay for 50 microseconds and then recheck the queue.  If there was work on
> the queue it would loop back and process it, else it would exit the thread.
> The change got us a 13% improvement in the DayTrader throughput.
> 
> Running the same netperf configuration on the same hardware but using a
> different hypervisor gets us significantly better throughput numbers.   The
> guest on that hypervisor runs at 100% CPU utilization.  The various fixes
> we have tried have not gotten us close to the throughput seen on the other
> hypervisor.  I'm looking for ideas/input from the KVM experts on how to
> make KVM perform better when handling small packets.
> 
> Thanks,
> Steve

I am seeing a similar problem and am trying to fix it.
My current theory is that this is a variant of a receive livelock:
if the application isn't fast enough to process
incoming data, the guest net stack switches
from prequeue to backlog handling.
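
One way to check that in the guest (assuming the usual TcpExt counters
are there) is to compare the prequeue/backlog statistics before and
after a run:

    netstat -s | grep -iE 'prequeue|backlog'

If the backlog numbers grow much faster under load, that would support
the livelock theory.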

One thing I noticed is that locking the vhost thread
and the vcpu to the same physical CPU almost doubles the
bandwidth.  Can you confirm that in your setup?
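
For reference, the pinning can be done with something along these lines
(CPU 3 and the thread ids are placeholders; "info cpus" in the qemu
monitor shows the vcpu thread id, and the vhost kernel thread should
show up in ps as "vhost-<qemu-pid>"):

    taskset -p -c 3 <vhost-thread-pid>
    taskset -p -c 3 <vcpu-thread-pid>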

My current guess is that when we lock both to
a single CPU, netperf in the guest gets scheduled,
slowing down the vhost thread in the host; if the
livelock theory is right, that acts as a crude form
of flow control and gives the guest a chance to keep up.

I also noticed that this specific workload
performs better with vhost off: presumably
we are loading the guest less.
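
With your command line that would mean changing the -netdev option to
something like the following (or simply dropping vhost=on, which is the
default):

    -netdev tap,id=nic0,vhost=off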

-- 
MST

