I'm trying to understand a performance problem (a 50% degradation inside the VM) that I'm seeing on several systems running qemu-kvm. They run Fedora with kernel 3.5.3-1.fc17.x86_64 or 3.6.6-1.fc17.x86_64 and qemu 1.0.1 or 1.2.1, on AMD Opteron 6176 and 6174 CPUs, and all of them behave identically.

A Windows guest receives a UDP MPEG stream that is processed by TSReader. The stream comes in at about 73Mbps, but the VM cannot process more than 43Mbps. It's not a networking issue: the packets reach the guest, and with iperf we can easily do 80Mbps. Also, with iperf the guest can receive the packets from the streamer itself (iperf doesn't interpret the stream properly, of course, but it was just a way to see that the traffic arrives). However, on an identical host (one with a 6174 CPU, even), a bare-metal Windows install has absolutely no problem processing the same stream.

This is the command we're using to start qemu-kvm:

/usr/bin/qemu-kvm -name b691546e-79f8-49c6-a293-81067503a6ad -S -M pc-1.2 \
  -cpu host -enable-kvm -m 16384 -smp 16,sockets=1,cores=16,threads=1 \
  -uuid b691546e-79f8-49c6-a293-81067503a6ad -no-user-config -nodefaults \
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/b691546e-79f8-49c6-a293-81067503a6ad.monitor,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown \
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
  -drive file=/var/lib/libvirt/images/dis-magnetics-2-223101/d8b233c6-8424-4de9-ae3c-7c9a60288514,if=none,id=drive-virtio-disk0,format=qcow2,cache=writeback,aio=native \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
  -netdev tap,fd=29,id=hostnet0,vhost=on,vhostfd=31 \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=22:2e:fb:a2:36:be,bus=pci.0,addr=0x3 \
  -netdev tap,fd=32,id=hostnet1,vhost=on,vhostfd=33 \
  -device virtio-net-pci,netdev=hostnet1,id=net1,mac=22:94:44:5a:cb:24,bus=pci.0,addr=0x4 \
  -vnc 127.0.0.1:4,password -vga cirrus \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6

As a side note, the TSReader application only uses one thread for decoding the stream and one for network I/O, even though using more threads would solve the problem.

I've tried a smaller guest with 5 cores, pinned all of them to host CPUs 6 to 11 (all in one NUMA node), each vCPU to an individual CPU, and I've tried enabling huge pages (the TLB thing)... and that's about it. I'm completely stuck. Rough sketches of the pinning, the huge page setup, and the iperf test are at the end of this mail.

Is this 50% hit something that's considered 'okay', or am I doing something wrong? And if the latter, what/how can I debug it?

--
George-Cristian Bîrzan
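P.S. In case the exact steps matter, here is roughly how the pinning was done. This is a sketch from memory; "guest" stands for the domain name (the UUID-named domain above):

  # check which host CPUs share a NUMA node on the Opteron
  numactl --hardware

  # pin each vCPU of the smaller 5-core guest to its own host CPU,
  # using the 6-11 range mentioned above, one vCPU per CPU
  virsh vcpupin guest 0 6
  virsh vcpupin guest 1 7
  virsh vcpupin guest 2 8
  virsh vcpupin guest 3 9
  virsh vcpupin guest 4 10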
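The huge pages attempt was along these lines (again a sketch, assuming 2 MiB pages; 8192 of them cover the 16 GiB guest):

  # reserve 8192 x 2 MiB = 16 GiB of huge pages on the host
  echo 8192 > /proc/sys/vm/nr_hugepages

  # make a hugetlbfs mount available to qemu/libvirt
  mkdir -p /dev/hugepages
  mount -t hugetlbfs hugetlbfs /dev/hugepages

  # the domain then gets <memoryBacking><hugepages/></memoryBacking> in its
  # libvirt XML (with plain qemu-kvm it would be -mem-path /dev/hugepages)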
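The iperf numbers above came from a plain UDP test, roughly as follows (iperf 2 syntax; <guest-ip> is a placeholder):

  # on the Windows guest
  iperf -s -u -i 1

  # from the machine that normally sends the MPEG stream
  iperf -c <guest-ip> -u -b 80M -t 30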