Re: Performance of 40-way guest running 2.6.32-220 (RHEL6.2) vs. 3.3.1 OS

On 4/17/2012 2:49 AM, Gleb Natapov wrote:
On Mon, Apr 16, 2012 at 07:44:39AM -0700, Chegu Vinod wrote:
On 4/16/2012 5:18 AM, Gleb Natapov wrote:
On Thu, Apr 12, 2012 at 02:21:06PM -0400, Rik van Riel wrote:
On 04/11/2012 01:21 PM, Chegu Vinod wrote:
Hello,

While running AIM7 (workfile.high_systime) in a single 40-way (or a single
60-way) KVM guest, I noticed pretty bad performance when the guest was booted
with the 3.3.1 kernel compared to the same guest booted with the 2.6.32-220
(RHEL6.2) kernel.
For the 40-way guest, Guest-RunA (2.6.32-220 kernel) performed nearly 9x better
than Guest-RunB (3.3.1 kernel). In the case of the 60-way guest run, the older
guest kernel was nearly 12x better!
How many CPUs does your host have?
80 cores on the DL980 (i.e. 8 Westmere sockets).

So you are not oversubscribing CPUs at all. Are those real cores or including HT?

HT is off.

Do you have other CPU hogs running on the host while testing the guest?

Nope. Sometimes I do run utilities like "perf", "sar" or "mpstat" on NUMA node 0 (where
the guest is not running).
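(For reference, one way to keep that host-side monitoring confined to node 0; the sampling intervals below are arbitrary:)

# run the monitoring tools bound to NUMA node 0 so they stay off the guest's nodes
numactl --cpunodebind=0 --membind=0 mpstat -P ALL 5
numactl --cpunodebind=0 --membind=0 sar -u 5
numactl --cpunodebind=0 --membind=0 perf top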


I was using numactl to bind the qemu of the 40-way guest to NUMA nodes 4-7
(or, for the 60-way guest, to nodes 2-7):

/etc/qemu-ifup tap0

numactl --cpunodebind=4,5,6,7 --membind=4,5,6,7 \
/usr/local/bin/qemu-system-x86_64 -enable-kvm \
-cpu Westmere,+rdtscp,+pdpe1gb,+dca,+xtpr,+tm2,+est,+vmx,+ds_cpl,+monitor,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme \
-m 65536 -smp 40 \
-name vm1 \
-chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm1.monitor,server,nowait \
-drive file=/var/lib/libvirt/images/vmVinod1/vm1.img,if=none,id=drive-virtio-disk0,format=qcow2,cache=none \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
-monitor stdio \
-net nic,macaddr=<..mac_addr..> \
-net tap,ifname=tap0,script=no,downscript=no \
-vnc :4

/etc/qemu-ifdown tap0


I knew that there would be a few additional temporary qemu worker threads
created, i.e. there would be some oversubscription.
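(A quick way to see how many threads the qemu process actually has during the run; this assumes a single qemu instance on the host:)

QEMU_PID=$(pgrep -f qemu-system-x86_64)     # assumes only one qemu instance is running
ls /proc/$QEMU_PID/task | wc -l             # total threads: 40 vCPUs plus I/O worker threads
ps -L -p $QEMU_PID -o tid,comm,psr          # per thread: which host CPU it last ran on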

4 nodes above have 40 real cores, yes?

Yes.
Other than the qemu-related threads and some of the generic per-CPU Linux kernel threads (e.g. migration etc.)
there isn't anything else running on these NUMA nodes.

Can you try to run upstream
kernel without binding at all and check the performance?


I shall re-run and get back to you with this info.

Typically, for the native runs, binding the workload results in better numbers. Hence I chose to do the binding for the guest too, i.e. on the same NUMA nodes as the native case, for virt. vs. native comparison purposes. Having said that, in the past I had seen a couple of cases where the unbound guest performed better than the native case. Need to re-run and dig into this further...


Will have to retry by doing some explicit pinning of the vCPUs to native cores (without using virsh).
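(Roughly along these lines -- a sketch only; the thread ids and host CPU numbers are placeholders:)

# "info cpus" in the QEMU monitor ("-monitor stdio" above) prints the host
# thread_id of each vCPU, e.g.:
#   (qemu) info cpus
#   * CPU #0: ... thread_id=12345
#     CPU #1: ... thread_id=12346
# Then pin each vCPU thread to one core on nodes 4-7:
taskset -pc 40 12345      # vCPU0 -> host CPU 40
taskset -pc 41 12346      # vCPU1 -> host CPU 41
# ...and so on for the remaining vCPUs.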

Turned on function tracing and found that there appears to be more time being
spent around the lock code in the 3.3.1 guest when compared to the 2.6.32-220
guest.
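(For anyone who wants to reproduce the tracing: a minimal ftrace sketch along these lines works; the spin-lock filter pattern is just an example, and the paths assume debugfs is mounted at /sys/kernel/debug:)

cd /sys/kernel/debug/tracing
echo 0 > tracing_on
echo function_graph > current_tracer
echo '_raw_spin_lock*' > set_ftrace_filter    # restrict tracing to the spinlock entry points
echo 1 > tracing_on
sleep 30                                      # sample while the AIM7 workload is running
echo 0 > tracing_on
head -100 trace                               # per-call durations show where the time goes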
Looks like you may be running into the ticket spinlock
code. During the early RHEL 6 days, Gleb came up with a
patch to automatically disable ticket spinlocks when
running inside a KVM guest.

IIRC that patch got rejected upstream at the time,
with upstream developers preferring to wait for a
"better solution".

If such a better solution is not on its way upstream
now (two years later), maybe we should just merge
Gleb's patch upstream for the time being?
I think the pv spinlock work that is currently being actively discussed should
address the issue, but I am not sure anyone has tested it against a non-ticket
lock in a guest to see which one performs better.
I did see that discussion... it seems to have originated from the Xen context.

Yes, the problem is the same for both hypervisors.
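(Side note: a quick way to check what the guest kernel was built with, and whether PLE is active on the host; the config/sysfs paths below are the usual locations, so treat them as assumptions:)

# In the guest: was the kernel built with paravirt spinlock support?
grep -E 'CONFIG_PARAVIRT_SPINLOCKS|CONFIG_PARAVIRT=' /boot/config-$(uname -r)
# On the host: is Pause-Loop Exiting enabled (relevant to ticket-lock spinning in guests)?
cat /sys/module/kvm_intel/parameters/ple_gap      # 0 means PLE is disabled
cat /sys/module/kvm_intel/parameters/ple_window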

--
			Gleb.

Thanks
Vinod


