Re: Performance of 40-way guest running 2.6.32-220 (RHEL6.2) vs. 3.3.1 OS

On 4/17/2012 6:25 AM, Chegu Vinod wrote:
On 4/17/2012 2:49 AM, Gleb Natapov wrote:
On Mon, Apr 16, 2012 at 07:44:39AM -0700, Chegu Vinod wrote:
On 4/16/2012 5:18 AM, Gleb Natapov wrote:
On Thu, Apr 12, 2012 at 02:21:06PM -0400, Rik van Riel wrote:
On 04/11/2012 01:21 PM, Chegu Vinod wrote:
Hello,

While running an AIM7 workload (workfile.high_systime) in a single 40-way (or a single 60-way) KVM guest, I noticed pretty bad performance when the guest was booted with the 3.3.1 kernel compared to the same guest booted with the 2.6.32-220
(RHEL6.2) kernel.
For the 40-way guest, Guest-RunA (2.6.32-220 kernel) performed nearly 9x better than Guest-RunB (3.3.1 kernel). In the case of the 60-way guest run, the older guest
kernel was nearly 12x better!
How many CPUs does your host have?
80 cores on the DL980 (i.e. 8 Westmere sockets).

So you are not oversubscribing CPUs at all. Are those real cores or including HT?

HT is off.

Do you have other CPU hogs running on the host while testing the guest?

Nope. Sometimes I do run utilities like "perf", "sar" or "mpstat" on NUMA node 0 (where
the guest is not running).
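
(As an aside, one way to keep those tools on node 0 is to wrap them with numactl as well. This is only a minimal sketch, not the exact invocation used in these runs:

numactl --cpunodebind=0 --membind=0 mpstat -P ALL 5
numactl --cpunodebind=0 --membind=0 sar -u 5 )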


I was using numactl to bind the qemu of the 40-way guest to NUMA nodes 4-7 (or, for a 60-way guest, binding it to nodes 2-7):

/etc/qemu-ifup tap0

numactl --cpunodebind=4,5,6,7 --membind=4,5,6,7 \
/usr/local/bin/qemu-system-x86_64 -enable-kvm \
-cpu Westmere,+rdtscp,+pdpe1gb,+dca,+xtpr,+tm2,+est,+vmx,+ds_cpl,+monitor,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme \
-m 65536 -smp 40 \
-name vm1 \
-chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm1.monitor,server,nowait \
-drive file=/var/lib/libvirt/images/vmVinod1/vm1.img,if=none,id=drive-virtio-disk0,format=qcow2,cache=none \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
-monitor stdio \
-net nic,macaddr=<..mac_addr..> \
-net tap,ifname=tap0,script=no,downscript=no \
-vnc :4

/etc/qemu-ifdown tap0
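
(A quick way to confirm that the qemu process and its memory actually stay on nodes 4-7 is something along these lines; the process name below just assumes the binary shown above:

numastat -p $(pidof qemu-system-x86_64)
grep Cpus_allowed_list /proc/$(pidof qemu-system-x86_64)/status

numastat -p should show the guest's pages concentrated on nodes 4-7, and Cpus_allowed_list should match the --cpunodebind mask.)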


I knew that there would be a few additional temporary qemu worker
threads created... i.e. there would be some oversubscription.
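
(To get a rough picture of how much oversubscription those extra threads cause, one sketch is to list all qemu threads together with the CPU each one last ran on:

ps -eLo pid,lwp,psr,pcpu,comm | grep qemu

With 40 vCPU threads plus the temporary worker threads, the thread count can exceed the 40 bound cores, which is the oversubscription mentioned above.)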

4 nodes above have 40 real cores, yes?

Yes.
Other than the qemu-related threads and some of the generic per-CPU Linux kernel threads (e.g. migration, etc.),
there isn't anything else running on these NUMA nodes.

Can you try to run the upstream kernel without binding at all and check the performance?


Re-ran the same workload *without* binding the qemu, but still using the 3.3.1 kernel:

20-way guest: performance got much worse compared to the case where we bind the qemu.
40-way guest: about the same as in the case where we bind the qemu.
60-way guest: about the same as in the case where we bind the qemu.

Trying out a couple of other experiments...
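
(One variant that could be tried, sketched here only as an idea rather than something from this thread: pin each vCPU thread to its own host core instead of binding the whole qemu process. The vCPU thread IDs are visible in the qemu monitor, and each thread can then be pinned with taskset; the host CPU numbers below are only illustrative:

(qemu) info cpus                      # note the thread_id of each vCPU
taskset -pc 40 <thread_id_of_vcpu0>   # pin vCPU0 to host CPU 40
taskset -pc 41 <thread_id_of_vcpu1>   # pin vCPU1 to host CPU 41, and so on )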

FYI
Vinod




