On 4/17/2012 6:25 AM, Chegu Vinod wrote:
On 4/17/2012 2:49 AM, Gleb Natapov wrote:
On Mon, Apr 16, 2012 at 07:44:39AM -0700, Chegu Vinod wrote:
On 4/16/2012 5:18 AM, Gleb Natapov wrote:
On Thu, Apr 12, 2012 at 02:21:06PM -0400, Rik van Riel wrote:
On 04/11/2012 01:21 PM, Chegu Vinod wrote:
Hello,
While running AIM7 (workfile.high_systime) in a single 40-way (or a single 60-way) KVM guest, I noticed pretty bad performance when the guest was booted with the 3.3.1 kernel compared to the same guest booted with the 2.6.32-220 (RHEL6.2) kernel.
For the 40-way guest, Guest-RunA (2.6.32-220 kernel) performed nearly 9x better than Guest-RunB (3.3.1 kernel). In the case of the 60-way guest run, the older guest kernel was nearly 12x better!
How many CPUs does your host have?
80 cores on the DL980 (i.e. 8 Westmere sockets).
So you are not oversubscribing CPUs at all. Are those real cores, or does that count include HT?
HT is off.
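For reference, one quick way to confirm that on the host (just a sketch, assuming util-linux's lscpu is available) is:

  lscpu | grep -E 'Thread\(s\) per core|Core\(s\) per socket|Socket\(s\)'

With HT off, "Thread(s) per core" should report 1, and the socket/core counts should account for the 80 CPUs.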
Do you have any other CPU hogs running on the host while testing the guest?
Nope. Sometimes I do run utilities like "perf", "sar" or "mpstat" on NUMA node 0 (where the guest is not running).
I was using numactl to bind the qemu of the 40-way guest to NUMA nodes 4-7 (or, for a 60-way guest, binding it to nodes 2-7):
/etc/qemu-ifup tap0

numactl --cpunodebind=4,5,6,7 --membind=4,5,6,7 \
  /usr/local/bin/qemu-system-x86_64 -enable-kvm \
  -cpu Westmere,+rdtscp,+pdpe1gb,+dca,+xtpr,+tm2,+est,+vmx,+ds_cpl,+monitor,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme \
  -m 65536 -smp 40 \
  -name vm1 \
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm1.monitor,server,nowait \
  -drive file=/var/lib/libvirt/images/vmVinod1/vm1.img,if=none,id=drive-virtio-disk0,format=qcow2,cache=none \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
  -monitor stdio \
  -net nic,macaddr=<..mac_addr..> \
  -net tap,ifname=tap0,script=no,downscript=no \
  -vnc :4

/etc/qemu-ifdown tap0
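As a side note, one way to sanity-check that the binding actually took effect (a rough sketch; it assumes the numactl package's numastat accepts -p and that only one qemu-system-x86_64 instance is running) is:

  # CPU affinity of the qemu process -- should list only the cpus of nodes 4-7
  taskset -cp $(pidof qemu-system-x86_64)

  # per-node memory usage of the qemu process -- should be confined to nodes 4-7
  numastat -p $(pidof qemu-system-x86_64)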
I knew that there would be a few additional temporary qemu worker threads created, i.e. there would be some oversubscription.
The 4 nodes above have 40 real cores, yes?
Yes.
Other than the qemu-related threads and some of the generic per-CPU Linux kernel threads (e.g. migration etc.), there isn't anything else running on these NUMA nodes.
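A rough way to double-check that (just a sketch; it assumes the node-to-cpu mapping is read off numactl --hardware first) is to list which logical CPU each thread last ran on and eyeball anything unexpected on the guest's nodes:

  numactl --hardware                      # shows which cpus belong to which node
  ps -eLo pid,tid,psr,comm --sort=psr     # psr = cpu the thread last ran on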
Can you try to run the upstream kernel without binding at all and check the performance?
Re-ran the same workload *without* binding the qemu, but still using the 3.3.1 kernel:
20-way guest: performance got much worse compared to the case where we bind the qemu.
40-way guest: about the same as in the case where we bind the qemu.
60-way guest: about the same as in the case where we bind the qemu.
Trying out a couple of other experiments...
FYI
Vinod