Hi,

We've been trying to debug a problem when bringing up VMs on a 48-core AMD machine (4 x Opteron 6168). After some investigation and some helpful comments from #kvm, it appears that we hit a serious lock contention issue at a certain point. We have enabled lockdep debugging (we had to increase MAX_LOCK_DEPTH in sched.h to 144!) and have some output, but I'm not sure how to progress from here in troubleshooting the issue.

Kernel:

    Linux eax 2.6.38-7-vmhost #35 SMP Thu Mar 17 13:25:10 SGT 2011 x86_64 x86_64 x86_64 GNU/Linux

(vmhost is a custom flavour with the lock debugging options enabled; the base distro is Ubuntu Natty alpha3.)

QEMU:

    QEMU emulator version 0.14.0 (qemu-kvm-0.14.0), Copyright (c) 2003-2008 Fabrice Bellard

CPUs: 4 physical 12-core CPUs.

    model name : AMD Opteron(tm) Processor 6168
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
            mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow
            constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm pni monitor cx16 popcnt
            lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw
            ibs skinit wdt nodeid_msr npt lbrv svm_lock nrip_save pausefilter

RAM: 96GB

KVM command line (via libvirt):

    LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin \
    QEMU_AUDIO_DRV=none /usr/local/bin/kvm-snapshot -S -M pc-0.14 -enable-kvm -m 1024 \
    -smp 1,sockets=1,cores=1,threads=1 -name fb-0 -uuid de59229b-eb06-9ecc-758e-d20bc5ddc291 \
    -nodefconfig -nodefaults \
    -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/fb-0.monitor,server,nowait \
    -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=localtime -no-acpi -boot cd \
    -drive file=/mnt/big/bigfiles/kvm_disks/eax/fb-0.ovl,if=none,id=drive-ide0-0-0,format=qcow2 \
    -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
    -drive if=none,media=cdrom,id=drive-ide0-0-1,readonly=on,format=raw \
    -device ide-drive,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 \
    -netdev tap,fd=17,id=hostnet0 \
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:d9:09:ef,bus=pci.0,addr=0x3 \
    -usb -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -k en-us -vga cirrus \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4

kvm-snapshot is just a wrapper script that runs:

    /usr/bin/kvm "$@" -snapshot

The VM disks are .ovl files which all link to a single qcow2 base image, hosted on an iSCSI volume with ocfs2 as the filesystem. However, we reproduced the problem with all the VMs running off local disk, so that seems to indicate it's not an IB (InfiniBand) issue.
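For reference, the per-VM .ovl files mentioned above are ordinary qcow2 backing-file overlays; they would have been created with something along these lines (the base image name here is illustrative):

    # one thin overlay per VM, all backed by the same shared qcow2 image
    qemu-img create -f qcow2 -b /mnt/big/bigfiles/kvm_disks/eax/base.qcow2 \
        /mnt/big/bigfiles/kvm_disks/eax/fb-0.ovl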
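Similarly, the MAX_LOCK_DEPTH bump mentioned earlier was nothing more than a one-line edit to include/linux/sched.h before rebuilding the kernel, roughly this (assuming the stock 2.6.38 value of 48UL):

    # raise lockdep's per-task held-lock limit so it can cope with this workload
    sed -i 's/MAX_LOCK_DEPTH 48UL/MAX_LOCK_DEPTH 144UL/' include/linux/sched.h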
Basically, after somewhere between 30 and 40 machines finish booting, system CPU utilisation climbs as high as 99.9%. The VMs become unresponsive, but the host itself remains responsive.

Here's some output from perf top while the system is in the locky state:

    263832.00  46.3%  delay_tsc                 [kernel.kallsyms]
    231491.00  40.7%  __ticket_spin_trylock     [kernel.kallsyms]
     14609.00   2.6%  native_read_tsc           [kernel.kallsyms]
      9414.00   1.7%  do_raw_spin_lock          [kernel.kallsyms]
      8041.00   1.4%  local_clock               [kernel.kallsyms]
      6081.00   1.1%  native_safe_halt          [kernel.kallsyms]
      3901.00   0.7%  __lock_acquire.clone.18   [kernel.kallsyms]
      3665.00   0.6%  do_raw_spin_unlock        [kernel.kallsyms]
      3042.00   0.5%  __delay                   [kernel.kallsyms]
      2484.00   0.4%  lock_contended            [kernel.kallsyms]
      2484.00   0.4%  sched_clock_cpu           [kernel.kallsyms]
      1906.00   0.3%  sched_clock_local         [kernel.kallsyms]
      1419.00   0.2%  lock_acquire              [kernel.kallsyms]
      1332.00   0.2%  lock_release              [kernel.kallsyms]
       987.00   0.2%  tg_load_down              [kernel.kallsyms]
       895.00   0.2%  _raw_spin_lock_irqsave    [kernel.kallsyms]
       686.00   0.1%  find_busiest_group        [kernel.kallsyms]

I have also been looking at the top contended locks when the system is idle, with some VMs running, and in the locky condition:

    http://paste.ubuntu.com/582025/ - idle
    http://paste.ubuntu.com/582007/ - some VMs
    http://paste.ubuntu.com/582019/ - locky

(output is from: grep : /proc/lock_stat | head -30)

The main things that stand out to me are fidvid_mutex and idr_lock#3. The fidvid_mutex contention seems like it might be related to the high percentage of time spent in delay_tsc in the perf top output (fidvid_mutex comes from the powernow-k8 cpufreq driver, which busy-waits during frequency/voltage transitions, so that would fit).

Anyway, if someone could give me some suggestions for things to try, or more information that might help... anything really. :)

Thanks a lot,
ben
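P.S. In case anyone wants to reproduce the lock_stat capture: this is roughly the sequence we used, per Documentation/lockstat.txt (requires CONFIG_LOCK_STAT=y):

    echo 1 > /proc/sys/kernel/lock_stat    # turn statistics collection on
    echo 0 > /proc/lock_stat               # clear counters before each measurement
    grep : /proc/lock_stat | head -30      # show the top contended locks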