On 02/18/2010 12:31 PM, Vegard Svanberg wrote:
Hi, sorry about the lengthy e-mail.
We've been evaluating KVM for a while. Currently the host is on
2.6.30-bpo.1-amd64, 4 CPU cores on an Intel Xeon 2,33. Disk controller is Areca
ARC-1210 and the machine has 12GB of memory. KVM 85+dfsg-4~bpo50+1 and libvirt
0.6.5-3~bpo50+1, both from backports.org. Guests are in qcow2 images.
These are all really old.
A few test servers have been running here for a while. That worked ok,
so we've moved a few production servers on it as well. It's now running
8 guests, none of them are CPU or disk intensive (well, there are a mail server
and web server there, which from time to time spike, but it's generally very
low).
After a reboot the other day, performance is suddenly a disaster. The only
change we can see we've done is that we've allocated a bit more memory
to the guests, and enabled 4 vcpus on all guests (some of them ran with
1 vcpu before). When I say performance is bad, it's to the point where
typing on the keyboard is lagging. It seems load on one guest affects all of
the others.
What is weird is that before the reboot, the host machine usually had a
system load of about 0.30 on average, and a CPU load of 20-30% (total of
all cores). After the reboot, this is a typical top output:
top - 11:17:49 up 1 day, 5:32, 3 users, load average: 3.81, 3.85, 3.96
Tasks: 113 total, 4 running, 109 sleeping, 0 stopped, 0 zombie
Cpu0 : 93.7%us, 3.6%sy, 0.0%ni, 0.7%id, 1.7%wa, 0.3%hi, 0.0%si, 0.0%st
Cpu1 : 96.3%us, 3.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 93.7%us, 5.0%sy, 0.0%ni, 1.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu3 : 91.4%us, 5.6%sy, 0.0%ni, 2.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 12335492k total, 12257056k used, 78436k free, 24k buffers
Swap: 7807580k total, 744584k used, 7062996k free, 4927212k cached
Looks like you're running into swap. Does 'vmstat 1' show swap activity?
Try dropping the vcpu count and memory back to initial levels,
separately, to see which triggers the bad behaviour.
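For a rough idea of how far the new settings overcommit the host, the arithmetic can be sketched as below. The guest count, vcpus per guest, core count and host memory are the figures quoted above; the 2048 MiB per guest is only an assumption (the actual allocations are not stated), and `overcommit` is a hypothetical helper name.

```python
def overcommit(host_cores, host_mem_kib, guests):
    """Return (vcpu ratio, memory ratio) for a list of (vcpus, mem_mib) guests."""
    total_vcpus = sum(vcpus for vcpus, _ in guests)
    total_mem_kib = sum(mem_mib for _, mem_mib in guests) * 1024
    return total_vcpus / host_cores, total_mem_kib / host_mem_kib


# 8 guests with 4 vcpus each, 4 host cores, 12335492k host memory: from the post.
# 2048 MiB per guest: ASSUMPTION for illustration only.
guests = [(4, 2048)] * 8
cpu_ratio, mem_ratio = overcommit(4, 12335492, guests)
print(f"vcpu overcommit: {cpu_ratio:.1f}x, memory overcommit: {mem_ratio:.2f}x")
```

A memory ratio above 1.0 means the guests can collectively touch more memory than the host has, at which point the host has to swap.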
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3398 root 20 0 2287m 1.9g 680 S 149 16.5 603:10.92 kvm
5041 root 20 0 2255m 890m 540 R 99 7.4 603:38.07 kvm
5055 root 20 0 2272m 980m 668 S 86 8.1 305:42.95 kvm
5095 root 20 0 2287m 1.9g 532 R 33 16.6 655:11.53 kvm
5073 root 20 0 2253m 435m 532 S 19 3.6 371:59.80 kvm
3334 root 20 0 2254m 66m 532 S 6 0.5 106:58.20 kvm
None of the RSS figures are nice round numbers, which might mean some of
the guest memory is swapped out.
(note: you can run kvm as non-root).
Now this is the weird part: The guests are (really!) doing nothing.
Before this started, each guest's load was typically 0.02 - 0.30. Now
their load is suddenly 2.x and in top, even simple CPU processes like
syslogd use 20% CPU.
That may also be an indication of swap. When the guest accesses a
swapped out page, the time to swap in the page is accounted to the
instruction that caused the access.
# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
4 0 747292 98584 24 4912368 2 3 207 110 8 14 35 5 58 2
A plain 'vmstat' reports rates averaged since boot. 'vmstat 1' gives current rates.
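To illustrate the difference, the sketch below derives current swap rates by sampling the kernel's cumulative counters twice and taking the difference. It assumes a Linux host that exposes `pswpin` and `pswpout` (pages swapped in and out since boot) in /proc/vmstat; the function names are made up for this example.

```python
import time


def parse_vmstat(text):
    """Parse 'name value' lines, as found in /proc/vmstat, into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before, after, interval_s):
    """Pages swapped in/out per second between two counter snapshots."""
    return {key: (after[key] - before[key]) / interval_s
            for key in ("pswpin", "pswpout")}


def read_vmstat(path="/proc/vmstat"):
    with open(path) as f:
        return parse_vmstat(f.read())


def monitor(samples=5, interval_s=1.0):
    """Print current swap rates, one line per sampling interval."""
    prev = read_vmstat()
    for _ in range(samples):
        time.sleep(interval_s)
        cur = read_vmstat()
        rates = swap_rates(prev, cur, interval_s)
        print(f"swap in: {rates['pswpin']:.0f} pages/s, "
              f"swap out: {rates['pswpout']:.0f} pages/s")
        prev = cur
```

Calling `monitor()` on the host while the guests are sluggish should show whether pages are being swapped right now, rather than averaged over the whole uptime.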
kvm statistics
efer_reload 2493 0
exits 7012998022 86420
fpu_reload 107956245 1121
halt_exits 839269827 10930
halt_wakeup 79528805 1364
host_state_reload 1159155068 15293
hypercalls 1471039754 17008
insn_emulation 2782749902 35121
insn_emulation_fail 0 0
invlpg 172119687 1754
io_exits 129482688 2084
irq_exits 455515434 4884
irq_injections 973172925 12423
irq_window 41631517 635
largepages 0 0
mmio_exits 941756 0
mmu_cache_miss 74512394 849
mmu_flooded 5132926 41
mmu_pde_zapped 40341877 356
mmu_pte_updated 1150029759 13443
mmu_pte_write 2184765599 27182
mmu_recycled 52261 0
mmu_shadow_zapped 74494953 766
mmu_unsync 390 23
mmu_unsync_global 0 0
nmi_injections 0 0
nmi_window 0 0
pf_fixed 470057939 5144
pf_guest 463801876 5900
remote_tlb_flush 128765057 1024
request_irq 0 0
request_nmi 0 0
signal_exits 0 0
tlb_flush 1528830191 17996
This is reasonable for a 4-cpu host under moderate load.
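To see which counters dominate, the listing can be sorted by its second column. A small sketch, assuming the two columns are the cumulative count and the current per-second rate (the way kvm_stat usually displays them); `parse_kvm_stat` and `top_rates` are hypothetical helper names, and the sample is a subset of the figures above.

```python
def parse_kvm_stat(text):
    """Return {counter: (total, rate)} from 'name total rate' lines."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
            stats[parts[0]] = (int(parts[1]), int(parts[2]))
    return stats


def top_rates(stats, n=5):
    """Return the n counters with the highest current rate, highest first."""
    rates = [(name, rate) for name, (_, rate) in stats.items()]
    return sorted(rates, key=lambda item: item[1], reverse=True)[:n]


sample = """\
exits 7012998022 86420
insn_emulation 2782749902 35121
mmu_pte_write 2184765599 27182
tlb_flush 1528830191 17996
hypercalls 1471039754 17008
halt_exits 839269827 10930
"""

for name, rate in top_rates(parse_kvm_stat(sample), n=3):
    print(f"{name:20s} {rate:>8d}/s")
```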
--
error compiling committee.c: too many arguments to function