Hi, sorry about the lengthy e-mail. We've been evaluating KVM for a while. The host runs 2.6.30-bpo.1-amd64 with 4 CPU cores on an Intel Xeon 2.33 GHz. The disk controller is an Areca ARC-1210 and the machine has 12 GB of memory. KVM is 85+dfsg-4~bpo50+1 and libvirt is 0.6.5-3~bpo50+1, both from backports.org. Guests are in qcow2 images.

A few test servers had been running on it for a while. That worked OK, so we moved a few production servers onto it as well. It's now running 8 guests, none of them CPU or disk intensive (there are a mail server and a web server that spike from time to time, but load is generally very low).

After a reboot the other day, performance is suddenly a disaster. The only change we can see we've made is that we allocated a bit more memory to the guests and enabled 4 vcpus on all guests (some of them ran with 1 vcpu before). When I say performance is bad, I mean to the point where typing on the keyboard lags, and load on one guest seems to affect all the others.

What is weird is that before the reboot the host machine usually had a system load of about 0.30 on average and a CPU load of 20-30% (total of all cores). After the reboot, this is a typical top output:

top - 11:17:49 up 1 day, 5:32, 3 users, load average: 3.81, 3.85, 3.96
Tasks: 113 total, 4 running, 109 sleeping, 0 stopped, 0 zombie
Cpu0 : 93.7%us, 3.6%sy, 0.0%ni, 0.7%id, 1.7%wa, 0.3%hi, 0.0%si, 0.0%st
Cpu1 : 96.3%us, 3.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 93.7%us, 5.0%sy, 0.0%ni, 1.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu3 : 91.4%us, 5.6%sy, 0.0%ni, 2.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Mem:  12335492k total, 12257056k used,   78436k free,      24k buffers
Swap:  7807580k total,   744584k used, 7062996k free, 4927212k cached

  PID USER  PR NI  VIRT  RES SHR S %CPU %MEM    TIME+  COMMAND
 3398 root  20  0 2287m 1.9g 680 S  149 16.5 603:10.92 kvm
 5041 root  20  0 2255m 890m 540 R   99  7.4 603:38.07 kvm
 5055 root  20  0 2272m 980m 668 S   86  8.1 305:42.95 kvm
 5095 root  20  0 2287m 1.9g 532 R   33 16.6 655:11.53 kvm
 5073 root  20  0 2253m 435m 532 S   19  3.6 371:59.80 kvm
 3334 root  20  0 2254m  66m 532 S    6  0.5 106:58.20 kvm

Now this is the weird part: the guests are (really!) doing nothing. Before this started, each guest's load was typically 0.02-0.30. Now their load is suddenly 2.x, and in top even simple processes like syslogd use 20% CPU. It _might_ be an I/O problem, because disk performance seems bad on all guests: "find /" would usually fly by, but now the output visibly lags (I know, bad performance test). The host machine itself still feels snappy, except that it has a system load of about 2-6.
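In case it's relevant, here is roughly what I plan to try next. Given that 744 MB of swap is in use while buffers are down to 24k, I suspect we overcommitted memory at the same time as we went to 4 vcpus per guest on a 4-core host. This is only a sketch ("guest1" is a placeholder domain name, and I haven't double-checked every virsh detail on libvirt 0.6.5):

# Watch swap traffic for a minute; sustained non-zero si/so
# while the guests are idle would confirm memory pressure.
vmstat 1 60

# Add up how much memory we have promised to the guests:
for dom in $(virsh list | awk 'NR > 2 && $2 {print $2}'); do
    virsh dominfo "$dom" | grep 'Max memory'
done

# Drop a guest back to a single vcpu: dump its XML, then redefine it.
virsh dumpxml guest1 > guest1.xml
# (edit guest1.xml: <vcpu>4</vcpu> -> <vcpu>1</vcpu>)
virsh define guest1.xml
# Then shut the guest down and start it again for the change to take effect.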
Here is some more info (iostat etc.):

# iostat -kdx 1
Linux 2.6.30-bpo.1-amd64 (cf01)   02/18/2010   _x86_64_

Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        1.73   18.91  29.89  18.40  855.89  454.17    54.26     0.48   9.98   2.23  10.75
sda1       0.35   16.17  29.58  18.24  849.11  442.58    54.03     0.47   9.79   2.20  10.52
sda2       1.38    2.74   0.31   0.16    6.78   11.59    77.44     0.01  29.14  13.60   0.64

Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        6.00    0.00   1.00   0.00    4.00    0.00     8.00     0.31   0.00 308.00  30.80
sda1       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
sda2       6.00    0.00   1.00   0.00    4.00    0.00     8.00     0.31   0.00 308.00  30.80

Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    0.00   1.00   0.00   28.00    0.00    56.00     0.63 936.00 628.00  62.80
sda1       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
sda2       0.00    0.00   1.00   0.00   28.00    0.00    56.00     0.63 936.00 628.00  62.80

Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    0.00   4.00   0.00  228.00    0.00   114.00     0.04   9.00   5.00   2.00
sda1       0.00    0.00   4.00   0.00  228.00    0.00   114.00     0.04   9.00   5.00   2.00
sda2       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00

Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        6.00    0.00   3.00   0.00   36.00    0.00    24.00     0.06  21.33  14.67   4.40
sda1       0.00    0.00   1.00   0.00    4.00    0.00     8.00     0.02  24.00  24.00   2.40
sda2       6.00    0.00   2.00   0.00   32.00    0.00    32.00     0.04  20.00  10.00   2.00

Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        4.00    0.00   2.00  92.00   24.00  786.50    17.24     0.03   0.34   0.17   1.60
sda1       0.00    0.00   0.00  92.00    0.00  786.50    17.10     0.00   0.00   0.00   0.00
sda2       4.00    0.00   2.00   0.00   24.00    0.00    24.00     0.03  16.00   8.00   1.60

Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
sda1       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
sda2       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00

# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 4  0 747292  98584     24 4912368    2    3   207   110    8   14 35  5 58  2

# vmstat -d
disk- ------------reads------------ ------------writes----------- -----IO------
        total merged   sectors       ms    total merged  sectors       ms cur   sec
ram0        0      0         0        0        0      0        0        0   0     0
ram1        0      0         0        0        0      0        0        0   0     0
ram2        0      0         0        0        0      0        0        0   0     0
ram3        0      0         0        0        0      0        0        0   0     0
ram4        0      0         0        0        0      0        0        0   0     0
ram5        0      0         0        0        0      0        0        0   0     0
ram6        0      0         0        0        0      0        0        0   0     0
ram7        0      0         0        0        0      0        0        0   0     0
ram8        0      0         0        0        0      0        0        0   0     0
ram9        0      0         0        0        0      0        0        0   0     0
ram10       0      0         0        0        0      0        0        0   0     0
ram11       0      0         0        0        0      0        0        0   0     0
ram12       0      0         0        0        0      0        0        0   0     0
ram13       0      0         0        0        0      0        0        0   0     0
ram14       0      0         0        0        0      0        0        0   0     0
ram15       0      0         0        0        0      0        0        0   0     0
sda   3170691 184220 181548275 19898284  1952716 2005771 96380008 31234880   0 11405
sda1  3137240  37285 180105335 18834360  1935681 1715307 93920016 30830644   0 11162
sda2    33429 146912   1442580  1063828    17035 290464   2459992   404236   0   685
sr0         0      0         0        0        0      0        0        0   0     0
loop0       0      0         0        0        0      0        0        0   0     0
loop1       0      0         0        0        0      0        0        0   0     0
loop2       0      0         0        0        0      0        0        0   0     0
loop3       0      0         0        0        0      0        0        0   0     0
loop4       0      0         0        0        0      0        0        0   0     0
loop5       0      0         0        0        0      0        0        0   0     0
loop6       0      0         0        0        0      0        0        0   0     0
loop7       0      0         0        0        0      0        0        0   0     0

# kvm_stat
kvm statistics

 efer_reload                 2493      0
 exits                 7012998022  86420
 fpu_reload             107956245   1121
 halt_exits             839269827  10930
 halt_wakeup             79528805   1364
 host_state_reload     1159155068  15293
 hypercalls            1471039754  17008
 insn_emulation        2782749902  35121
 insn_emulation_fail            0      0
 invlpg                 172119687   1754
 io_exits               129482688   2084
 irq_exits              455515434   4884
 irq_injections         973172925  12423
 irq_window              41631517    635
 largepages                     0      0
 mmio_exits                941756      0
 mmu_cache_miss          74512394    849
 mmu_flooded              5132926     41
 mmu_pde_zapped          40341877    356
 mmu_pte_updated       1150029759  13443
 mmu_pte_write         2184765599  27182
 mmu_recycled               52261      0
 mmu_shadow_zapped       74494953    766
 mmu_unsync                   390     23
 mmu_unsync_global              0      0
 nmi_injections                 0      0
 nmi_window                     0      0
 pf_fixed               470057939   5144
 pf_guest               463801876   5900
 remote_tlb_flush       128765057   1024
 request_irq                    0      0
 request_nmi                    0      0
 signal_exits                   0      0
 tlb_flush             1528830191  17996

Any hints on how to figure out why this happens? Thanks!

-- 
Vegard Svanberg <vegard@xxxxxxxxxxx> [*Takapa@IRC (EFnet)]
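P.S. Since all guests are qcow2 images, I also want to rule out double caching (guest page cache on top of host page cache). Again only a sketch: "guest1" is a placeholder, and I'm assuming this libvirt version honors the cache= attribute on the disk <driver> element:

# What cache mode did the running kvm processes actually get?
# No cache= among the -drive options should mean the default,
# which is writethrough on this version if I read the docs right.
ps axww | grep '[k]vm' | tr ',' '\n' | grep -i cache

# To set it explicitly in the domain XML, something like:
#   <driver name='qemu' type='qcow2' cache='none'/>
virsh dumpxml guest1 | grep -i driver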