On Thursday 18 February 2010 11:31:36 you wrote:
> Hi, sorry about the lengthy e-mail.

Hi,

are you sure the kvm-intel kernel module is loaded? What is the output of
"lsmod"? Are there any useful kernel messages on the host or in the VMs?
What is the output of "dmesg"?

Cheers,
Thomas

> We've been evaluating KVM for a while. Currently the host is on
> 2.6.30-bpo.1-amd64, with 4 CPU cores on an Intel Xeon 2.33 GHz. The disk
> controller is an Areca ARC-1210 and the machine has 12 GB of memory.
> KVM is 85+dfsg-4~bpo50+1 and libvirt is 0.6.5-3~bpo50+1, both from
> backports.org. Guests are in qcow2 images.
>
> A few test servers have been running here for a while. That worked fine,
> so we've moved a few production servers onto it as well. It is now running
> 8 guests, none of them CPU or disk intensive (there are a mail server and
> a web server that spike from time to time, but load is generally very low).
>
> After a reboot the other day, performance is suddenly a disaster. The only
> change we can see we've made is that we allocated a bit more memory to the
> guests and enabled 4 vcpus on all guests (some of them ran with 1 vcpu
> before). When I say performance is bad, it is to the point where typing on
> the keyboard lags. Load on one guest seems to affect all of the others.
>
> What is weird is that before the reboot, the host machine usually had a
> system load of about 0.30 on average and a CPU load of 20-30% (total of
> all cores). After the reboot, this is a typical top output:
>
> top - 11:17:49 up 1 day, 5:32, 3 users, load average: 3.81, 3.85, 3.96
> Tasks: 113 total, 4 running, 109 sleeping, 0 stopped, 0 zombie
> Cpu0 : 93.7%us,  3.6%sy, 0.0%ni, 0.7%id, 1.7%wa, 0.3%hi, 0.0%si, 0.0%st
> Cpu1 : 96.3%us,  3.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu2 : 93.7%us,  5.0%sy, 0.0%ni, 1.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
> Cpu3 : 91.4%us,  5.6%sy, 0.0%ni, 2.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
> Mem:  12335492k total, 12257056k used,   78436k free,      24k buffers
> Swap:  7807580k total,   744584k used, 7062996k free, 4927212k cached
>
>  PID USER PR NI  VIRT  RES SHR S %CPU %MEM     TIME+ COMMAND
> 3398 root 20  0 2287m 1.9g 680 S  149 16.5 603:10.92 kvm
> 5041 root 20  0 2255m 890m 540 R   99  7.4 603:38.07 kvm
> 5055 root 20  0 2272m 980m 668 S   86  8.1 305:42.95 kvm
> 5095 root 20  0 2287m 1.9g 532 R   33 16.6 655:11.53 kvm
> 5073 root 20  0 2253m 435m 532 S   19  3.6 371:59.80 kvm
> 3334 root 20  0 2254m  66m 532 S    6  0.5 106:58.20 kvm
>
> Now this is the weird part: the guests are (really!) doing nothing.
> Before this started, each guest's load was typically 0.02 - 0.30. Now
> their load is suddenly 2.x, and in top even simple processes like
> syslogd use 20% CPU.
>
> It _might_ seem like an I/O problem, because disk performance seems bad
> on all guests. "find /" would usually fly by; now its output is a bit
> laggy (I know, bad performance test).
>
> The host machine seems fast and fine, except that it has a system load
> of about 2-6. It still seems snappy, though.
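On the two questions at the top of this mail (module and kernel messages), a
minimal check on the host would look like the following; the module is loaded
as kvm-intel but shows up as kvm_intel in lsmod:

  # is the module actually loaded? expect kvm_intel and kvm on an Intel host
  lsmod | grep kvm

  # any KVM-related kernel messages, e.g. "kvm: disabled by bios"
  dmesg | grep -i kvm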
> Here you have some info like iostat etc.:
>
> # iostat -kdxx1
>
> Linux 2.6.30-bpo.1-amd64 (cf01)   02/18/2010   _x86_64_
>
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
> sda        1.73   18.91  29.89  18.40  855.89  454.17    54.26     0.48   9.98   2.23  10.75
> sda1       0.35   16.17  29.58  18.24  849.11  442.58    54.03     0.47   9.79   2.20  10.52
> sda2       1.38    2.74   0.31   0.16    6.78   11.59    77.44     0.01  29.14  13.60   0.64
>
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
> sda        6.00    0.00   1.00   0.00    4.00    0.00     8.00     0.31   0.00 308.00  30.80
> sda1       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
> sda2       6.00    0.00   1.00   0.00    4.00    0.00     8.00     0.31   0.00 308.00  30.80
>
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
> sda        0.00    0.00   1.00   0.00   28.00    0.00    56.00     0.63 936.00 628.00  62.80
> sda1       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
> sda2       0.00    0.00   1.00   0.00   28.00    0.00    56.00     0.63 936.00 628.00  62.80
>
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
> sda        0.00    0.00   4.00   0.00  228.00    0.00   114.00     0.04   9.00   5.00   2.00
> sda1       0.00    0.00   4.00   0.00  228.00    0.00   114.00     0.04   9.00   5.00   2.00
> sda2       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
>
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
> sda        6.00    0.00   3.00   0.00   36.00    0.00    24.00     0.06  21.33  14.67   4.40
> sda1       0.00    0.00   1.00   0.00    4.00    0.00     8.00     0.02  24.00  24.00   2.40
> sda2       6.00    0.00   2.00   0.00   32.00    0.00    32.00     0.04  20.00  10.00   2.00
>
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
> sda        4.00    0.00   2.00  92.00   24.00  786.50    17.24     0.03   0.34   0.17   1.60
> sda1       0.00    0.00   0.00  92.00    0.00  786.50    17.10     0.00   0.00   0.00   0.00
> sda2       4.00    0.00   2.00   0.00   24.00    0.00    24.00     0.03  16.00   8.00   1.60
>
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
> sda        0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
> sda1       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
> sda2       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
>
> # vmstat
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  4  0 747292  98584     24 4912368    2    3   207   110    8   14 35  5 58  2
>
> # vmstat -d
> disk- ------------reads------------ ------------writes----------- -----IO------
>         total merged   sectors       ms   total merged  sectors       ms cur   sec
> ram0        0      0         0        0       0      0        0        0   0     0
> ram1        0      0         0        0       0      0        0        0   0     0
> ram2        0      0         0        0       0      0        0        0   0     0
> ram3        0      0         0        0       0      0        0        0   0     0
> ram4        0      0         0        0       0      0        0        0   0     0
> ram5        0      0         0        0       0      0        0        0   0     0
> ram6        0      0         0        0       0      0        0        0   0     0
> ram7        0      0         0        0       0      0        0        0   0     0
> ram8        0      0         0        0       0      0        0        0   0     0
> ram9        0      0         0        0       0      0        0        0   0     0
> ram10       0      0         0        0       0      0        0        0   0     0
> ram11       0      0         0        0       0      0        0        0   0     0
> ram12       0      0         0        0       0      0        0        0   0     0
> ram13       0      0         0        0       0      0        0        0   0     0
> ram14       0      0         0        0       0      0        0        0   0     0
> ram15       0      0         0        0       0      0        0        0   0     0
> sda   3170691 184220 181548275 19898284 1952716 2005771 96380008 31234880   0 11405
> sda1  3137240  37285 180105335 18834360 1935681 1715307 93920016 30830644   0 11162
> sda2    33429 146912   1442580  1063828   17035 290464   2459992   404236   0   685
> sr0         0      0         0        0       0      0        0        0   0     0
> loop0       0      0         0        0       0      0        0        0   0     0
> loop1       0      0         0        0       0      0        0        0   0     0
> loop2       0      0         0        0       0      0        0        0   0     0
> loop3       0      0         0        0       0      0        0        0   0     0
> loop4       0      0         0        0       0      0        0        0   0     0
> loop5       0      0         0        0       0      0        0        0   0     0
> loop6       0      0         0        0       0      0        0        0   0     0
> loop7       0      0         0        0       0      0        0        0   0     0
>
> # kvm_stat
>
> kvm statistics
>
> efer_reload                  2493       0
> exits                  7012998022   86420
> fpu_reload              107956245    1121
> halt_exits              839269827   10930
> halt_wakeup              79528805    1364
> host_state_reload      1159155068   15293
> hypercalls             1471039754   17008
> insn_emulation         2782749902   35121
> insn_emulation_fail             0       0
> invlpg                  172119687    1754
> io_exits                129482688    2084
> irq_exits               455515434    4884
> irq_injections          973172925   12423
> irq_window               41631517     635
> largepages                      0       0
> mmio_exits                 941756       0
> mmu_cache_miss           74512394     849
> mmu_flooded               5132926      41
> mmu_pde_zapped           40341877     356
> mmu_pte_updated        1150029759   13443
> mmu_pte_write          2184765599   27182
> mmu_recycled                52261       0
> mmu_shadow_zapped        74494953     766
> mmu_unsync                    390      23
> mmu_unsync_global               0       0
> nmi_injections                  0       0
> nmi_window                      0       0
> pf_fixed                470057939    5144
> pf_guest                463801876    5900
> remote_tlb_flush        128765057    1024
> request_irq                     0       0
> request_nmi                     0       0
> signal_exits                    0       0
> tlb_flush              1528830191   17996
>
> Any hints on how to figure out why this happens? Thanks!
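One thing that stands out in the description above is the vcpu change: 8 guests
with 4 vcpus each means up to 32 vcpus competing for 4 physical cores. A
minimal sketch for comparing the host's core count against the total number of
vcpus libvirt has allocated, assuming all guests are managed by libvirt and
visible to virsh:

  # physical cores on the host
  grep -c ^processor /proc/cpuinfo

  # vcpus per running guest (parses the "CPU(s):" line of virsh dominfo),
  # followed by the total across all guests
  virsh list | awk 'NR > 2 && $2 { print $2 }' | while read dom; do
      virsh dominfo "$dom" | awk '/^CPU\(s\)/ { print $2 }'
  done | awk '{ sum += $1 } END { print "total vcpus:", sum }'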
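On the I/O side, the iostat samples above show occasional large await/svctm
spikes on sda (936 ms await, 628 ms svctm) at very low request rates. To watch
that live while one of the guests runs its sluggish "find /", the standard
sysstat tools can sample continuously; the interval and count below are
arbitrary:

  # extended per-device statistics in kB, one report every 2 seconds, 30 reports
  iostat -kdx 2 30

  # per-CPU breakdown in parallel, to see whether time goes to user, system or iowait
  mpstat -P ALL 2 30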