On Thursday 18 February 2010 11:31:36 you wrote:
> Hi, sorry about the lengthy e-mail.

Hi,

are you sure the kvm-intel kernel module is loaded? What is the output of
"lsmod"? Are there any useful kernel messages on the host or in the VMs?
What is the output of "dmesg"?

Cheers,
Thomas

> We've been evaluating KVM for a while. Currently the host is on
> 2.6.30-bpo.1-amd64, with 4 CPU cores on an Intel Xeon 2.33 GHz. The disk
> controller is an Areca ARC-1210 and the machine has 12 GB of memory.
> KVM is 85+dfsg-4~bpo50+1 and libvirt is 0.6.5-3~bpo50+1, both from
> backports.org. Guests are in qcow2 images.
>
> A few test servers have been running here for a while. That worked fine,
> so we've moved a few production servers onto it as well. It is now running
> 8 guests, none of them CPU or disk intensive (there are a mail server and
> a web server that spike from time to time, but load is generally very low).
>
> After a reboot the other day, performance is suddenly a disaster. The only
> change we can see we've made is that we allocated a bit more memory to the
> guests and enabled 4 vcpus on all guests (some of them ran with 1 vcpu
> before). When I say performance is bad, it is to the point where typing on
> the keyboard lags. Load on one guest seems to affect all of the others.
>
> What is weird is that before the reboot, the host machine usually had a
> system load of about 0.30 on average and a CPU load of 20-30% (total of
> all cores). After the reboot, this is a typical top output:
>
> top - 11:17:49 up 1 day, 5:32, 3 users, load average: 3.81, 3.85, 3.96
> Tasks: 113 total, 4 running, 109 sleeping, 0 stopped, 0 zombie
> Cpu0 : 93.7%us,  3.6%sy, 0.0%ni, 0.7%id, 1.7%wa, 0.3%hi, 0.0%si, 0.0%st
> Cpu1 : 96.3%us,  3.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu2 : 93.7%us,  5.0%sy, 0.0%ni, 1.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
> Cpu3 : 91.4%us,  5.6%sy, 0.0%ni, 2.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
> Mem:  12335492k total, 12257056k used,   78436k free,      24k buffers
> Swap:  7807580k total,   744584k used, 7062996k free, 4927212k cached
>
>  PID USER PR NI  VIRT  RES SHR S %CPU %MEM     TIME+ COMMAND
> 3398 root 20  0 2287m 1.9g 680 S  149 16.5 603:10.92 kvm
> 5041 root 20  0 2255m 890m 540 R   99  7.4 603:38.07 kvm
> 5055 root 20  0 2272m 980m 668 S   86  8.1 305:42.95 kvm
> 5095 root 20  0 2287m 1.9g 532 R   33 16.6 655:11.53 kvm
> 5073 root 20  0 2253m 435m 532 S   19  3.6 371:59.80 kvm
> 3334 root 20  0 2254m  66m 532 S    6  0.5 106:58.20 kvm
>
> Now this is the weird part: the guests are (really!) doing nothing.
> Before this started, each guest's load was typically 0.02 - 0.30. Now
> their load is suddenly 2.x, and in top even simple processes like
> syslogd use 20% CPU.
>
> It _might_ seem like an I/O problem, because disk performance seems bad
> on all guests. "find /" would usually fly by; now its output is a bit
> laggy (I know, bad performance test).
>
> The host machine seems fast and fine, except that it has a system load
> of about 2-6. It still seems snappy, though.
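On the two questions at the top of this mail (module and kernel messages), a
minimal check on the host would look like the following; the module is loaded
as kvm-intel but shows up as kvm_intel in lsmod:

  # is the module actually loaded? expect kvm_intel and kvm on an Intel host
  lsmod | grep kvm

  # any KVM-related kernel messages, e.g. "kvm: disabled by bios"
  dmesg | grep -i kvm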
> Here you have some info like iostat etc.:
>
> # iostat -kdxx1
>
> Linux 2.6.30-bpo.1-amd64 (cf01)   02/18/2010   _x86_64_
>
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
> sda        1.73   18.91  29.89  18.40  855.89  454.17    54.26     0.48   9.98   2.23  10.75
> sda1       0.35   16.17  29.58  18.24  849.11  442.58    54.03     0.47   9.79   2.20  10.52
> sda2       1.38    2.74   0.31   0.16    6.78   11.59    77.44     0.01  29.14  13.60   0.64
>
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
> sda        6.00    0.00   1.00   0.00    4.00    0.00     8.00     0.31   0.00 308.00  30.80
> sda1       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
> sda2       6.00    0.00   1.00   0.00    4.00    0.00     8.00     0.31   0.00 308.00  30.80
>
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
> sda        0.00    0.00   1.00   0.00   28.00    0.00    56.00     0.63 936.00 628.00  62.80
> sda1       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
> sda2       0.00    0.00   1.00   0.00   28.00    0.00    56.00     0.63 936.00 628.00  62.80
>
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
> sda        0.00    0.00   4.00   0.00  228.00    0.00   114.00     0.04   9.00   5.00   2.00
> sda1       0.00    0.00   4.00   0.00  228.00    0.00   114.00     0.04   9.00   5.00   2.00
> sda2       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
>
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
> sda        6.00    0.00   3.00   0.00   36.00    0.00    24.00     0.06  21.33  14.67   4.40
> sda1       0.00    0.00   1.00   0.00    4.00    0.00     8.00     0.02  24.00  24.00   2.40
> sda2       6.00    0.00   2.00   0.00   32.00    0.00    32.00     0.04  20.00  10.00   2.00
>
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
> sda        4.00    0.00   2.00  92.00   24.00  786.50    17.24     0.03   0.34   0.17   1.60
> sda1       0.00    0.00   0.00  92.00    0.00  786.50    17.10     0.00   0.00   0.00   0.00
> sda2       4.00    0.00   2.00   0.00   24.00    0.00    24.00     0.03  16.00   8.00   1.60
>
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
> sda        0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
> sda1       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
> sda2       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
>
> # vmstat
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  4  0 747292  98584     24 4912368    2    3   207   110    8   14 35  5 58  2
>
> # vmstat -d
> disk- ------------reads------------ ------------writes----------- -----IO------
>         total merged   sectors       ms   total merged  sectors       ms cur   sec
> ram0        0      0         0        0       0      0        0        0   0     0
> ram1        0      0         0        0       0      0        0        0   0     0
> ram2        0      0         0        0       0      0        0        0   0     0
> ram3        0      0         0        0       0      0        0        0   0     0
> ram4        0      0         0        0       0      0        0        0   0     0
> ram5        0      0         0        0       0      0        0        0   0     0
> ram6        0      0         0        0       0      0        0        0   0     0
> ram7        0      0         0        0       0      0        0        0   0     0
> ram8        0      0         0        0       0      0        0        0   0     0
> ram9        0      0         0        0       0      0        0        0   0     0
> ram10       0      0         0        0       0      0        0        0   0     0
> ram11       0      0         0        0       0      0        0        0   0     0
> ram12       0      0         0        0       0      0        0        0   0     0
> ram13       0      0         0        0       0      0        0        0   0     0
> ram14       0      0         0        0       0      0        0        0   0     0
> ram15       0      0         0        0       0      0        0        0   0     0
> sda   3170691 184220 181548275 19898284 1952716 2005771 96380008 31234880   0 11405
> sda1  3137240  37285 180105335 18834360 1935681 1715307 93920016 30830644   0 11162
> sda2    33429 146912   1442580  1063828   17035 290464   2459992   404236   0   685
> sr0         0      0         0        0       0      0        0        0   0     0
> loop0       0      0         0        0       0      0        0        0   0     0
> loop1       0      0         0        0       0      0        0        0   0     0
> loop2       0      0         0        0       0      0        0        0   0     0
> loop3       0      0         0        0       0      0        0        0   0     0
> loop4       0      0         0        0       0      0        0        0   0     0
> loop5       0      0         0        0       0      0        0        0   0     0
> loop6       0      0         0        0       0      0        0        0   0     0
> loop7       0      0         0        0       0      0        0        0   0     0
>
> # kvm_stat
>
> kvm statistics
>
> efer_reload                  2493       0
> exits                  7012998022   86420
> fpu_reload              107956245    1121
> halt_exits              839269827   10930
> halt_wakeup              79528805    1364
> host_state_reload      1159155068   15293
> hypercalls             1471039754   17008
> insn_emulation         2782749902   35121
> insn_emulation_fail             0       0
> invlpg                  172119687    1754
> io_exits                129482688    2084
> irq_exits               455515434    4884
> irq_injections          973172925   12423
> irq_window               41631517     635
> largepages                      0       0
> mmio_exits                 941756       0
> mmu_cache_miss           74512394     849
> mmu_flooded               5132926      41
> mmu_pde_zapped           40341877     356
> mmu_pte_updated        1150029759   13443
> mmu_pte_write          2184765599   27182
> mmu_recycled                52261       0
> mmu_shadow_zapped        74494953     766
> mmu_unsync                    390      23
> mmu_unsync_global               0       0
> nmi_injections                  0       0
> nmi_window                      0       0
> pf_fixed                470057939    5144
> pf_guest                463801876    5900
> remote_tlb_flush        128765057    1024
> request_irq                     0       0
> request_nmi                     0       0
> signal_exits                    0       0
> tlb_flush              1528830191   17996
>
> Any hints on how to figure out why this happens? Thanks!
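One thing that stands out in the description above is the vcpu change: 8 guests
with 4 vcpus each means up to 32 vcpus competing for 4 physical cores. A
minimal sketch for comparing the host's core count against the total number of
vcpus libvirt has allocated, assuming all guests are managed by libvirt and
visible to virsh:

  # physical cores on the host
  grep -c ^processor /proc/cpuinfo

  # vcpus per running guest (parses the "CPU(s):" line of virsh dominfo),
  # followed by the total across all guests
  virsh list | awk 'NR > 2 && $2 { print $2 }' | while read dom; do
      virsh dominfo "$dom" | awk '/^CPU\(s\)/ { print $2 }'
  done | awk '{ sum += $1 } END { print "total vcpus:", sum }'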
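On the I/O side, the iostat samples above show occasional large await/svctm
spikes on sda (936 ms await, 628 ms svctm) at very low request rates. To watch
that live while one of the guests runs its sluggish "find /", the standard
sysstat tools can sample continuously; the interval and count below are
arbitrary:

  # extended per-device statistics in kB, one report every 2 seconds, 30 reports
  iostat -kdx 2 30

  # per-CPU breakdown in parallel, to see whether time goes to user, system or iowait
  mpstat -P ALL 2 30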