Re: Poor performance with KVM, how to figure out why?

Avi Kivity <avi@xxxxxxxxxx> · Thu, 18 Feb 2010 12:49:58 +0200

On 02/18/2010 12:31 PM, Vegard Svanberg wrote:
Hi, sorry about the lengthy e-mail.

We've been evaluating KVM for a while. Currently the host is on
2.6.30-bpo.1-amd64, 4 CPU cores on an Intel Xeon 2.33 GHz. Disk controller is Areca
ARC-1210 and the machine has 12GB of memory. KVM 85+dfsg-4~bpo50+1 and libvirt
0.6.5-3~bpo50+1, both from backports.org. Guests are in qcow2 images.

These are all really old.

A few test servers have been running here for a while. That worked ok,
so we've moved a few production servers on it as well. It's now running
8 guests, none of them are CPU or disk intensive (well, there are a mail server
and web server there, which from time to time spike, but it's generally very
low).

After a reboot the other day, performance is suddenly a disaster. The only
change we can see we've done is that we've allocated a bit more memory
to the guests, and enabled 4 vcpus on all guests (some of them ran with
1 vcpu before). When I say performance is bad, it's to the point where
typing on the keyboard is lagging. It seems load on one guest affects all of
the others.

What is weird is that before the reboot, the host machine usually had a
system load of about 0.30 on average, and a CPU load of 20-30% (total of
all cores). After the reboot, this is a typical top output:

top - 11:17:49 up 1 day,  5:32,  3 users,  load average: 3.81, 3.85, 3.96
Tasks: 113 total,   4 running, 109 sleeping,   0 stopped,   0 zombie
Cpu0  : 93.7%us,  3.6%sy,  0.0%ni,  0.7%id,  1.7%wa,  0.3%hi,  0.0%si,  0.0%st
Cpu1  : 96.3%us,  3.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 93.7%us,  5.0%sy,  0.0%ni,  1.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu3  : 91.4%us,  5.6%sy,  0.0%ni,  2.7%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  12335492k total, 12257056k used,    78436k free,       24k buffers
Swap:  7807580k total,   744584k used,  7062996k free,  4927212k cached

Looks like you're running into swap.  Does 'vmstat 1' show swap activity?

Try dropping the vcpu count and memory back to initial levels, 
separately, to see which triggers the bad behaviour.
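To make the 'vmstat 1' check concrete, here is a rough sketch that parses captured vmstat output and picks out samples where the si/so (swap-in/swap-out) columns are nonzero. The function names and the threshold parameter are illustrative only, not part of any tool.

```python
def parse_vmstat(text):
    """Parse vmstat output into a list of dicts, one per sample row."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The column-name line is the one that carries 'si' and 'so'.
        if "si" in fields and "so" in fields:
            header = fields
            continue
        # Data rows are all-numeric and line up with the header.
        if header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def swapping_samples(rows, threshold=0):
    """Return the samples whose swap-in or swap-out rate exceeds threshold."""
    return [r for r in rows if r["si"] > threshold or r["so"] > threshold]
```

Feed it the text of something like 'vmstat 1 10'. The first row vmstat prints is the since-boot average, so it is the later rows that show what is happening now.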

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  3398 root      20   0 2287m 1.9g  680 S  149 16.5 603:10.92 kvm
  5041 root      20   0 2255m 890m  540 R   99  7.4 603:38.07 kvm
  5055 root      20   0 2272m 980m  668 S   86  8.1 305:42.95 kvm
  5095 root      20   0 2287m 1.9g  532 R   33 16.6 655:11.53 kvm
  5073 root      20   0 2253m 435m  532 S   19  3.6 371:59.80 kvm
  3334 root      20   0 2254m  66m  532 S    6  0.5 106:58.20 kvm

None of the RSS figures (the RES column) are nice round numbers, which 
might mean some of the guest memory is swapped out.
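A small sketch of that comparison, assuming procps-style top fields (a bare number is KiB, 'm' is MiB, 'g' is GiB). The configured guest size is not given in this thread, so configured_mib is a hypothetical value you would supply yourself. A shortfall is only a hint: a guest that has never touched all of its memory also shows a low RES.

```python
def top_size_to_mib(field):
    """Convert a top memory field such as '1.9g', '890m' or '680' to MiB.

    Assumption: a bare number is KiB, 'm' means MiB, 'g' means GiB.
    """
    units = {"k": 1 / 1024, "m": 1.0, "g": 1024.0, "t": 1024.0 * 1024.0}
    suffix = field[-1].lower()
    if suffix in units:
        return float(field[:-1]) * units[suffix]
    return float(field) / 1024


def resident_shortfall_mib(res_field, configured_mib):
    """MiB of configured guest memory that is not currently resident.

    configured_mib is hypothetical here; take it from the guest's definition.
    """
    return max(0.0, configured_mib - top_size_to_mib(res_field))
```

For example, resident_shortfall_mib("890m", 2048) gives the gap for a guest that was, hypothetically, given 2048 MiB.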

(note: you can run kvm as non-root).

Now this is the weird part: The guests are (really!) doing nothing.
Before this started, each guest's load was typically 0.02 - 0.30. Now
their load is suddenly 2.x and in top, even simple processes like
syslogd use 20% CPU.

That may also be an indication of swap.  When the guest accesses a 
swapped out page, the time to swap in the page is accounted to the 
instruction that caused the access.

# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
  4  0 747292  98584     24 4912368    2    3   207   110    8   14 35  5 58  2

A bare 'vmstat' reports rates averaged since boot.  'vmstat 1' gives current rates.
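The difference between a since-boot average and a current rate can be sketched from the cumulative pswpin/pswpout counters in /proc/vmstat (counted in pages). This is only an illustration of the arithmetic, not how vmstat itself is implemented, and the helper names are made up.

```python
import time


def read_swap_counters(text):
    """Extract cumulative pswpin/pswpout page counts from /proc/vmstat text."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def interval_rate(before, after, seconds):
    """Pages per second swapped in/out between two counter snapshots."""
    return {name: (after[name] - before[name]) / seconds for name in before}


def sample_current_rate(seconds=1.0, path="/proc/vmstat"):
    """Take two snapshots 'seconds' apart and return the current rate."""
    with open(path) as f:
        before = read_swap_counters(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = read_swap_counters(f.read())
    return interval_rate(before, after, seconds)
```

Dividing a single snapshot by the uptime gives the since-boot figure, which can look harmless even while the interval rate is high.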

kvm statistics

  efer_reload               2493       0
  exits               7012998022   86420
  fpu_reload           107956245    1121
  halt_exits           839269827   10930
  halt_wakeup           79528805    1364
  host_state_reload   1159155068   15293
  hypercalls          1471039754   17008
  insn_emulation      2782749902   35121
  insn_emulation_fail          0       0
  invlpg               172119687    1754
  io_exits             129482688    2084
  irq_exits            455515434    4884
  irq_injections       973172925   12423
  irq_window            41631517     635
  largepages                   0       0
  mmio_exits              941756       0
  mmu_cache_miss        74512394     849
  mmu_flooded            5132926      41
  mmu_pde_zapped        40341877     356
  mmu_pte_updated     1150029759   13443
  mmu_pte_write       2184765599   27182
  mmu_recycled             52261       0
  mmu_shadow_zapped     74494953     766
  mmu_unsync                 390      23
  mmu_unsync_global            0       0
  nmi_injections               0       0
  nmi_window                   0       0
  pf_fixed             470057939    5144
  pf_guest             463801876    5900
  remote_tlb_flush     128765057    1024
  request_irq                  0       0
  request_nmi                  0       0
  signal_exits                 0       0
  tlb_flush           1528830191   17996

This is reasonable for a 4-cpu host under moderate load.
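For anyone wanting to sift through output like the above, here is a sketch that parses the three-column listing and ranks counters by the last column. The thread does not label the columns; this assumes the first number is the running total and the second is the per-second rate over the sampling interval.

```python
def parse_kvm_stat(text):
    """Map counter name -> (total, rate) from 'name total rate' lines."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
            stats[parts[0]] = (int(parts[1]), int(parts[2]))
    return stats


def busiest(stats, n=5):
    """Return the n counters with the highest rate (assumed second number)."""
    return sorted(stats.items(), key=lambda kv: kv[1][1], reverse=True)[:n]
```

Ranking by rate rather than by total keeps the focus on what the host is doing now instead of what it has accumulated since the module was loaded.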

--
error compiling committee.c: too many arguments to function
