On 08/26/2009 05:57 PM, Andrew Theurer wrote:
> I recently gathered some performance data when running Windows Server
> 2008 VMs, and I wanted to share it here. There are 12 Windows Server
> 2008 64-bit VMs (1 vcpu, 2 GB) running, which handle the concurrent
> execution of 6 J2EE-type benchmarks. Each benchmark needs an App VM
> and a Database VM. The benchmark clients inject a fixed rate of
> requests, which yields X% CPU utilization on the host. A different
> hypervisor was compared; KVM used about 60% more CPU cycles to
> complete the same amount of work. Both had their hypervisor-specific
> paravirt I/O drivers in the VMs.
>
> Server is a 2-socket Core i7, SMT off, with 72 GB memory.
Did you use large pages?
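If not, they are worth trying; backing guest RAM with huge pages cuts
paging overhead noticeably. A minimal sketch of the setup, assuming a
qemu with -mem-path support and a /hugepages mount point (the page
count below is an assumed sizing for 12 x 2 GB guests, not something
from your report):

  # reserve 2 MB huge pages on the host (13000 x 2 MB is about 25 GB)
  echo 13000 > /proc/sys/vm/nr_hugepages
  mount -t hugetlbfs none /hugepages
  # start each guest with its memory allocated from the huge page pool
  qemu-system-x86_64 -m 2048 -mem-path /hugepages ...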
> Host kernel used was kvm.git v2.6.31-rc3-3419-g6df4865.
> Qemu was kvm-87. I tried a few newer versions of Qemu; none of them
> worked with the Red Hat virtio Windows drivers. I tried:
>   f3600c589a9ee5ea4c0fec74ed4e06a15b461d52
>   0.11.0-rc1
>   0.10.6
>   kvm-88
> All but 0.10.6 had a "Problem code 10" driver error in the VM. 0.10.6
> had "a disk read error occurred" very early in the booting of the VM.
Yan?
> I/O on the host was not what I would call very high: outbound network
> averaged 163 Mbit/s and inbound 8 Mbit/s, while disk reads averaged
> 243 ops/sec and writes 561 ops/sec.
What was the disk bandwidth used? Presumably, direct access to the
volume with cache=off?
linux-aio should help reduce CPU usage.
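For reference, a sketch of the -drive flags this implies, assuming a
qemu built with linux-aio support and one logical volume per guest
(the device path here is made up):

  # bypass the host page cache and submit I/O through native linux-aio
  qemu-system-x86_64 ... \
      -drive file=/dev/vg_guests/db1,if=virtio,cache=none,aio=native

cache=none avoids double-caching guest data in host memory, and
aio=native avoids the per-request overhead of the thread-pool AIO
emulation.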
> Host CPU breakdown was the following:
>
> user  nice  system  irq   softirq  guest  idle   iowait
> 5.67  0.00  11.64   0.09  1.05     31.90  46.06  3.59
>
> The amount of kernel time had me concerned. Here is oprofile:
user+system is about 55% of guest time ((5.67 + 11.64) / 31.90 is
roughly 0.54), and it's all overhead.
> samples  %       app name  symbol name
> 1163422  52.3744 kvm-intel.ko vmx_vcpu_run
> 103996   4.6816  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_set_debugreg
> 81036    3.6480  kvm.ko kvm_arch_vcpu_ioctl_run
> 37913    1.7068  qemu-system-x86_64 cpu_physical_memory_rw
> 34720    1.5630  qemu-system-x86_64 phys_page_find_alloc
We should really optimize these two.
> 23234    1.0459  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_write_msr_safe
> 20964    0.9437  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_get_debugreg
> 17628    0.7936  libc-2.5.so memcpy
> 16587    0.7467  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __down_read
> 15681    0.7059  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __up_read
> 15466    0.6962  kvm.ko find_highest_vector
> 14611    0.6578  qemu-system-x86_64 qemu_get_ram_ptr
> 11254    0.5066  kvm-intel.ko vmcs_writel
> 11133    0.5012  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 copy_user_generic_string
> 10917    0.4915  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_read_msr_safe
> 10760    0.4844  qemu-system-x86_64 virtqueue_get_head
> 9025     0.4063  kvm-intel.ko vmx_handle_exit
> 8953     0.4030  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 schedule
> 8753     0.3940  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 fget_light
> 8465     0.3811  qemu-system-x86_64 virtqueue_avail_bytes
> 8185     0.3685  kvm-intel.ko handle_cr
> 8069     0.3632  kvm.ko kvm_set_irq
> 7697     0.3465  kvm.ko kvm_lapic_sync_from_vapic
> 7586     0.3415  qemu-system-x86_64 main_loop_wait
> 7480     0.3367  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 do_select
> 7121     0.3206  qemu-system-x86_64 lduw_phys
> 7003     0.3153  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 audit_syscall_exit
> 6062     0.2729  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 kfree
> 5477     0.2466  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 fput
> 5454     0.2455  kvm.ko kvm_lapic_get_cr8
> 5096     0.2294  kvm.ko kvm_load_guest_fpu
> 5057     0.2277  kvm.ko apic_update_ppr
> 4929     0.2219  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 up_read
> 4900     0.2206  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 audit_syscall_entry
> 4866     0.2191  kvm.ko kvm_apic_has_interrupt
> 4670     0.2102  kvm-intel.ko skip_emulated_instruction
> 4644     0.2091  kvm.ko kvm_cpu_has_interrupt
> 4548     0.2047  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __switch_to
> 4328     0.1948  kvm.ko kvm_apic_accept_pic_intr
> 4303     0.1937  libpthread-2.5.so pthread_mutex_lock
> 4235     0.1906  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 system_call
> 4175     0.1879  kvm.ko kvm_put_guest_fpu
> 4170     0.1877  qemu-system-x86_64 ldl_phys
> 4098     0.1845  kvm-intel.ko vmx_set_interrupt_shadow
> 4003     0.1802  qemu-system-x86_64 kvm_run
>
> I was wondering why the get/set debugreg was so high. I don't recall
> seeing this much with Linux VMs.
Could it be that Windows uses the debug registers? Maybe we're
incorrectly deciding to switch them.
Apart from that, nothing really stands out. We'll just have to optimize
things one by one.
> Here is an average of kvm_stat:
>
> efer_relo  0
> exits      1262814
That's about 100K exits/sec per VM (1,262,814 / 12 VMs). This is high.
> fpu_reloa  103842
So is this -- maybe we're misdetecting fpu usage on EPT.
> halt_exit  9918
> halt_wake  9763
> host_stat  103846
This is presumably due to virtio in qemu.
> hypercall  0
> insn_emul  23277
> insn_emul  23277
> invlpg     0
> io_exits   82717
Yes, it is.
> irq_exits  12797
> irq_injec  18806
> irq_windo  1194
> largepage  12
> mmio_exit  0
> mmu_cache  0
> mmu_flood  0
> mmu_pde_z  0
> mmu_pte_u  0
> mmu_pte_w  0
> mmu_recyc  0
> mmu_shado  0
> mmu_unsyn  0
> nmi_injec  0
> nmi_windo  0
> pf_fixed   12
> pf_guest   0
> remote_tl  0
> request_i  0
> signal_ex  0
> tlb_flush  0
>
> For 12 VMs, does the number of exits/sec seem reasonable?
>
> Comments?
Not all of the exits are accounted for, so we're missing a big part of
the picture. 2.6.32 will have better statistics through ftrace.
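As a rough sketch of what that will look like, assuming debugfs is
mounted at /sys/kernel/debug and the kernel has the kvm trace events:

  # enable all kvm trace events and watch exits with their reasons
  echo 1 > /sys/kernel/debug/tracing/events/kvm/enable
  cat /sys/kernel/debug/tracing/trace_pipe

That gives a per-exit stream, including exit reasons, which should
account for everything kvm_stat misses.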
--
error compiling committee.c: too many arguments to function