Re: Performance data when running Windows VMs

On Wed, 2009-08-26 at 18:44 +0300, Avi Kivity wrote:
> On 08/26/2009 05:57 PM, Andrew Theurer wrote:
> > I recently gathered some performance data when running Windows Server
> > 2008 VMs, and I wanted to share it here.  There are 12 Windows Server
> > 2008 64-bit VMs (1 vcpu, 2 GB each) running, which handle the concurrent
> > execution of 6 J2EE-type benchmarks.  Each benchmark needs an App VM and
> > a Database VM.  The benchmark clients inject a fixed rate of requests,
> > which yields X% CPU utilization on the host.  The same workload was run
> > on a different hypervisor for comparison; KVM used about 60% more CPU
> > cycles to complete the same amount of work.  Both had their
> > hypervisor-specific paravirt I/O drivers in the VMs.
> >
> > The server is a 2-socket Core i7, SMT off, with 72 GB of memory
> >    
> 
> Did you use large pages?

Yes.
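
For anyone wanting to reproduce this: the standard way to get large pages
with this setup is hugetlbfs-backed guest memory via qemu's -mem-path
option.  A rough sketch (the mount point and page count below are
illustrative, not the exact values used here):

  # reserve 2 MB huge pages on the host; 12 VMs x 2 GB needs at least
  # 12288 pages, plus some headroom
  echo 13000 > /proc/sys/vm/nr_hugepages

  # mount hugetlbfs and back guest RAM with it
  mkdir -p /hugepages
  mount -t hugetlbfs hugetlbfs /hugepages
  qemu-system-x86_64 -m 2048 -smp 1 -mem-path /hugepages ...
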
> 
> > Host kernel used was kvm.git v2.6.31-rc3-3419-g6df4865
> > Qemu was kvm-87.  I tried a few newer versions of Qemu; none of them
> > worked with the Red Hat virtio Windows drivers.  I tried:
> >
> > f3600c589a9ee5ea4c0fec74ed4e06a15b461d52
> > 0.11.0-rc1
> > 0.10.6
> > kvm-88
> >
> > All but 0.10.6 had a "Problem code 10" driver error in the VM.  0.10.6
> > hit "a disk read error occurred" very early in the VM's boot.
> >    
> 
> Yan?
> 
> > I/O on the host was not what I would call very high:  outbound network
> > traffic averaged 163 Mbit/s and inbound 8 Mbit/s, while disk read ops
> > averaged 243/sec and write ops 561/sec.
> >    
> 
> What was the disk bandwidth used?  Presumably, direct access to the 
> volume with cache=off?

2.4 MB/sec write, 0.6 MB/sec read, cache=none.
The VMs' boot disks are IDE, but the applications use a second disk, which
is virtio.
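
For completeness, the drive layout described above corresponds roughly to a
qemu-kvm invocation like the following (the volume paths are made up; the
real ones differ):

  qemu-system-x86_64 -m 2048 -smp 1 \
    -drive file=/dev/vg0/win2008-boot,if=ide,cache=none \
    -drive file=/dev/vg0/win2008-data,if=virtio,cache=none \
    ...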

> linux-aio should help reduce cpu usage.

I assume this is in a newer version of Qemu?
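
If/when we pick up a qemu with it, I'd expect it to be selectable per
drive, something along these lines (a guess on my part; I have not checked
which release first carries the aio= option):

  -drive file=/dev/vg0/win2008-data,if=virtio,cache=none,aio=native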

> > Host CPU breakdown was the following:
> >
> > user  nice  system irq  softirq guest  idle  iowait
> > 5.67  0.00  11.64  0.09 1.05    31.90  46.06 3.59
> >
> >
> > The amount of kernel time had me concerned.  Here is the oprofile output:
> >    
> 
> user+system is about 55% of guest time, and it's all overhead.
> 
> >> samples  %        app name                 symbol name
> >> 1163422  52.3744  kvm-intel.ko             vmx_vcpu_run
> >> 103996    4.6816  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_set_debugreg
> >> 81036     3.6480  kvm.ko                   kvm_arch_vcpu_ioctl_run
> >> 37913     1.7068  qemu-system-x86_64       cpu_physical_memory_rw
> >> 34720     1.5630  qemu-system-x86_64       phys_page_find_alloc
> >>      
> 
> We should really optimize these two.
> 
> >> 23234     1.0459  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_write_msr_safe
> >> 20964     0.9437  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_get_debugreg
> >> 17628     0.7936  libc-2.5.so              memcpy
> >> 16587     0.7467  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __down_read
> >> 15681     0.7059  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __up_read
> >> 15466     0.6962  kvm.ko                   find_highest_vector
> >> 14611     0.6578  qemu-system-x86_64       qemu_get_ram_ptr
> >> 11254     0.5066  kvm-intel.ko             vmcs_writel
> >> 11133     0.5012  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 copy_user_generic_string
> >> 10917     0.4915  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_read_msr_safe
> >> 10760     0.4844  qemu-system-x86_64       virtqueue_get_head
> >> 9025      0.4063  kvm-intel.ko             vmx_handle_exit
> >> 8953      0.4030  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 schedule
> >> 8753      0.3940  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 fget_light
> >> 8465      0.3811  qemu-system-x86_64       virtqueue_avail_bytes
> >> 8185      0.3685  kvm-intel.ko             handle_cr
> >> 8069      0.3632  kvm.ko                   kvm_set_irq
> >> 7697      0.3465  kvm.ko                   kvm_lapic_sync_from_vapic
> >> 7586      0.3415  qemu-system-x86_64       main_loop_wait
> >> 7480      0.3367  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 do_select
> >> 7121      0.3206  qemu-system-x86_64       lduw_phys
> >> 7003      0.3153  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 audit_syscall_exit
> >> 6062      0.2729  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 kfree
> >> 5477      0.2466  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 fput
> >> 5454      0.2455  kvm.ko                   kvm_lapic_get_cr8
> >> 5096      0.2294  kvm.ko                   kvm_load_guest_fpu
> >> 5057      0.2277  kvm.ko                   apic_update_ppr
> >> 4929      0.2219  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 up_read
> >> 4900      0.2206  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 audit_syscall_entry
> >> 4866      0.2191  kvm.ko                   kvm_apic_has_interrupt
> >> 4670      0.2102  kvm-intel.ko             skip_emulated_instruction
> >> 4644      0.2091  kvm.ko                   kvm_cpu_has_interrupt
> >> 4548      0.2047  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __switch_to
> >> 4328      0.1948  kvm.ko                   kvm_apic_accept_pic_intr
> >> 4303      0.1937  libpthread-2.5.so        pthread_mutex_lock
> >> 4235      0.1906  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 system_call
> >> 4175      0.1879  kvm.ko                   kvm_put_guest_fpu
> >> 4170      0.1877  qemu-system-x86_64       ldl_phys
> >> 4098      0.1845  kvm-intel.ko             vmx_set_interrupt_shadow
> >> 4003      0.1802  qemu-system-x86_64       kvm_run
> >>      
> > I was wondering why the get/set debugreg was so high.  I don't recall
> > seeing this much with Linux VMs.
> >    
> 
> Could it be that Windows uses the debug registers?  Maybe we're 
> incorrectly deciding to switch them.

I was wondering about that.  I was thinking of just backing out the
debugreg support and seeing what happens.

Did the up/down_read seem kind of high?  Are we doing a lot of locking?

> 
> Apart from that, nothing really stands out.  We'll just have to optimize 
> things one by one.
> 
> > Here is an average of kvm_stat:
> >
> >
> >    
> >> efer_relo  0
> >> exits      1262814
> >>      
> 
> 100K exits/sec/vm.  This is high.
> 
> >> fpu_reloa  103842
> >>      
> 
> So is this -- maybe we're misdetecting fpu usage on EPT.
> 
> >> halt_exit  9918
> >> halt_wake  9763
> >> host_stat  103846
> >>      
> 
> This is presumably due to virtio in qemu.
> 
> >> hypercall  0
> >> insn_emul  23277
> >> insn_emul  23277
> >> invlpg     0
> >> io_exits   82717
> >>      
> 
> Yes, it is.
> 
> >> irq_exits  12797
> >> irq_injec  18806
> >> irq_windo  1194
> >> largepage  12
> >> mmio_exit  0
> >> mmu_cache  0
> >> mmu_flood  0
> >> mmu_pde_z  0
> >> mmu_pte_u  0
> >> mmu_pte_w  0
> >> mmu_recyc  0
> >> mmu_shado  0
> >> mmu_unsyn  0
> >> nmi_injec  0
> >> nmi_windo  0
> >> pf_fixed   12
> >> pf_guest   0
> >> remote_tl  0
> >> request_i  0
> >> signal_ex  0
> >> tlb_flush  0
> >>      
> > For 12 VMs, does the number of exits/sec seem reasonable?
> >
> > Comments?
> >    
> 
> Not all of the exits are accounted for, so we're missing a big part of 
> the picture.  2.6.32 will have better statistics through ftrace.
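
Good to know.  Once we move to that kernel I'll grab the exit breakdown
from the kvm trace events; presumably something along these lines
(assuming the events land under events/kvm/):

  mount -t debugfs none /sys/kernel/debug
  echo 1 > /sys/kernel/debug/tracing/events/kvm/enable
  cat /sys/kernel/debug/tracing/trace_pipe > kvm-trace.log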

Thanks for the comments!

-Andrew

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
