Hi, So, nobody has any idea what's going wrong with all these massive IRQs and spin_locks that cause virtual machines to almost completely stop? :( Thanks, Dmitry On Wed, Dec 1, 2010 at 5:38 AM, Dmitry Golubev <lastguru@xxxxxxxxx> wrote: > Hi, > > Sorry it took so slow to reply you - there are only few moments when I > can poke a production server and I need to notify people in advance > about that :( > >> Can you post kvm_stat output while slowness is happening? 'perf top' on the host? and on the guest? > > I took 'perf top' and first thing I saw is that while guest is on > acpi_pm, it shows more or less normal amount of IRQs (under 1000/s), > however when I switched back to the default (which is nohz with > kvm_clock), there are 40 times (!!!) more IRQs under normal operation > (about 40 000/s). When the slowdown is happening, there are a lot of > _spin_lock events and a lot of messages like: "WARNING: failed to keep > up with mmap data. Last read 810 msecs ago." > > As I told before, switching to acpi_pm does not save the day, but > makes situation a lot more workable (i.e., servers recover faster from > the period of slowness). During slowdowns on acpi_pm I also see > "_spin_lock" > > Raw data follows: > > > > vmstat -5 on the host: > > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- > r b swpd free buff cache si so bi bo in cs us sy id wa > 0 0 0 131904 13952 205872 0 0 0 24 2495 9813 6 3 91 0 > 0 0 0 132984 13952 205872 0 0 0 47 2596 9851 5 3 91 1 > 1 0 0 132148 13952 205872 0 0 0 54 2644 10559 3 3 93 1 > 0 1 0 129084 13952 205872 0 0 0 38 3039 9752 7 3 87 2 > 6 0 0 126388 13952 205872 0 0 0 311 15619 9009 42 17 39 2 > 9 0 0 125868 13960 205872 0 0 6 86 4659 6504 98 2 0 0 > 8 0 0 123320 13960 205872 0 0 0 26 4682 6649 98 2 0 0 > 8 0 0 126252 13960 205872 0 0 0 124 4923 6776 98 2 0 0 > 8 0 0 125376 13960 205872 0 0 136 11 4287 5865 98 2 0 0 > 9 0 0 123812 13960 205872 0 0 205 51 4497 6134 98 2 0 0 > 8 0 0 126020 13960 205872 0 0 904 26 4483 5999 98 2 0 0 > 8 0 0 124052 13960 205872 0 0 15 10 4397 6200 98 2 0 0 > 8 0 0 125928 13960 205872 0 0 14 41 4335 5823 98 2 0 0 > 8 0 0 126184 13960 205872 0 0 6 14 4966 6588 98 2 0 0 > 8 0 0 123588 13960 205872 0 0 143 18 5234 6891 98 2 0 0 > 8 0 0 126640 13960 205872 0 0 6 91 5554 7334 98 2 0 0 > 8 0 0 123144 13960 205872 0 0 146 11 5235 7145 98 2 0 0 > 8 0 0 125856 13968 205872 0 0 1282 98 5481 7159 98 2 0 0 > 9 19 0 124124 13968 205872 0 0 782 2433 8587 8987 97 3 0 0 > 8 0 0 122584 13968 205872 0 0 432 90 5359 6960 98 2 0 0 > 8 0 0 125320 13968 205872 0 0 3074 52 5448 7095 97 3 0 0 > 8 0 0 121436 13968 205872 0 0 2519 81 5714 7279 98 2 0 0 > 8 0 0 124436 13968 205872 0 0 1 56 5242 6864 98 2 0 0 > 8 0 0 111324 13968 205872 0 0 2 22 10660 6686 97 3 0 0 > 8 0 0 107824 13968 205872 0 0 0 24 14329 8147 97 3 0 0 > 8 0 0 110420 13968 205872 0 0 0 68 13486 6985 98 2 0 0 > 8 0 0 110024 13968 205872 0 0 0 19 13085 6659 98 2 0 0 > 8 0 0 109932 13968 205872 0 0 0 3 12952 6415 98 2 0 0 > 8 0 0 108552 13968 205880 0 0 2 41 13400 7349 98 2 0 0 > > Few shots with kvm_stat on the host: > > Every 2.0s: kvm_stat -1 > > Wed Dec 1 04:45:47 2010 > > efer_reload 0 0 > exits 56264102 14074 > fpu_reload 311506 50 > halt_exits 4733166 935 > halt_wakeup 3845079 840 > host_state_reload 8795964 4085 > hypercalls 0 0 > insn_emulation 13573212 7249 > insn_emulation_fail 0 0 > invlpg 1846050 20 > io_exits 3579406 843 > irq_exits 3038887 4879 > irq_injections 5242157 3681 > irq_window 124361 540 > largepages 2253 0 > mmio_exits 64274 20 > mmu_cache_miss 664011 16 > mmu_flooded 164506 1 > mmu_pde_zapped 212686 8 > mmu_pte_updated 729268 0 > mmu_pte_write 81323616 551 > mmu_recycled 277 0 > mmu_shadow_zapped 652691 23 > mmu_unsync 5630 8 > nmi_injections 0 0 > nmi_window 0 0 > pf_fixed 17470658 218 > pf_guest 13352205 81 > remote_tlb_flush 1898930 96 > request_irq 0 0 > signal_exits 0 0 > tlb_flush 5827433 108 > > Every 2.0s: kvm_stat -1 > > Wed Dec 1 04:47:33 2010 > > efer_reload 0 0 > exits 58155746 18954 > fpu_reload 318003 61 > halt_exits 4839340 1082 > halt_wakeup 3940964 984 > host_state_reload 9267420 4803 > hypercalls 0 0 > insn_emulation 14376685 7721 > insn_emulation_fail 0 0 > invlpg 1855758 13 > io_exits 3676471 993 > irq_exits 3609310 5363 > irq_injections 5648007 3922 > irq_window 181397 517 > largepages 2253 0 > mmio_exits 65862 14 > mmu_cache_miss 666017 2 > mmu_flooded 164784 0 > mmu_pde_zapped 213208 1 > mmu_pte_updated 731301 1 > mmu_pte_write 81455666 14 > mmu_recycled 277 0 > mmu_shadow_zapped 653910 1 > mmu_unsync 5461 -2 > nmi_injections 0 0 > nmi_window 0 0 > pf_fixed 17530360 154 > pf_guest 13388143 89 > remote_tlb_flush 1915787 39 > request_irq 0 0 > signal_exits 0 0 > tlb_flush 5857307 68 > > Every 2.0s: kvm_stat -1 > > Wed Dec 1 04:47:46 2010 > > efer_reload 0 0 > exits 58382002 14542 > fpu_reload 318544 32 > halt_exits 4850647 541 > halt_wakeup 3951056 441 > host_state_reload 9317479 2669 > hypercalls 0 0 > insn_emulation 14464287 5075 > insn_emulation_fail 0 0 > invlpg 1856481 34 > io_exits 3686632 456 > irq_exits 3670192 3742 > irq_injections 5692201 2471 > irq_window 186987 126 > largepages 2253 0 > mmio_exits 65981 6 > mmu_cache_miss 666184 14 > mmu_flooded 164819 1 > mmu_pde_zapped 213264 5 > mmu_pte_updated 731432 2 > mmu_pte_write 81473978 563 > mmu_recycled 277 0 > mmu_shadow_zapped 654130 10 > mmu_unsync 5410 4 > nmi_injections 0 0 > nmi_window 0 0 > pf_fixed 17536667 653 > pf_guest 13391345 300 > remote_tlb_flush 1917634 120 > request_irq 0 0 > signal_exits 0 0 > tlb_flush 5860221 205 > > > 'perf top' on the host: > > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- > PerfTop: 3894 irqs/sec kernel: 2.4% exact: 0.0% [1000Hz > cycles], (all, 4 CPUs) > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > samples pcnt function DSO > _______ _____ ________________________ > ______________________________________________________________ > > 109.00 15.1% vmx_vcpu_run > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm-intel.ko > 64.00 8.9% copy_user_generic_string [kernel.kallsyms] > 61.00 8.4% __ticket_spin_lock [kernel.kallsyms] > 47.00 6.5% vcpu_enter_guest > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko > 23.00 3.2% schedule [kernel.kallsyms] > 18.00 2.5% kvm_read_guest > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko > 18.00 2.5% gfn_to_hva > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko > 17.00 2.4% native_write_msr_safe [kernel.kallsyms] > 17.00 2.4% x86_decode_insn > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko > 13.00 1.8% rtl8169_interrupt > /lib/modules/2.6.35-23-server/kernel/drivers/net/r8169.ko > 13.00 1.8% native_read_msr_safe [kernel.kallsyms] > 11.00 1.5% paging64_walk_addr > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko > 10.00 1.4% vmcs_writel > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm-intel.ko > 10.00 1.4% PyEval_EvalFrameEx /usr/bin/python2.6 > 10.00 1.4% update_curr [kernel.kallsyms] > 10.00 1.4% x86_emulate_insn > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko > 9.00 1.2% vmx_get_cpl > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm-intel.ko > 9.00 1.2% __srcu_read_lock [kernel.kallsyms] > 9.00 1.2% emulate_instruction > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko > 9.00 1.2% __memcpy [kernel.kallsyms] > 9.00 1.2% handle_exception > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm-intel.ko > 8.00 1.1% vmx_complete_interrupts > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm-intel.ko > 8.00 1.1% native_read_tsc [kernel.kallsyms] > 8.00 1.1% __vcpu_run > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko > 7.00 1.0% fput [kernel.kallsyms] > 7.00 1.0% __switch_to [kernel.kallsyms] > 6.00 0.8% vmx_cache_reg > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm-intel.ko > 6.00 0.8% _raw_spin_lock_irqsave [kernel.kallsyms] > > > 'perf top' on guest: > > ------------------------------------------------------------------------------ > PerfTop: 13547 irqs/sec kernel:99.4% [100000 cpu-clock-msecs], > (all, 4 CPUs) > ------------------------------------------------------------------------------ > > samples pcnt kernel function > _______ _____ _______________ > > 119287.00 - 94.8% : _spin_lock > 1108.00 - 0.9% : do_page_fault > 599.00 - 0.5% : _spin_unlock_irqrestore > 512.00 - 0.4% : finish_task_switch > 432.00 - 0.3% : clear_page_c > 361.00 - 0.3% : __do_softirq > 214.00 - 0.2% : native_flush_tlb > 158.00 - 0.1% : native_set_pte_at > 147.00 - 0.1% : flush_tlb_page > 141.00 - 0.1% : retint_careful > 101.00 - 0.1% : kmem_cache_alloc > 95.00 - 0.1% : get_page_from_freelist > 94.00 - 0.1% : unmap_vmas > 82.00 - 0.1% : native_set_pmd > 74.00 - 0.1% : virtnet_poll [virtio_net] > WARNING: failed to keep up with mmap data. Last read 810 msecs ago. > WARNING: failed to keep up with mmap data. Last read 0 msecs ago. > WARNING: failed to keep up with mmap data. Last read 0 msecs ago. > WARNING: failed to keep up with mmap data. Last read 0 msecs ago. > > > 'perf top' on the guest under normal operation (nohz, clocksource=kvm_clock): > > ------------------------------------------------------------------------------ > PerfTop: 39501 irqs/sec kernel:99.8% [100000 cpu-clock-msecs], > (all, 4 CPUs) > ------------------------------------------------------------------------------ > > samples pcnt kernel function > _______ _____ _______________ > > 256987.00 - 98.7% : native_safe_halt > 377.00 - 0.1% : __do_softirq > 332.00 - 0.1% : finish_task_switch > 257.00 - 0.1% : do_page_fault > 201.00 - 0.1% : _spin_unlock_irqrestore > 121.00 - 0.0% : tick_nohz_stop_sched_tick > 113.00 - 0.0% : pvclock_clocksource_read > 106.00 - 0.0% : flush_tlb_page > 99.00 - 0.0% : tick_nohz_restart_sched_tick > 59.00 - 0.0% : system_call_after_swapgs > 55.00 - 0.0% : fget_light > 52.00 - 0.0% : kmem_cache_alloc > 52.00 - 0.0% : do_sys_poll > 50.00 - 0.0% : perf_poll > 47.00 - 0.0% : native_flush_tlb > > 'perf top' on the guest under normal operation (with > clocksource=acpi_pm nohz=off highres=off): > > ------------------------------------------------------------------------------ > PerfTop: 949 irqs/sec kernel:89.3% [100000 cpu-clock-msecs], > (all, 4 CPUs) > ------------------------------------------------------------------------------ > > samples pcnt kernel function > _______ _____ _______________ > > 10196.00 - 92.3% : native_safe_halt > 145.00 - 1.3% : clear_page_c > 139.00 - 1.3% : do_page_fault > 100.00 - 0.9% : acpi_pm_read > 52.00 - 0.5% : flush_tlb_page > 49.00 - 0.4% : finish_task_switch > 32.00 - 0.3% : native_flush_tlb > 27.00 - 0.2% : __do_softirq > 24.00 - 0.2% : _spin_unlock_irqrestore > 22.00 - 0.2% : native_set_pmd > 20.00 - 0.2% : native_set_pte_at > 19.00 - 0.2% : _spin_lock > 18.00 - 0.2% : generic_unplug_device > 17.00 - 0.2% : tick_nohz_stop_sched_tick > 13.00 - 0.1% : unmap_vmas > > > 'perf top' on the guest when slowdown is happening (with > clocksource=acpi_pm nohz=off highres=off): > > ------------------------------------------------------------------------------ > PerfTop: 966 irqs/sec kernel:94.0% [100000 cpu-clock-msecs], > (all, 4 CPUs) > ------------------------------------------------------------------------------ > > samples pcnt kernel function > _______ _____ _______________ > > 10799.00 - 91.2% : native_safe_halt > 514.00 - 4.3% : _spin_lock > 115.00 - 1.0% : clear_page_c > 87.00 - 0.7% : do_page_fault > 68.00 - 0.6% : __do_softirq > 49.00 - 0.4% : acpi_pm_read > 44.00 - 0.4% : _spin_unlock_irqrestore > 39.00 - 0.3% : finish_task_switch > 23.00 - 0.2% : flush_tlb_page > 21.00 - 0.2% : native_flush_tlb > 8.00 - 0.1% : native_set_pmd > 8.00 - 0.1% : native_set_pte_at > 7.00 - 0.1% : tick_nohz_restart_sched_tick > 6.00 - 0.1% : tick_nohz_stop_sched_tick > > > > Thanks, > Dmitry > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html