Avi Kivity wrote: > Richard Davies wrote: > > We're running host kernel 3.5.1 and qemu-kvm 1.1.1. > > > > I hadn't though about it, but I agree this is related to cpu overcommit. The > > slow boots are intermittent (and infrequent) with cpu overcommit whereas I > > don't think it occurs without cpu overcommit. > > > > In addition, if there is a slow boot ongoing, and you kill some other VMs to > > reduce cpu overcommit then this will sometimes speed it up. > > > > I guess the question is why even with overcommit most boots are fine, but > > some small fraction then go slow? > > Could be a bug. The scheduler and the spin-loop handling code fight > each other instead of working well. > > Please provide snapshots of 'perf top' while a slow boot is in progress. Below are two 'perf top' snapshots during a slow boot, which appear to me to support your idea of a spin-lock problem. There are a lot more "unprocessable samples recorded" messages at the end of each snapshot which I haven't included. I think these may be from the guest OS - the kernel is listed, and qemu-kvm itself is listed on some other traces which I did, although not these. Richard. PerfTop: 62249 irqs/sec kernel:96.9% exact: 0.0% [4000Hz cycles], (all, 16 CPUs) -------------------------------------------------------------------------------------------------------------------------------- 35.80% [kernel] [k] _raw_spin_lock_irqsave 21.64% [kernel] [k] isolate_freepages_block 5.91% [kernel] [k] yield_to 4.95% [kernel] [k] _raw_spin_lock 3.37% [kernel] [k] kvm_vcpu_on_spin 2.74% [kernel] [k] add_preempt_count 2.45% [kernel] [k] _raw_spin_unlock 2.33% [kernel] [k] sub_preempt_count 2.18% [kernel] [k] svm_vcpu_run 2.17% [kernel] [k] kvm_vcpu_yield_to 1.89% [kernel] [k] memcmp 1.50% [kernel] [k] get_pid_task 1.26% [kernel] [k] kvm_arch_vcpu_ioctl_run 1.16% [kernel] [k] pid_task 0.70% [kernel] [k] rcu_note_context_switch 0.70% [kernel] [k] trace_hardirqs_on 0.52% [kernel] [k] __rcu_read_unlock 0.51% [kernel] [k] trace_preempt_on 0.47% [kernel] [k] __srcu_read_lock 0.43% [kernel] [k] get_parent_ip 0.42% [kernel] [k] get_pageblock_flags_group 0.38% [kernel] [k] in_lock_functions 0.34% [kernel] [k] trace_preempt_off 0.34% [kernel] [k] trace_hardirqs_off 0.29% [kernel] [k] clear_page_c 0.23% [kernel] [k] __srcu_read_unlock 0.20% [kernel] [k] __rcu_read_lock 0.14% [kernel] [k] handle_exit 0.11% libc-2.10.1.so [.] strcmp 0.11% [kernel] [k] _raw_spin_unlock_irqrestore 0.11% [kernel] [k] _raw_spin_lock_irq 0.11% [kernel] [k] find_highest_vector 0.09% [kernel] [k] ktime_get 0.08% [kernel] [k] copy_page_c 0.08% [kernel] [k] pause_interception 0.08% [kernel] [k] kmem_cache_alloc 0.08% [kernel] [k] resched_task 0.08% perf [.] dso__find_symbol 0.06% [kernel] [k] compaction_alloc 0.06% libc-2.10.1.so [.] 0x0000000000076dab 0.06% [kernel] [k] read_tsc 0.06% perf [.] add_hist_entry 0.05% [kernel] [k] svm_read_l1_tsc 0.05% [kernel] [k] native_read_tsc 0.05% perf [.] sort__dso_cmp 0.05% [kernel] [k] copy_user_generic_string 0.05% [kernel] [k] ktime_get_update_offsets 0.04% [kernel] [k] kvm_check_async_pf_completion 0.04% [kernel] [k] __schedule 0.04% [kernel] [k] __rcu_pending 0.04% [kernel] [k] svm_complete_interrupts 0.04% [kernel] [k] perf_pmu_disable 0.04% [kernel] [k] isolate_migratepages_range 0.04% [kernel] [k] sched_clock_cpu 0.04% [kernel] [k] kvm_cpu_has_pending_timer 0.04% [kernel] [k] apic_timer_interrupt 0.04% [vdso] [.] 0x00007fff2e1ff607 0.04% [kernel] [k] apic_update_ppr 0.04% [kernel] [k] do_select 0.04% [kernel] [k] svm_scale_tsc 0.04% [kernel] [k] system_call_after_swapgs 0.03% [kernel] [k] kvm_lapic_get_cr8 0.03% perf [.] sort__sym_cmp 0.03% [kernel] [k] find_next_bit 0.03% [kernel] [k] kvm_set_cr8 0.03% [kernel] [k] rcu_check_callbacks 9750 unprocessable samples recorded.9751 unprocessable samples recorded.9752 unprocessable samples recorded.9753 unprocessable samples recorded.9754 unprocessable samples recorded.9755 unprocessable samples recorded.9756 unprocessable samples recorded.9757 u nprocessable samples recorded.9758 unprocessable samples recorded.9759 unprocessable samples recorded.9760 unprocessable samples recorded.9761 unprocessable samples recorded.9762 unprocessable samples recorded.9763 unprocessable samples recorded. PerfTop: 61584 irqs/sec kernel:97.4% exact: 0.0% [4000Hz cycles], (all, 16 CPUs) -------------------------------------------------------------------------------------------------------------------------------- 36.73% [kernel] [k] _raw_spin_lock_irqsave 19.00% [kernel] [k] isolate_freepages_block 5.80% [kernel] [k] yield_to 5.23% [kernel] [k] _raw_spin_lock 3.97% [kernel] [k] kvm_vcpu_on_spin 2.98% [kernel] [k] add_preempt_count 2.45% [kernel] [k] sub_preempt_count 2.37% [kernel] [k] _raw_spin_unlock 2.22% [kernel] [k] svm_vcpu_run 2.19% [kernel] [k] kvm_vcpu_yield_to 1.90% [kernel] [k] memcmp 1.54% [kernel] [k] get_pid_task 1.39% [kernel] [k] kvm_arch_vcpu_ioctl_run 1.30% [kernel] [k] pid_task 0.75% [kernel] [k] rcu_note_context_switch 0.74% [kernel] [k] trace_hardirqs_on 0.58% [kernel] [k] __rcu_read_unlock 0.55% [kernel] [k] trace_preempt_on 0.47% [kernel] [k] __srcu_read_lock 0.44% [kernel] [k] get_parent_ip 0.41% [kernel] [k] clear_page_c 0.40% [kernel] [k] get_pageblock_flags_group 0.39% [kernel] [k] in_lock_functions 0.36% [kernel] [k] trace_preempt_off 0.35% [kernel] [k] trace_hardirqs_off 0.23% [kernel] [k] __srcu_read_unlock 0.20% [kernel] [k] __rcu_read_lock 0.15% [kernel] [k] _raw_spin_lock_irq 0.14% [kernel] [k] handle_exit 0.12% [kernel] [k] find_highest_vector 0.11% [kernel] [k] resched_task 0.10% libc-2.10.1.so [.] strcmp 0.09% [kernel] [k] _raw_spin_unlock_irqrestore 0.09% [kernel] [k] ktime_get 0.08% [kernel] [k] pause_interception 0.08% [kernel] [k] copy_page_c 0.07% [kernel] [k] __schedule 0.07% [kernel] [k] compact_zone 0.07% perf [.] dso__find_symbol 0.06% perf [.] add_hist_entry 0.06% [kernel] [k] read_tsc 0.06% [kernel] [k] svm_read_l1_tsc 0.05% [kernel] [k] native_read_tsc 0.05% [kernel] [k] ktime_get_update_offsets 0.05% [kernel] [k] compaction_alloc 0.05% libc-2.10.1.so [.] 0x0000000000073ae0 0.05% [kernel] [k] kmem_cache_alloc 0.05% [kernel] [k] svm_complete_interrupts 0.05% [kernel] [k] kvm_check_async_pf_completion 0.05% [kernel] [k] apic_timer_interrupt 0.05% perf [.] sort__dso_cmp 0.05% [kernel] [k] kvm_cpu_has_pending_timer 0.04% [kernel] [k] svm_scale_tsc 0.04% [kernel] [k] isolate_migratepages_range 0.04% [kernel] [k] sched_clock_cpu 0.04% [kernel] [k] __rcu_pending 0.04% [kernel] [k] apic_update_ppr 0.04% [kernel] [k] do_select 0.04% [kernel] [k] perf_pmu_disable 0.04% [kernel] [k] kvm_set_cr8 0.04% [kernel] [k] update_curr 0.04% [kernel] [k] reschedule_interrupt 0.03% [kernel] [k] kvm_lapic_get_cr8 0.03% libc-2.10.1.so [.] strstr 0.03% [kernel] [k] apic_has_pending_timer 0.03% perf [.] sort__sym_cmp 4963 unprocessable samples recorded.4964 unprocessable samples recorded.4965 unprocessable samples recorded.4966 unprocessable samples recorded.4967 unprocessable samples recorded.4968 unprocessable samples recorded.4969 unprocessable samples recorded.4970 unprocessable samples recorded.4971 unprocessable samples recorded.4972 unprocessable samples recorded.4973 unprocessable samples recorded.4974 unprocessable samples recorded.4975 unprocessable samples recorded. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html