Rik van Riel wrote:
> Richard Davies wrote:
> > Avi Kivity wrote:
> > > Richard Davies wrote:
> > > > I can trigger the slow boots without KSM and they have the same
> > > > profile, with _raw_spin_lock_irqsave and isolate_freepages_block at
> > > > the top.
> > > >
> > > > I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB
> > > > 8-core VMs), and haven't managed to get a really slow boot yet (>5
> > > > minutes). I'll post again when I get one.
> > >
> > > I think you can go higher than that. But 120GB on a 128GB host is
> > > pushing it.
> >
> > I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB host
> > (i.e. 108GB on a 128GB host).
> >
> > It has the same profile with _raw_spin_lock_irqsave and
> > isolate_freepages_block at the top.
>
> That's the page compaction code.
>
> Mel Gorman and I have been working to fix that,
> the latest fixes and improvements are in the -mm
> kernel already.

Hi Rik,

Are you talking about these patches?

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c67fe3752abe6ab47639e2f9b836900c3dc3da84
http://marc.info/?l=linux-mm&m=134521289221259

If so, I believe those are in 3.6.0-rc3, so I tested with that.

Unfortunately, I can still get the slow boots, and perf top still shows
_raw_spin_lock_irqsave.

Here are two perf top traces on 3.6.0-rc3.
They do look a bit different from 3.5.2, but _raw_spin_lock_irqsave is
still at the top:

   PerfTop:   35272 irqs/sec  kernel:98.1%  exact:  0.0% [4000Hz cycles],  (all, 16 CPUs)
------------------------------------------------------------------------------------------------------------------

    61.85%  [kernel]          [k] _raw_spin_lock_irqsave
     7.18%  [kernel]          [k] sub_preempt_count
     5.03%  [kernel]          [k] isolate_freepages_block
     2.49%  [kernel]          [k] yield_to
     2.05%  [kernel]          [k] memcmp
     2.01%  [kernel]          [k] compact_zone
     1.76%  [kernel]          [k] add_preempt_count
     1.52%  [kernel]          [k] _raw_spin_lock
     1.31%  [kernel]          [k] kvm_vcpu_on_spin
     0.92%  [kernel]          [k] svm_vcpu_run
     0.78%  [kernel]          [k] __rcu_read_unlock
     0.76%  [kernel]          [k] migrate_pages
     0.68%  [kernel]          [k] kvm_vcpu_yield_to
     0.46%  [kernel]          [k] pid_task
     0.42%  [kernel]          [k] isolate_migratepages_range
     0.41%  [kernel]          [k] kvm_arch_vcpu_ioctl_run
     0.40%  [kernel]          [k] clear_page_c
     0.40%  [kernel]          [k] get_pid_task
     0.40%  [kernel]          [k] get_parent_ip
     0.39%  [kernel]          [k] __zone_watermark_ok
     0.34%  [kernel]          [k] trace_hardirqs_off
     0.34%  [kernel]          [k] trace_hardirqs_on
     0.32%  [kernel]          [k] _raw_spin_unlock_irqrestore
     0.27%  [kernel]          [k] _raw_spin_unlock
     0.22%  [kernel]          [k] mod_zone_page_state
     0.21%  [kernel]          [k] rcu_note_context_switch
     0.21%  [kernel]          [k] trace_preempt_on
     0.21%  [kernel]          [k] trace_preempt_off
     0.19%  [kernel]          [k] in_lock_functions
     0.16%  [kernel]          [k] __srcu_read_lock
     0.14%  [kernel]          [k] ktime_get
     0.11%  [kernel]          [k] get_pageblock_flags_group
     0.11%  [kernel]          [k] compact_checklock_irqsave
     0.11%  [kernel]          [k] find_busiest_group
     0.10%  [kernel]          [k] __srcu_read_unlock
     0.09%  [kernel]          [k] __rcu_read_lock
     0.09%  libc-2.10.1.so    [.] 0x0000000000072c9d
     0.09%  [kernel]          [k] cpumask_next_and
     0.08%  [kernel]          [k] smp_call_function_many
     0.08%  [kernel]          [k] read_tsc
     0.08%  [kernel]          [k] kmem_cache_alloc
     0.08%  libc-2.10.1.so    [.] strcmp
     0.08%  [kernel]          [k] generic_smp_call_function_interrupt
     0.07%  [kernel]          [k] __schedule
     0.07%  qemu-kvm          [.] main_loop_wait
     0.07%  [kernel]          [k] __hrtimer_start_range_ns
     0.06%  qemu-kvm          [.] qemu_iohandler_poll
     0.06%  [kernel]          [k] ktime_get_update_offsets
     0.06%  [kernel]          [k] ktime_add_safe
     0.06%  [kernel]          [k] find_next_bit
     0.06%  [kernel]          [k] irq_exit
     0.06%  [kernel]          [k] select_task_rq_fair
     0.06%  [kernel]          [k] handle_exit
     0.05%  [kernel]          [k] update_curr
     0.05%  [kernel]          [k] flush_tlb_func
     0.05%  perf              [.] dso__find_symbol
     0.05%  [kernel]          [k] kvm_check_async_pf_completion
     0.05%  [kernel]          [k] rcu_check_callbacks
     0.05%  [kernel]          [k] apic_update_ppr
     0.05%  [kernel]          [k] irq_enter
     0.04%  [kernel]          [k] copy_user_generic_string
     0.04%  [kernel]          [k] copy_page_c
     0.04%  [kernel]          [k] rcu_idle_exit_common.isra.34
     0.04%  [kernel]          [k] load_balance
     0.04%  [kernel]          [k] rb_erase
     0.04%  libc-2.10.1.so    [.] __select
1904 unprocessable samples recorded.
1905 unprocessable samples recorded.

...

   PerfTop:   49639 irqs/sec  kernel:98.8%  exact:  0.0% [4000Hz cycles],  (all, 16 CPUs)
------------------------------------------------------------------------------------------------------------------

    81.43%  [kernel]          [k] _raw_spin_lock_irqsave
     6.19%  [kernel]          [k] sub_preempt_count
     1.21%  [kernel]          [k] memcmp
     1.03%  [kernel]          [k] compact_zone
     0.72%  [kernel]          [k] smp_call_function_many
     0.50%  [kernel]          [k] yield_to
     0.49%  [kernel]          [k] add_preempt_count
     0.43%  [kernel]          [k] svm_vcpu_run
     0.41%  [kernel]          [k] _raw_spin_unlock_irqrestore
     0.40%  [kernel]          [k] clear_page_c
     0.40%  [kernel]          [k] migrate_pages
     0.38%  [kernel]          [k] __zone_watermark_ok
     0.34%  [kernel]          [k] isolate_migratepages_range
     0.34%  [kernel]          [k] isolate_freepages_block
     0.27%  [kernel]          [k] kvm_vcpu_on_spin
     0.23%  [kernel]          [k] trace_hardirqs_off
     0.21%  [kernel]          [k] mod_zone_page_state
     0.20%  [kernel]          [k] __rcu_read_unlock
     0.18%  [kernel]          [k] get_parent_ip
     0.17%  [kernel]          [k] _raw_spin_lock
     0.14%  [kernel]          [k] flush_tlb_func
     0.14%  [kernel]          [k] trace_preempt_on
     0.14%  [kernel]          [k] trace_preempt_off
     0.14%  [kernel]          [k] kvm_arch_vcpu_ioctl_run
     0.14%  [kernel]          [k] trace_hardirqs_on
     0.10%  [kernel]          [k] compact_checklock_irqsave
     0.09%  [kernel]          [k] _raw_spin_lock_irq
     0.09%  [kernel]          [k] __srcu_read_lock
     0.07%  [kernel]          [k] in_lock_functions
     0.07%  [kernel]          [k] copy_page_c
     0.07%  [kernel]          [k] kmem_cache_alloc
     0.07%  libc-2.10.1.so    [.] strcmp
     0.06%  [kernel]          [k] _raw_spin_unlock
     0.06%  [kernel]          [k] kvm_vcpu_yield_to
     0.06%  [kernel]          [k] get_pid_task
     0.06%  [kernel]          [k] ktime_get
     0.06%  [kernel]          [k] call_function_interrupt
     0.05%  [kernel]          [k] generic_smp_call_function_interrupt
     0.05%  [kernel]          [k] ktime_get_update_offsets
     0.05%  [kernel]          [k] pid_task
     0.05%  [kernel]          [k] copy_user_generic_string
     0.04%  [kernel]          [k] __srcu_read_unlock
     0.04%  [kernel]          [k] get_pageblock_flags_group
     0.04%  [kernel]          [k] rcu_note_context_switch
     0.04%  libc-2.10.1.so    [.] 0x00000000000743ee
     0.04%  perf              [.] dso__find_symbol
     0.04%  [kernel]          [k] zone_watermark_ok
     0.04%  [vdso]            [.] 0x00007fff9afff85d
     0.03%  [kernel]          [k] __mod_zone_page_state
     0.03%  [kernel]          [k] smp_call_function_interrupt
     0.03%  [kernel]          [k] _cond_resched
     0.03%  [kernel]          [k] read_tsc
     0.03%  [kernel]          [k] sysret_check
     0.03%  [kernel]          [k] system_call_after_swapgs
     0.03%  [kernel]          [k] default_send_IPI_mask_sequence_phys
     0.03%  perf              [.] add_hist_entry
     0.03%  [kernel]          [k] __schedule
     0.03%  perf              [.] sort__dso_cmp
     0.02%  [kernel]          [k] mutex_spin_on_owner
     0.02%  [kernel]          [k] do_select
     0.02%  [kernel]          [k] __rcu_read_lock
     0.02%  [kernel]          [k] rcu_check_callbacks
     0.02%  [kernel]          [k] handle_exit
     0.02%  [kernel]          [k] apic_timer_interrupt
     0.02%  [kernel]          [k] perf_pmu_disable
     0.02%  [kernel]          [k] find_busiest_group
3665 unprocessable samples recorded.
3666 unprocessable samples recorded.

...

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html