On Tue, 7 Feb 2012, Igor Mammedov wrote:
> When a kvm guest uses kvmclock, it may hang on vcpu hot-plug.
> This is caused by an overflow in pvclock_get_nsec_offset,
>
>     u64 delta = tsc - shadow->tsc_timestamp;
>
> which in turn is caused by undefined values from the percpu
> hv_clock that hasn't been initialized yet.
> The uninitialized clock on the cpu being booted is accessed from
> start_secondary
> -> smp_callin
> -> smp_store_cpu_info
> -> identify_secondary_cpu
> -> mtrr_ap_init
> -> mtrr_restore
> -> stop_machine_from_inactive_cpu
> -> queue_stop_cpus_work
> ...
> -> sched_clock
> -> kvm_clock_read
> which is well before the x86_cpuinit.setup_percpu_clockev call in
> start_secondary, where the percpu clock is initialized.
>
> This patch introduces a hook that allows the per_cpu clock to be
> set up/initialized early, avoiding the overflow caused by reading
> - undefined values
> - old values if the cpu was offlined and then onlined again
>
> Another possible early user of this clock source is ftrace, which
> accesses it to get timestamps for ring buffer entries. So even if
> mtrr_ap_init is moved from identify_secondary_cpu to after
> x86_cpuinit.setup_percpu_clockev in start_secondary, ftrace
> may still cause the same overflow/hang on cpu hot-plug.
>
> More complete description of the problem:
> https://lkml.org/lkml/2012/2/2/101
>
> Credits to Marcelo Tosatti <mtosatti@xxxxxxxxxx> for the hook idea.
>
> Signed-off-by: Igor Mammedov <imammedo@xxxxxxxxxx>
> ---
> arch/x86/include/asm/x86_init.h | 2 ++
> arch/x86/kernel/kvmclock.c | 4 +---
> arch/x86/kernel/smpboot.c | 1 +
> arch/x86/kernel/x86_init.c | 1 +
> 4 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
> index 517d476..5d0afac 100644
> --- a/arch/x86/include/asm/x86_init.h
> +++ b/arch/x86/include/asm/x86_init.h
> @@ -145,9 +145,11 @@ struct x86_init_ops {
> /**
> * struct x86_cpuinit_ops - platform specific cpu hotplug setups
> * @setup_percpu_clockev: set up the per cpu clock event device
> + * @early_percpu_clock_init: early init of the per cpu clock event device

You initialize the per cpu clock, not the per cpu clock event device.
The latter is still initialized via setup_percpu_clockev().

Otherwise

Acked-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>