I've to revoke my ack and say NAK to this patch. Patch itself is in right direction but clock restore happens too late. With patch I've used to hunt down overflow for cpu hotplug (maybe it will be better for it to be in kernel, which will help to detect overflow problem without spending a lot of time to debug it?): diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c index 42eb330..0081e10 100644 --- a/arch/x86/kernel/pvclock.c +++ b/arch/x86/kernel/pvclock.c @@ -43,7 +43,10 @@ void pvclock_set_flags(u8 flags) static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow) { - u64 delta = native_read_tsc() - shadow->tsc_timestamp; + u64 tsc = native_read_tsc(); + u64 shadow_timestamp = shadow->tsc_timestamp; + u64 delta = tsc - shadow_timestamp; + BUG_ON (tsc < shadow_timestamp); return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul, shadow->tsc_shift); } I've got following back-trace: #29 0xffffffff81004925 in oops_end (flags=70, regs=<optimized out>, signr=11) at arch/x86/kernel/dumpstack.c:246 #30 0xffffffff81004a77 in die (str=0xffffffff8152ca7f "invalid opcode", regs=0xffff8800149859e8, err=0) at arch/x86/kernel/dumpstack.c:305 #31 0xffffffff81001fed in do_trap (trapnr=6, signr=4, str=0xffffffff8152ca7f "invalid opcode", regs=0xffff8800149859e8, error_code=0, info=0xffff880014985948) at arch/x86/kernel/traps.c:168 #32 0xffffffff810021f7 in do_invalid_op (regs=0xffff8800149859e8, error_code=0) at arch/x86/kernel/traps.c:209 #33 <signal handler called> #34 pvclock_get_nsec_offset (shadow=0xffff880014985a98) at arch/x86/kernel/pvclock.c:51 #35 pvclock_clocksource_read (src=0xffff88001f1d1ac0) at arch/x86/kernel/pvclock.c:107 #36 0xffffffff8101aa97 in kvm_clock_read () at arch/x86/kernel/kvmclock.c:79 #37 0xffffffff810087cc in paravirt_sched_clock () at /home/imammedov/fc15/linux-2.6/arch/x86/include/asm/paravirt.h:230 #38 sched_clock () at arch/x86/kernel/tsc.c:71 #39 0xffffffff8104f025 in sched_clock_local (scd=0xffff88001f1d2600) at kernel/sched/clock.c:147 #40 0xffffffff8104f1b7 in sched_clock_cpu (cpu=0) at kernel/sched/clock.c:232 #41 0xffffffff8104f1e5 in local_clock () at kernel/sched/clock.c:316 #42 0xffffffff810699d2 in lockstat_clock () at kernel/lockdep.c:158 #43 __lock_acquire (lock=0xffffffff816131b8, subclass=<optimized out>, trylock=0, read=0, check=2, hardirqs_off=<optimized out>, nest_lock=0x0, ip=18446744071578924098, references=0) at kernel/lockdep.c:3098 #44 0xffffffff8106a924 in lock_acquire (lock=0xffffffff816131b8, subclass=0, trylock=0, read=0, check=2, nest_lock=0x0, ip=18446744071578924098) at kernel/lockdep.c:3555 #45 0xffffffff8136b1c3 in __raw_spin_lock (lock=0xffffffff816131a0) at include/linux/spinlock_api_smp.h:143 #46 _raw_spin_lock (lock=0xffffffff816131a0) at kernel/spinlock.c:137 #47 0xffffffff81013442 in prepare_set () at arch/x86/kernel/cpu/mtrr/generic.c:676 #48 0xffffffff8101368c in generic_set_all () at arch/x86/kernel/cpu/mtrr/generic.c:723 #49 0xffffffff8101258b in mtrr_bp_restore () at arch/x86/kernel/cpu/mtrr/main.c:738 #50 0xffffffff812cabfa in __restore_processor_state (ctxt=<optimized out>) at arch/x86/power/cpu.c:227 #51 restore_processor_state () at arch/x86/power/cpu.c:233 ---Type <return> to continue, or q <return> to quit--- #52 0xffffffff8105a1b6 in create_image (platform_mode=0) at kernel/power/hibernate.c:291 #53 hibernation_snapshot (platform_mode=0) at kernel/power/hibernate.c:363 #54 0xffffffff8105a825 in hibernate () at kernel/power/hibernate.c:629 #55 0xffffffff81058868 in state_store (kobj=<optimized out>, attr=<optimized out>, buf=0xffff88001366e000 <Address 0xffff88001366e000 out of bounds>, n=5) at kernel/power/main.c:284 #56 0xffffffff811e6cc3 in kobj_attr_store (kobj=<optimized out>, attr=<optimized out>, buf=<optimized out>, count=<optimized out>) at lib/kobject.c:699 #57 0xffffffff811372c2 in flush_write_buffer (count=<optimized out>, buffer=<optimized out>, dentry=<optimized out>) at fs/sysfs/file.c:202 #58 sysfs_write_file (file=<optimized out>, buf=0x7f9ce710d000 <Address 0x7f9ce710d000 out of bounds>, count=<optimized out>, ppos=0xffff880014985f58) at fs/sysfs/file.c:236 #59 0xffffffff810ebfbe in vfs_write (file=0xffff88001cb48000, buf=0x7f9ce710d000 <Address 0x7f9ce710d000 out of bounds>, count=<optimized out>, pos=0xffff880014985f58) at fs/read_write.c:435 #60 0xffffffff810ec175 in sys_write (fd=<optimized out>, buf=0x7f9ce710d000 <Address 0x7f9ce710d000 out of bounds>, count=<optimized out>) at fs/read_write.c:487 that shows an access to clock happens right before x86_platform.restore_sched_clock_state is called. Moving x86_platform.restore_sched_clock_state before mtrr_bp_restore solves issue. It isn't bugged on me for 20 save/restore cycles with this change, without this change it bugs on 2nd-3rd cycle. BTW Amit, your config doesn't have CONFIG_KVM_GUEST set, which causes primary cpu clock to be uninitialized too in case of SMP kernel. Commit ca3f10172eea9b95b moved it from kvmclock_init to kvm_guest_init but forgot to change Kconfig too and kvm_guest_init is called only when KVM_GUEST is selected. This commit created following implicit deps if SMP && KVM_CLOCK then select KVM_GUEST But I don't know if it's possible to express it in Kconfig. That rises a question: Do we need KVM_CLOCK option? Maybe better to use KVM_GUEST instead and remove KVM_CLOCK option? Or make KVM_CLOCK depend on KVM_GUEST... -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html