On Mon, Jan 07, 2019 at 20:04:41 -0500, Pavel Tatashin wrote: > I did exactly the same sequence on Kaby Lake CPU and could not > reproduce it. What is your host CPU? > I have some machines which display this bug and others that don't, so I was able to figure out the difference between their configurations. TL;DR the bug appears to be based on wther or not /sys/devices/system/clocksource/clocksource0/current_clocksource is set to TSC in the hypervisor This is the log from a machine with the bug: [ 0.000000] Hypervisor detected: KVM [ 0.000000] kvm-clock: Using msrs 12 and 11 [1162908.013830] kvm-clock: cpu 0, msr 3e0fea001, primary cpu clock [1162908.013830] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns [1162908.013834] tsc: Detected 1899.888 MHz processor This is the log from a machine without the bug: [ 0.000000] Hypervisor detected: KVM [ 0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00 [ 0.000000] kvm-clock: cpu 0, msr 149fea001, primary cpu clock [ 0.000000] kvm-clock: using sched offset of 1558436482528906 cycles [ 0.000002] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns [ 0.000004] tsc: Detected 2097.570 MHz processor Note the additional line of output on the machine without the bug: [ 0.000000] kvm-clock: using sched offset of 1558436482528906 cycles This is printed from kvm_sched_clock_init() in arch/x86/kernel/kvmclock.c based on whether or not the clock is stable. For the clock to be stable both KVM_FEATURE_CLOCKSOURCE_STABLE_BIT and PVCLOCK_TSC_STABLE_BIT have to be set. Both of these are controlled by the hypervisor kernel. * KVM_FEATURE_CLOCKSOURCE_STABLE_BIT is always set by the hypervisor starting with Linux v2.6.35 - 371bcf646d17 ("KVM: x86: Tell the guest we'll warn it about tsc stability") * PVCLOCK_TSC_STABLE_BIT is set starting in Linux v3.8 but only if the clocksource is the TSC - d828199e8444 ("KVM: x86: implement PVCLOCK_TSC_STABLE_BIT pvclock flag") I changed the clocksource of a hypervisor that wasn't having issues from TSC to HPET and when I started up a guest VM the bug suddenly appeared. I shut down the guest, set the hypervisor's clocksource back to TSC, started up the guest and the bug went away again. You don't actually have to reboot the guest before the bug is visible either, just letting the guest sit at the GRUB menu for a minute or two before loading Linux is enough to make the bug plainly visible in the printk timestamps. I don't know enough to actually fix the bug, but hopefully this is enough to allow everyone else to reproduce it and come up with a fix. -- Jon X(7): A program for managing terminal windows. See also screen(1) and tmux(1).