On Thu, Dec 10, 2015 at 1:32 PM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote: > On Wed, Dec 09, 2015 at 01:10:59PM -0800, Andy Lutomirski wrote: >> I'm trying to clean up kvmclock and I can't get it to work at all. My >> host is 4.4.0-rc3-ish on a Skylake laptop that has a working TSC. >> >> If I boot an SMP (2 vcpus) guest, tracing says: >> >> qemu-system-x86-2517 [001] 102242.610654: kvm_update_master_clock: >> masterclock 0 hostclock tsc offsetmatched 0 >> qemu-system-x86-2521 [000] 102242.613742: kvm_track_tsc: >> vcpu_id 0 masterclock 0 offsetmatched 0 nr_online 1 hostclock tsc >> qemu-system-x86-2522 [000] 102242.622959: kvm_track_tsc: >> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc >> qemu-system-x86-2521 [000] 102242.645123: kvm_track_tsc: >> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc >> qemu-system-x86-2522 [000] 102242.647291: kvm_track_tsc: >> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc >> qemu-system-x86-2521 [000] 102242.653369: kvm_track_tsc: >> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc >> qemu-system-x86-2522 [000] 102242.653429: kvm_track_tsc: >> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc >> qemu-system-x86-2517 [001] 102242.653447: kvm_update_master_clock: >> masterclock 0 hostclock tsc offsetmatched 1 >> qemu-system-x86-2521 [000] 102242.653657: kvm_update_master_clock: >> masterclock 0 hostclock tsc offsetmatched 1 >> qemu-system-x86-2522 [002] 102242.664448: kvm_update_master_clock: >> masterclock 0 hostclock tsc offsetmatched 1 >> >> >> If I boot a UP guest, tracing says: >> >> qemu-system-x86-2567 [001] 102370.447484: kvm_update_master_clock: >> masterclock 0 hostclock tsc offsetmatched 1 >> qemu-system-x86-2571 [002] 102370.447688: kvm_update_master_clock: >> masterclock 0 hostclock tsc offsetmatched 1 >> >> I suspect, but I haven't verified, that this is fallout from: >> >> commit 16a9602158861687c78b6de6dc6a79e6e8a9136f >> Author: Marcelo Tosatti <mtosatti@xxxxxxxxxx> >> Date: Wed May 14 12:43:24 2014 -0300 >> >> KVM: x86: disable master clock if TSC is reset during suspend >> >> Updating system_time from the kernel clock once master clock >> has been enabled can result in time backwards event, in case >> kernel clock frequency is lower than TSC frequency. >> >> Disable master clock in case it is necessary to update it >> from the resume path. >> >> Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx> >> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx> >> >> >> Can we please stop making kvmclock more complex? It's a beast right >> now, and not in a good way. It's far too tangled with the vclock >> machinery on both the host and guest sides, the pvclock stuff is not >> well thought out (even in principle in an ABI sense), and it's never >> been clear to my what problem exactly the kvmclock stuff is supposed >> to solve. >> >> I'm somewhat tempted to suggest that we delete kvmclock entirely and >> start over. A correctly functioning KVM guest using TSC (i.e. >> ignoring kvmclock entirely) >> seems to work rather more reliably and >> considerably faster than a kvmclock guest. >> >> --Andy >> >> -- >> Andy Lutomirski >> AMA Capital Management, LLC > > Andy, > > I am all for solving practical problems rather than pleasing aesthetic > pleasure. > >> Updating system_time from the kernel clock once master clock >> has been enabled can result in time backwards event, in case >> kernel clock frequency is lower than TSC frequency. >> >> Disable master clock in case it is necessary to update it >> from the resume path. > >> once master clock >> has been enabled can result in time backwards event, in case >> kernel clock frequency is lower than TSC frequency. > > guest visible clock = tsc_timestamp (updated at time 0) + scaled tsc reads. > > If the effective frequency of the kernel clock is lower (for example > due to NTP correcting the TSC frequency of the system), and you resume > and update the system, the following happens: > > guest visible clock = tsc_timestamp (updated at time 0) + scaled tsc reads=LARGE VALUE. > suspend/resume event. > guest visible clock = tsc_timestamp (updated at time N) + scaled tsc reads=0. > I'm still not seeing the issue. The formula is: (((rdtsc - pvti->tsc_timestamp) * pvti->tsc_to_system_mul) >> pvti->tsc_shift) + pvti->system_time Obviously, if you reset pvti->tsc_timestamp to the current tsc value after suspend/resume, you would also need to update system_time. I don't see what this has to do with suspend/resume or with whether the effective scale factor is greater than or less than one. The only suspend/resume interaction I can see is that, if the host allows the guest-observed TSC value to jump (which is arguably a bug, what that's not important here), it needs to update pvti before resuming the guest. Can you clarify concretely what goes wrong here? (I'm also at a bit of a loss as to why this needs both system_time and tsc_timestamp. They're redundant in the sense that you could set tsc_timestamp to zero and subtract (tsc_timestamp * tsc_to_system_mul) >> tsc_shift to system_time without changing the result of the calculation.) --Andy -- Andy Lutomirski AMA Capital Management, LLC -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html