On Wed, Feb 24, 2016 at 9:38 AM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> On Wed, Feb 24, 2016 at 08:44:40AM -0800, Andy Lutomirski wrote:
>> On Wed, Feb 24, 2016 at 6:14 AM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>> >
>> >
>> > On 24/02/2016 03:31, Owen Hofmann wrote:
>> >> Specifically, what underlying source of time should be exposed through
>> >> kvm-clock and other paravirtual ABIs like the HyperV reference tsc
>> >> page? Recently a couple of threads on kvm-list, along with attempts
>> >> to produce reliable behavior from kvm-clock on our systems have
>> >> highlighted a tension between the current implementation of kvm-clock
>> >> and potentially diverging goals for paravirt time. Here are a few:
>> >>
>> >> 1) kvmclock doesn't work, help?: http://www.spinics.net/lists/kvm/msg125039.html
>> >> 2) kvmclock: improve accuracy: http://www.spinics.net/lists/kvm/msg127215.html
>> >> 3) KVM-clock: http://www.spinics.net/lists/kvm/msg127774.html
>> >>
>> >> This question is mostly in regards to kvm-clock in masterclock mode
>> >> (with PVCLOCK_TSC_STABLE set). In this mode, is kvm-clock intended to
>> >> expose a source of time that is more 'true' than the underlying TSC?
>> >> For example, by passing through NTP correction from the host. For the
>> >> current implementation, the answer seems to be... why not both? Once
>> >> programmed, kvm-clock or the HyperV TSC page will advance with the TSC
>> >> multiplied by the frequency specified by kvm. On the other hand,
>> >> KVM_GET_CLOCK, KVM_SET_CLOCK, and the Windows reference counter MSR
>> >> are measured against corrected time from the host. A guest reading its
>> >> pvclock gets a very different result from a host KVM_GET_CLOCK if the
>> >> guest has run long enough for TSC to diverge from NTP time.
>> >
>> > Right, in fact that's why QEMU is not really using KVM_GET_CLOCK
>> > anymore. In retrospect, the "fix" in QEMU was probably a bad idea. It
>> > would have been better to fix KVM_GET_CLOCK.
>> >
>> >> To me, kvm-clock and the HyperV TSC page are extremely effective as
>> >> simply a more enlightened path to the host TSC. Maintaining a
>> >> high-performance path to the TSC in the face of updates is tricky -
>> >> see the extended comment in pvclock_update_vm_gtod_copy, or the
>> >> discussion on the patchset in (2). Is the cost of auditing that the
>> >> path from host gettimeofday update -> kvm -> guest pvclock -> guest
>> >> gettimeofday both tracks host time correctly and does not produce any
>> >> backwards warps worth the added value, if it exists? As an
>> >> alternative, implementing KVM_GET_CLOCK or the reference time MSR as a
>> >> function of the last update to kvm-clock or the reference TSC page,
>> >> respectively, sounds very straightforward.
>> >
>> > Yes, we could do that too.
>> >
>> > I think that vgettsc and do_monotonic_boot also would have to use the
>> > TSC frequency instead of the NTP-adjusted host clock.
>> >
>> >> (Outside of masterclock mode, the requirement that the client
>> >> synchronizes across cpus for monotonicity smoothes over a lot of
>> >> complexity - periodically updating kvm-clock to the current time is
>> >> simple and works.)
>> >>
>> >> Regardless of my opinion, I think that a clear statement of the design
>> >> goals for kvm-clock (and kvm's implementation of the reference TSC
>> >> page) would be valuable.
>> >
>> > Since we cannot change the past, having kvmclock synchronize with the
>> > host TSC frequency is the only choice we can make.
>> >
>>
>> Could we introduce a new kvm-clock or perhaps opt-in mode that:
>>
>> a) uses hypervisor-supplied IO pages and,
>>
>> b) synchronizes to host CLOCK_MONOTONIC instead of some bizarre
>> non-suspend-resume-safe
>
> Please be accurate. It is suspend safe.
>

I'm being accurate enough, I think. Master clock mode is not suspend
safe. When I suspend and resume my laptop, the master clock code
determines that it messed up and disables itself.
Unloading and reloading the kvm modules turns it back on until the next
suspend.

I *think* that the underlying issue is that kvm-clock's master clock
tracks something ill-defined instead of exposing a well-defined host
clock. If the master clock accurately exposed CLOCK_MONOTONIC_RAW or
CLOCK_MONOTONIC (I much prefer the latter), then it would be fine
across suspend/resume.

I think that part of the reason that it doesn't accurately export a
host clock is that the worst-case performance of atomic updates to the
pvclock data structures is abysmal due to having the data structures
living in guest memory. To be able to access and update all relevant
structures during host clock refreshes, the host would need to pin all
pvclock pages for all running guests. This could be partially
mitigated by only updating pvclock data for running vcpus and for vcpu
0 for all running guests synchronously and deferring the rest (8k
pinned per host cpu, max), but it would still be a mess. If someone
redefined the interface so that the *host* could allocate it, then the
pages could be shared across all guests and this would be vastly
simpler and faster.

Also, kvm-clock should really coordinate with the core timekeeping
code to handle this sort of time base export rather than hooking into
the host vdso support code.

>> not-really-well-defined hybrid?
>>
>> --Andy
>
> 1. What is not well defined? I fail to spot anything
> specific in Owen's e-mail.

If I start a guest and query kvm-clock, I get a nanosecond count.
AFAIK it is, in fact, ill-defined or at least ill-documented what that
nanosecond count means.

[cc: Joao. Xen may want to take this stuff into consideration.]

--Andy
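
For readers following the thread, here is a minimal sketch of how a
guest turns a pvclock page into the nanosecond count discussed above.
The field names follow the pvclock_vcpu_time_info layout from the Linux
pvclock ABI. The Python names PvclockTimeInfo and pvclock_read_ns are
hypothetical and exist only for illustration, and the retry loop on the
version field that a real reader needs is only noted in a comment. It
shows that the result is the TSC delta scaled by a host-supplied
multiplier, with no term for host NTP correction between updates.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1  # emulate u64 wraparound


@dataclass
class PvclockTimeInfo:
    """Illustrative mirror of pvclock_vcpu_time_info (padding omitted)."""
    version: int            # odd while the host is in the middle of an update
    tsc_timestamp: int      # TSC value sampled by the host at the last update
    system_time: int        # nanoseconds corresponding to tsc_timestamp
    tsc_to_system_mul: int  # 32.32 fixed-point ns-per-tick multiplier (u32)
    tsc_shift: int          # signed shift applied to the TSC delta first
    flags: int = 0          # e.g. the TSC-stable bit in masterclock mode


def pvclock_read_ns(info: PvclockTimeInfo, tsc_now: int) -> int:
    """Convert a raw TSC reading to nanoseconds using one pvclock snapshot.

    A real guest rereads the page if `version` is odd or changed during
    the read; this single-threaded sketch assumes a stable snapshot.
    """
    delta = (tsc_now - info.tsc_timestamp) & MASK64
    if info.tsc_shift >= 0:
        delta = (delta << info.tsc_shift) & MASK64
    else:
        delta >>= -info.tsc_shift
    scaled = (delta * info.tsc_to_system_mul) >> 32
    return (info.system_time + scaled) & MASK64
```

With an assumed 2 GHz TSC (multiplier 2**31, shift 0), a delta of
2,000,000,000 ticks adds exactly one second to system_time. Nothing in
this path sees the host's NTP-adjusted clock until the host rewrites
the page, which is the divergence from KVM_GET_CLOCK that Owen
describes.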