On 02/24/2016 07:55 PM, Owen Hofmann wrote:
>>>> not-really-well-defined hybrid?
>>>>
>>>> --Andy
>>>
>>> 1. What is not well defined? I fail to spot anything
>>> specific in Owen's e-mail.
>>
>> If I start a guest and query kvm-clock, I get a nanosecond count.
>> AFAIK it is, in fact, ill-defined or at least ill-documented what that
>> nanosecond count means.
>
> To try to put the thoughts into specific questions:
> - What is the value returned by KVM_GET_CLOCK? How should it be used?
> - What is the value returned by a guest read of the kvm-clock
> structure? (This is also Andy's question)
> To me there are two possibilities for how to answer the second question:
> 1) kvm-clock is better than the host TSC: it propagates updates to
> frequency from the host (== CLOCK_MONOTONIC)
> 2) kvm-clock is a paravirtual source of truth on the guest TSC:
> whether it is stable and its approximate frequency. If the guest needs
> to synchronize to an external source of time, it runs NTP. (==
> CLOCK_MONOTONIC_RAW)
>
> To me, (1) sounds hard, (2) sounds easy, and it's not clear how much
> additional value (1) provides. The recent patches Paolo sent move
> kvm-clock in the direction of (1), and it sounds like Andy and I might
> have slightly different opinions as well. But mostly I would like some
> clarity as to which is the stated goal for kvm-clock, and to have the
> implementation pick only one of those options.
>
>>>>> Since we cannot change the past, having kvmclock synchronize with the
>>>>> host TSC frequency is the only choice we can make.
>
> I'm not sure I understand what previous decision locks kvm-clock into
> the current path. Can you clarify?
>
> On Wed, Feb 24, 2016 at 11:38 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>> On Wed, Feb 24, 2016 at 9:38 AM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
>>> On Wed, Feb 24, 2016 at 08:44:40AM -0800, Andy Lutomirski wrote:
>>>> On Wed, Feb 24, 2016 at 6:14 AM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>>>>>
>>>>>
>>>>> On 24/02/2016 03:31, Owen Hofmann wrote:
>>>>>> Specifically, what underlying source of time should be exposed through
>>>>>> kvm-clock and other paravirtual ABIs like the HyperV reference tsc
>>>>>> page? Recently a couple of threads on kvm-list, along with attempts
>>>>>> to produce reliable behavior from kvm-clock on our systems, have
>>>>>> highlighted a tension between the current implementation of kvm-clock
>>>>>> and potentially diverging goals for paravirt time. Here are a few:
>>>>>>
>>>>>> 1) kvmclock doesn't work, help?: http://www.spinics.net/lists/kvm/msg125039.html
>>>>>> 2) kvmclock: improve accuracy: http://www.spinics.net/lists/kvm/msg127215.html
>>>>>> 3) KVM-clock: http://www.spinics.net/lists/kvm/msg127774.html
>>>>>>
>>>>>> This question is mostly in regards to kvm-clock in masterclock mode
>>>>>> (with PVCLOCK_TSC_STABLE set). In this mode, is kvm-clock intended to
>>>>>> expose a source of time that is more 'true' than the underlying TSC?
>>>>>> For example, by passing through NTP correction from the host. For the
>>>>>> current implementation, the answer seems to be... why not both? Once
>>>>>> programmed, kvm-clock or the HyperV TSC page will advance with the TSC
>>>>>> multiplied by the frequency specified by kvm. On the other hand,
>>>>>> KVM_GET_CLOCK, KVM_SET_CLOCK, and the Windows reference counter MSR
>>>>>> are measured against corrected time from the host. A guest reading its
>>>>>> pvclock gets a very different result from a host KVM_GET_CLOCK if the
>>>>>> guest has run long enough for the TSC to diverge from NTP time.
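A minimal sketch of the arithmetic behind the paragraph above, assuming the field layout of the pvclock ABI (struct pvclock_vcpu_time_info in Linux). The struct and function names here are illustrative assumptions, and the version-field retry loop that a real guest needs around the read is omitted. It shows how a guest extrapolates nanoseconds from the TSC with a fixed multiplier, and how far such a fixed-rate clock drifts from an NTP-corrected one for a given frequency error.

```c
#include <stdint.h>

/* Mirrors the pvclock ABI layout; the type name is local to this sketch. */
struct pvclock_time_info {
    uint32_t version;
    uint32_t pad0;
    uint64_t tsc_timestamp;      /* TSC value at the last host update */
    uint64_t system_time;        /* nanoseconds at the last host update */
    uint32_t tsc_to_system_mul;  /* 32.32 fixed-point ns-per-tick multiplier */
    int8_t   tsc_shift;          /* pre-shift applied to the TSC delta */
    uint8_t  flags;
    uint8_t  pad[2];
};

/* Scale a TSC delta to nanoseconds: shift first, then multiply by mul / 2^32.
 * unsigned __int128 is a GCC/Clang extension used to avoid 64-bit overflow. */
uint64_t pvclock_scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    if (shift >= 0)
        delta <<= shift;
    else
        delta >>= -shift;
    return (uint64_t)(((unsigned __int128)delta * mul) >> 32);
}

/* Guest-side read: last published time plus the scaled TSC ticks since then. */
uint64_t pvclock_read_ns(const struct pvclock_time_info *ti, uint64_t tsc_now)
{
    return ti->system_time +
           pvclock_scale_delta(tsc_now - ti->tsc_timestamp,
                               ti->tsc_to_system_mul, ti->tsc_shift);
}

/* Gap between a fixed-rate clock and one corrected by `ppm` parts per million
 * after elapsed_ns nanoseconds. */
int64_t ntp_divergence_ns(uint64_t elapsed_ns, int64_t ppm)
{
    return (int64_t)elapsed_ns * ppm / 1000000;
}
```

As an example of scale, with an assumed frequency correction of 50 ppm, one day of uptime gives ntp_divergence_ns(86400ULL * 1000000000ULL, 50) = 4320000000 ns, roughly 4.3 seconds between a clock that advances purely with the scaled TSC and one measured against corrected host time.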
>>>>>
>>>>> Right, in fact that's why QEMU is not really using KVM_GET_CLOCK
>>>>> anymore. In retrospect, the "fix" in QEMU was probably a bad idea. It
>>>>> would have been better to fix KVM_GET_CLOCK.
>>>>>
>>>>>> To me, kvm-clock and the HyperV TSC page are extremely effective as
>>>>>> simply a more enlightened path to the host TSC. Maintaining a
>>>>>> high-performance path to the TSC in the face of updates is tricky -
>>>>>> see the extended comment in pvclock_update_vm_gtod_copy, or the
>>>>>> discussion on the patchset in (2). Is the cost of auditing that the
>>>>>> path from host gettimeofday update -> kvm -> guest pvclock -> guest
>>>>>> gettimeofday both tracks host time correctly and does not produce any
>>>>>> backwards warps worth the added value, if it exists? As an
>>>>>> alternative, implementing KVM_GET_CLOCK or the reference time MSR as a
>>>>>> function of the last update to kvm-clock or the reference TSC page,
>>>>>> respectively, sounds very straightforward.
>>>>>
>>>>> Yes, we could do that too.
>>>>>
>>>>> I think that vgettsc and do_monotonic_boot also would have to use the
>>>>> TSC frequency instead of the NTP-adjusted host clock.
>>>>>
>>>>>> (Outside of masterclock mode, the requirement that the client
>>>>>> synchronizes across cpus for monotonicity smooths over a lot of
>>>>>> complexity - periodically updating kvm-clock to the current time is
>>>>>> simple and works.)
>>>>>>
>>>>>> Regardless of my opinion, I think that a clear statement of the design
>>>>>> goals for kvm-clock (and kvm's implementation of the reference TSC
>>>>>> page) would be valuable.
>>>>>
>>>>> Since we cannot change the past, having kvmclock synchronize with the
>>>>> host TSC frequency is the only choice we can make.
>>>>>
>>>>
>>>> Could we introduce a new kvm-clock or perhaps opt-in mode that:
>>>>
>>>> a) uses hypervisor-supplied IO pages and,
>>>>
>>>> b) synchronizes to host CLOCK_MONOTONIC instead of some bizarre
>>>> non-suspend-resume-safe
>>>
>>> Please be accurate. It is suspend safe.
>>>
>>
>> I'm being accurate enough, I think. Master clock mode is not suspend
>> safe. When I suspend and resume my laptop, the master clock code
>> determines that it messed up and disables itself. Unloading and
>> reloading the kvm modules turns it back on until the next suspend.
>>
>> I *think* that the underlying issue is that kvm-clock's master clock
>> tracks something ill-defined instead of exposing a well-defined host
>> clock. If the master clock accurately exposed CLOCK_MONOTONIC_RAW or
>> CLOCK_MONOTONIC (I much prefer the latter), then it would be fine
>> across suspend/resume.
>>
>> I think that part of the reason that it doesn't accurately export a
>> host clock is that the worst-case performance of atomic updates to the
>> pvclock data structures is abysmal due to having the data structures
>> living in guest memory. To be able to access and update all relevant
>> structures during host clock refreshes, the host would need to pin
>> all the pvclock pages for all running guests. This could be partially
>> mitigated by only updating pvclock data for running vcpus and for vcpu
>> 0 for all running guests synchronously and deferring the rest (8k
>> pinned per host cpu, max), but it would still be a mess.
>>
>> If someone redefined the interface so that the *host* could allocate
>> it, then the pages could be shared across all guests and this would be
>> vastly simpler and faster.
>>
>> Also, kvm-clock should really coordinate with the core timekeeping
>> code to handle this sort of time base export rather than hooking into
>> the host vdso support code.
>>
>>>> not-really-well-defined hybrid?
>>>>
>>>> --Andy
>>>
>>> 1. What is not well defined?
>>> I fail to spot anything
>>> specific in Owen's e-mail.
>>
>> If I start a guest and query kvm-clock, I get a nanosecond count.
>> AFAIK it is, in fact, ill-defined or at least ill-documented what that
>> nanosecond count means.
>>
>> [cc: Joao. Xen may want to take this stuff into consideration.]

[CC-ing xen-devel folks too]

Joao

>> --Andy