On Wed, Feb 24, 2016 at 05:19:38PM -0800, Andy Lutomirski wrote:
> On Wed, Feb 24, 2016 at 3:35 PM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> > On Wed, Feb 24, 2016 at 09:35:44AM -0800, Peter Hornyack wrote:
> >> On Tue, Feb 23, 2016 at 7:57 PM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> >> > On Tue, Feb 23, 2016 at 06:31:59PM -0800, Owen Hofmann wrote:
> >> >> Specifically, what underlying source of time should be exposed through
> >> >> kvm-clock and other paravirtual ABIs like the HyperV reference tsc
> >> >> page? Recently a couple of threads on kvm-list, along with attempts
> >> >> to produce reliable behavior from kvm-clock on our systems have
> >> >> highlighted a tension between the current implementation of kvm-clock
> >> >> and potentially diverging goals for paravirt time. Here are a few:
> >> >>
> >> >> 1) kvmclock doesn't work, help?: http://www.spinics.net/lists/kvm/msg125039.html
> >> >> 2) kvmclock: improve accuracy: http://www.spinics.net/lists/kvm/msg127215.html
> >> >> 3) KVM-clock: http://www.spinics.net/lists/kvm/msg127774.html
> >> >>
> >> >> This question is mostly in regards to kvm-clock in masterclock mode
> >> >> (with PVCLOCK_TSC_STABLE set). In this mode, is kvm-clock intended to
> >> >> expose a source of time that is more 'true' than the underlying TSC?
> >> >> For example, by passing through NTP correction from the host. For the
> >> >> current implementation, the answer seems to be... why not both? Once
> >> >> programmed, kvm-clock or the HyperV TSC page will advance with the TSC
> >> >> multiplied by the frequency specified by kvm. On the other hand,
> >> >> KVM_GET_CLOCK, KVM_SET_CLOCK, and the Windows reference counter MSR
> >> >> are measured against corrected time from the host. A guest reading its
> >> >> pvclock gets a very different result from a host KVM_GET_CLOCK if the
> >> >> guest has run long enough for TSC to diverge from NTP time.
> >> >> A VMM using these ioctls to save and restore clock state can produce wild
> >> >> time jumps from the guest's perspective.
> >> >>
> >> >> The patches in (2) address this mismatch by plumbing updates to clock
> >> >> frequency through kvm-clock to the guest. This seems like an important
> >> >> design choice for kvm-clock, and IMO deserves at least a clear
> >> >> statement of the goals for this interface, if not some more
> >> >> discussion.
> >> >
> >> > Design goals of what interface? KVM_GET_CLOCK / KVM_SET_CLOCK?
> >> >
> >> > The interfaces have been introduced to fix a bug.
> >> >
> >> >> The (later) thread in (3) claims that synchronizing with
> >> >> host time is *not* a goal of kvm-clock.
> >> >
> >> > It is not.
> >> >
> >> >> To me, kvm-clock and the HyperV TSC page are extremely effective as
> >> >> simply a more enlightened path to the host TSC. Maintaining a
> >> >> high-performance path to the TSC in the face of updates is tricky -
> >> >> see the extended comment in pvclock_update_vm_gtod_copy, or the
> >> >> discussion on the patchset in (2). Is the cost of auditing that the
> >> >> path from host gettimeofday update -> kvm -> guest pvclock -> guest
> >> >> gettimeofday both tracks host time correctly and does not produce any
> >> >> backwards warps worth the added value, if it exists? As an
> >> >> alternative, implementing KVM_GET_CLOCK or the reference time MSR as a
> >> >> function of the last update to kvm-clock or the reference TSC page,
> >> >> respectively, sounds very straightforward.
> >> >>
> >> >> (Outside of masterclock mode, the requirement that the client
> >> >> synchronizes across cpus for monotonicity smoothes over a lot of
> >> >> complexity - periodically updating kvm-clock to the current time is
> >> >> simple and works.)
> >> >>
> >> >> Regardless of my opinion, I think that a clear statement of the design
> >> >> goals for kvm-clock (and kvm's implementation of the reference TSC
> >> >> page) would be valuable.
> >> >
> >> > Documentation/virtual/kvm/timekeeping.txt
> >> >
> >>
> >> Hi Marcelo,
> >>
> >> While I appreciate all of the detail in timekeeping.txt, it is not a
> >> very good reference for what kvm-clock is or how it works. kvm-clock
> >> is only mentioned three times in different places throughout that
> >> document, and nowhere is there a very clear statement of what
> >> kvm-clock is supposed to do or how it does it.
> >>
> >> For somebody that does not already have a deep understanding of the
> >> core masterclock code, trying to understand how kvm-clock works is a
> >
> > There is no "deep understanding". There is one comment there about
> > why you can't update systemtimestamp + tsc_offset (you have to read
> > the kvmclock clock read function to understand this sentence) in
> > parallel in multiple VCPUs, and that's all masterclock is about.
> >
> > It's called "master" because there must be only one system_timestamp
> > and not multiple (therefore that's the "master" copy of system_time).
> >
> >> real challenge.
> >>
> >> Thanks,
> >> Peter
> >
> > Design goals: provide a reliable clocksource device to Linux guests
> > so they are able to cope with virtualization problems, namely:
> >
> > 1. Migration to hosts with different TSC frequency.
> > 2. Support for hosts with TSCs that are not stable (whose
> > counting frequency changes across processor frequency changes).
> >
> > How: Expose a clockdevice which counts at 1GHz to guests.
>
> This still doesn't define how closely it is intended to track 1 GHz or
> whether NTP slew is applied.
>
> > Evolution of masterclock scheme (bugs uncovered):
> >
> > Problem: time backwards as seen by guests.
> > Solution: Fix in guest with pvclock global variable (cmpxchg).
>
> I thought that was only for non-masterclock.
> >
> > Problem: gettimeofday() performance
> > Solution: Use masterclock scheme (update pvclock areas in sync to avoid
> > time backwards event being visible to guests, it's well documented in
> > x86.c, if something is unclear please try to understand the code / ask
> > and you/we improve the documentation there).
>
> The actual masterclock host code is long and very difficult to follow.
>
> In 4.5-rc, the vDSO guest code is IMO short and reasonably clear.
>
> >
> > Problem: get_kernel_ns VS TSC clock get out of sync and
> > Hyper-V complains about the difference.
> >
> > Solution: expose the NTP TSC frequency so that guests
> > apply NTP frequency correction to their kvmclock reads on TSC as well.
> >
>
> I don't understand what you mean.
>
> > ---
> >
> > About future: agree with Andy that kvmclock should be removed.
> > So there is a pending work item there: "verify TSC clocksource
> > is fine for exposing to guests, think about the implications for
> > management software".
> > I can write down a list of items that have been fixed
> > for kvmclock and would have to be checked for tsc clocksource.
> >
> > Anyone willing to take that task?
> >
> > How?
>
> On very very new hosts (those that support TSC_ADJUST and tsc
> scaling), this should be possible.

Exactly, TSC scaling.

> The host would ideally tell the
> guest what frequency of clock it intends to provide (ideally 1 GHz
> exactly) and the guest would use it. I'm not sure this hardware
> exists yet.
>
> If you enable TSC scaling like this, you may need to supply an ART
> (always running timer) adjustment to the guest in case you intend to
> pass any ART consumers through to the guest. Of course, no one
> outside Intel has *that* hardware either (AFAIK -- maybe there are
> some prototypes floating around).
>
> > ---
> >
> > About complaint that "its not well designed whether NTP correction
> > should be applied or not". There are two different things:
> >
> > 1) Host clock and guest clocks synchronized.
> > KVM is not responsible for that, and it can't, because
> > Linux exposes a clock which is created in software
> > and fixed by NTP.
>
> I don't understand what you mean.
>
> Of course the guest can run its own NTP daemon or similar adjtimex
> caller and cause the guest to stop tracking the host. But if the host
> passed CLOCK_MONOTONIC through, then the guest would, by default,
> treat kvm-clock as an exactly 1GHz source and would then expose a
> disciplined NTP-tracking CLOCK_MONOTONIC through to its user apps even
> without an NTP client on the guest.
>
> If integration with the POSIX clock core were provided, the guest
> would learn to consume the host's CLOCK_REALTIME as well, as long as
> the host uses the tsc as its clocksource.
>
> >
> > 2) NTP frequency correction being applied to kvmclock.
> >
> > This only means that the frequency of the pvclock reads
> > in the guest is NTP corrected.
>
> If the host applied NTP frequency correction to the guest, then I
> would be happy. Some folks might want this to be optional.
>
> The guest can do additional correction on top if it wants regardless.
>
> --Andy

Paolo's track-TSC-offset-multiplier-from-kvmclock-updates should make
enabling masterclock for suspend/resume much simpler.
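
For readers who want the arithmetic behind the discussion in this thread,
here is a minimal sketch of the guest-side pvclock read and of how it can
drift from an NTP-corrected host clock. It is a Python model, not kernel
code. The field names follow struct pvclock_vcpu_time_info, but the class
and function names (PvclockInfo, scale_delta, pvclock_read, make_info,
drift_ns), the 2 GHz advertised frequency and the 50 ppm error are
assumptions chosen for illustration. The version (seqlock) retry loop and
the flags field are left out.

```python
from dataclasses import dataclass

NSEC_PER_SEC = 10**9


@dataclass
class PvclockInfo:
    """Simplified model of the per-vCPU pvclock area (no version, no flags)."""
    tsc_timestamp: int      # TSC value when the host last updated the area
    system_time: int        # nanoseconds corresponding to tsc_timestamp
    tsc_to_system_mul: int  # 32.32 fixed-point ns-per-tick multiplier
    tsc_shift: int          # signed shift applied to the TSC delta first


def scale_delta(delta: int, mul: int, shift: int) -> int:
    """Convert a TSC delta to nanoseconds: shift, multiply, drop 32 bits."""
    if shift < 0:
        delta >>= -shift
    else:
        delta <<= shift
    return (delta * mul) >> 32


def pvclock_read(info: PvclockInfo, tsc: int) -> int:
    """Guest view of time for a given TSC reading."""
    delta = tsc - info.tsc_timestamp
    return info.system_time + scale_delta(
        delta, info.tsc_to_system_mul, info.tsc_shift)


def make_info(advertised_hz: int) -> PvclockInfo:
    """Hypothetical helper: snapshot at tsc=0, time=0, with shift fixed at 0.

    Only valid for advertised_hz above 1 GHz, so the multiplier fits 32 bits.
    """
    mul = (NSEC_PER_SEC << 32) // advertised_hz
    assert mul < 2**32
    return PvclockInfo(0, 0, mul, 0)


def drift_ns(seconds: int, advertised_hz: int, actual_hz: int) -> int:
    """Guest pvclock minus corrected host time after `seconds` of host time.

    Assumes the area is never refreshed and the host clock is exactly right.
    """
    info = make_info(advertised_hz)
    ticks = seconds * actual_hz          # TSC ticks that really elapsed
    guest_ns = pvclock_read(info, ticks)
    host_ns = seconds * NSEC_PER_SEC     # what the corrected host clock says
    return guest_ns - host_ns
```

With these made-up numbers, drift_ns(3600, 2_000_000_000, 2_000_100_000)
gives 180_000_000: a TSC running 50 ppm faster than advertised puts the
guest's pvclock 180 ms ahead of corrected host time after one hour. That is
the kind of gap that a save and restore through KVM_GET_CLOCK /
KVM_SET_CLOCK, as described earlier in the thread, could show to the guest
as a time jump.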