On Wed, Feb 24, 2016 at 5:19 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Wed, Feb 24, 2016 at 3:35 PM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
>> On Wed, Feb 24, 2016 at 09:35:44AM -0800, Peter Hornyack wrote:
>>> On Tue, Feb 23, 2016 at 7:57 PM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
>>> > On Tue, Feb 23, 2016 at 06:31:59PM -0800, Owen Hofmann wrote:
>>> >> Specifically, what underlying source of time should be exposed through
>>> >> kvm-clock and other paravirtual ABIs like the HyperV reference TSC
>>> >> page? Recently, a couple of threads on kvm-list, along with attempts
>>> >> to produce reliable behavior from kvm-clock on our systems, have
>>> >> highlighted a tension between the current implementation of kvm-clock
>>> >> and potentially diverging goals for paravirt time. Here are a few:
>>> >>
>>> >> 1) kvmclock doesn't work, help?: http://www.spinics.net/lists/kvm/msg125039.html
>>> >> 2) kvmclock: improve accuracy: http://www.spinics.net/lists/kvm/msg127215.html
>>> >> 3) KVM-clock: http://www.spinics.net/lists/kvm/msg127774.html
>>> >>
>>> >> This question is mostly with regard to kvm-clock in masterclock mode
>>> >> (with PVCLOCK_TSC_STABLE set). In this mode, is kvm-clock intended to
>>> >> expose a source of time that is more 'true' than the underlying TSC,
>>> >> for example by passing through NTP correction from the host? For the
>>> >> current implementation, the answer seems to be... why not both? Once
>>> >> programmed, kvm-clock or the HyperV TSC page will advance with the TSC
>>> >> multiplied by the frequency specified by kvm. On the other hand,
>>> >> KVM_GET_CLOCK, KVM_SET_CLOCK, and the Windows reference counter MSR
>>> >> are measured against corrected time from the host. A guest reading its
>>> >> pvclock gets a very different result from a host KVM_GET_CLOCK if the
>>> >> guest has run long enough for the TSC to diverge from NTP time.
>>> >> A VMM using these ioctls to save and restore clock state can produce
>>> >> wild time jumps from the guest's perspective.
>>> >>
>>> >> The patches in (2) address this mismatch by plumbing updates to clock
>>> >> frequency through kvm-clock to the guest. This seems like an important
>>> >> design choice for kvm-clock, and IMO deserves at least a clear
>>> >> statement of the goals for this interface, if not some more
>>> >> discussion.
>>> >
>>> > Design goals of what interface? KVM_GET_CLOCK / KVM_SET_CLOCK?
>>> >
>>> > The interfaces have been introduced to fix a bug.
>>> >
>>> >> The (later) thread in (3) claims that synchronizing with
>>> >> host time is *not* a goal of kvm-clock.
>>> >
>>> > It is not.
>>> >
>>> >> To me, kvm-clock and the HyperV TSC page are extremely effective as
>>> >> simply a more enlightened path to the host TSC. Maintaining a
>>> >> high-performance path to the TSC in the face of updates is tricky -
>>> >> see the extended comment in pvclock_update_vm_gtod_copy, or the
>>> >> discussion on the patchset in (2). Is the cost of auditing that the
>>> >> path from host gettimeofday update -> kvm -> guest pvclock -> guest
>>> >> gettimeofday both tracks host time correctly and does not produce any
>>> >> backwards warps worth the added value, if it exists? As an
>>> >> alternative, implementing KVM_GET_CLOCK or the reference time MSR as a
>>> >> function of the last update to kvm-clock or the reference TSC page,
>>> >> respectively, sounds very straightforward.
>>> >>
>>> >> (Outside of masterclock mode, the requirement that the client
>>> >> synchronizes across CPUs for monotonicity smooths over a lot of
>>> >> complexity - periodically updating kvm-clock to the current time is
>>> >> simple and works.)
>>> >>
>>> >> Regardless of my opinion, I think that a clear statement of the design
>>> >> goals for kvm-clock (and kvm's implementation of the reference TSC
>>> >> page) would be valuable.
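For readers new to kvm-clock, the guest-side read Owen refers to ("advance with the TSC multiplied by the frequency specified by kvm") can be sketched in C. This is a minimal sketch, assuming a struct that mirrors the field layout of Linux's `struct pvclock_vcpu_time_info`; the helper names `pv_scale_delta` and `pv_read_ns` are hypothetical, and the TSC value is passed in as a parameter rather than read with RDTSC so the arithmetic can be checked by hand.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Mirrors the field layout of Linux's struct pvclock_vcpu_time_info. */
struct pvclock_vcpu_time_info {
    uint32_t version;           /* odd while the host is updating */
    uint32_t pad0;
    uint64_t tsc_timestamp;     /* guest TSC value at the last host update */
    uint64_t system_time;       /* nanoseconds at the last host update */
    uint32_t tsc_to_system_mul; /* 32.32 fixed-point ticks-to-ns multiplier */
    int8_t   tsc_shift;         /* power-of-two pre-scale applied to the delta */
    uint8_t  flags;
    uint8_t  pad[2];
};

/* Computes ((delta << shift) * mul) >> 32 without 128-bit arithmetic. */
static uint64_t pv_scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    assert(shift > -64 && shift < 64); /* shifting by >= 64 is undefined */
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    uint64_t hi = delta >> 32;
    uint64_t lo = delta & 0xffffffffu;
    return hi * mul + ((lo * mul) >> 32);
}

/* Hypothetical guest-side read: retry while the host is mid-update
 * (odd version) or the version changed while the fields were read. */
static uint64_t pv_read_ns(const volatile struct pvclock_vcpu_time_info *src,
                           uint64_t tsc)
{
    uint32_t version;
    uint64_t ns;

    do {
        version = src->version;
        atomic_thread_fence(memory_order_acquire);
        ns = src->system_time +
             pv_scale_delta(tsc - src->tsc_timestamp,
                            src->tsc_to_system_mul, src->tsc_shift);
        atomic_thread_fence(memory_order_acquire);
    } while ((version & 1) || version != src->version);
    return ns;
}
```

With `tsc_to_system_mul = 1u << 31` and `tsc_shift = 0` (a 2 GHz TSC), a TSC value 4000 ticks past `tsc_timestamp` adds 2000 ns to `system_time`. Under this sketch's assumptions, Owen's alternative for KVM_GET_CLOCK amounts to the host evaluating this same function against the last values it published, so host and guest would report the same time.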
>>> >
>>> > Documentation/virtual/kvm/timekeeping.txt
>>> >
>>>
>>> Hi Marcelo,
>>>
>>> While I appreciate all of the detail in timekeeping.txt, it is not a
>>> very good reference for what kvm-clock is or how it works. kvm-clock
>>> is only mentioned three times in different places throughout that
>>> document, and nowhere is there a very clear statement of what
>>> kvm-clock is supposed to do or how it does it.
>>>
>>> For somebody that does not already have a deep understanding of the
>>> core masterclock code, trying to understand how kvm-clock works is a
>>
>> There is no "deep understanding". There is one comment there about
>> why you can't update systemtimestamp + tsc_offset (you have to read
>> the kvmclock clock read function to understand this sentence) in
>> parallel in multiple VCPUs, and that's all masterclock is about.
>>
>> It's called "master" because there must be only one system_timestamp
>> and not multiple (therefore that's the "master" copy of system_time).
>>
>>> real challenge.
>>>
>>> Thanks,
>>> Peter
>>
>> Design goals: provide a reliable clocksource device to Linux guests
>> so they are able to cope with virtualization problems, namely:
>>
>> 1. Migration to hosts with different TSC frequency.
>> 2. Support for hosts with TSCs that are not stable (whose
>> counting frequency changes across processor frequency changes).
>>
>> How: Expose a clockdevice which counts at 1GHz to guests.
>
> This still doesn't define how closely it is intended to track 1 GHz or
> whether NTP slew is applied.
>
>> Evolution of masterclock scheme (bugs uncovered):
>>
>> Problem: time backwards as seen by guests.
>> Solution: Fix in guest with pvclock global variable (cmpxchg).
>
> I thought that was only for non-masterclock.
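The guest-side fix Marcelo mentions ("pvclock global variable (cmpxchg)") can be sketched as a compare-and-swap clamp on a shared "largest value returned so far". This is a minimal C11 sketch modeled on the `last_value` logic in Linux's guest pvclock code, not a copy of it; `pv_clamp_monotonic` and `pv_last_value` are hypothetical names. As Andy points out, the clamp is skipped when the host advertises a stable TSC (the masterclock case), which the `tsc_stable` parameter models.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest timestamp returned so far, shared by every vCPU in the guest. */
static _Atomic uint64_t pv_last_value;

/* Returns a timestamp that never goes backwards across vCPUs. */
static uint64_t pv_clamp_monotonic(uint64_t ret, bool tsc_stable)
{
    /* Masterclock case: the host is expected to guarantee monotonicity,
     * so the shared variable (and its cache-line contention) is avoided. */
    if (tsc_stable)
        return ret;

    uint64_t last = atomic_load(&pv_last_value);
    do {
        /* Another vCPU already returned an equal or later time. */
        if (ret <= last)
            return last;
        /* On failure, 'last' is reloaded with the current shared value. */
    } while (!atomic_compare_exchange_weak(&pv_last_value, &last, ret));
    return ret;
}
```

A plain store of `ret` would not be enough: two vCPUs updating at the same time could let the slightly older value win, which is why the update is a compare-and-swap loop.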
>
>>
>> Problem: gettimeofday() performance
>> Solution: Use masterclock scheme (update pvclock areas in sync to avoid
>> time backwards event being visible to guests, it's well documented in
>> x86.c, if something is unclear please try to understand the code / ask
>> and you/we improve the documentation there).
>
> The actual masterclock host code is long and very difficult to follow.
>
> In 4.5-rc, the vDSO guest code is IMO short and reasonably clear.
>
>>
>> Problem: get_kernel_ns vs. TSC clock get out of sync and
>> Hyper-V complains about the difference.
>>
>> Solution: expose the NTP TSC frequency so that guests
>> apply NTP frequency correction to their kvmclock reads on TSC as well.
>>
>
> I don't understand what you mean.
>
>> ---
>>
>> About the future: agree with Andy that kvmclock should be removed.
>> So there is a pending work item there: "verify TSC clocksource
>> is fine for exposing to guests, think about the implications for
>> management software".
>> I can write down a list of items that have been fixed
>> for kvmclock and would have to be checked for tsc clocksource.
>>
>> Anyone willing to take that task?
>>

This would be a wonderful goal. But I think that you would want some
extra bits besides just "remove kvmclock":

- Force the guest to consider TSC a high-quality clocksource.
- Provide the host's calibrated TSC frequency to the guest.
- Provide an alternative to hardware frequency scaling.

These sound to me like the requirements for pvclock/kvmclock.

>
> How?
>
> On very very new hosts (those that support TSC_ADJUST and tsc
> scaling), this should be possible. The host would ideally tell the
> guest what frequency of clock it intends to provide (ideally 1 GHz
> exactly) and the guest would use it. I'm not sure this hardware
> exists yet.
>
> If you enable TSC scaling like this, you may need to supply an ART
> (always running timer) adjustment to the guest in case you intend to
> pass any ART consumers through to the guest.
> Of course, no one outside Intel has *that* hardware either (AFAIK --
> maybe there are some prototypes floating around).
>
>>
>> About the complaint that "it's not well designed whether NTP correction
>> should be applied or not". There are two different things:
>>
>> 1) Host clock and guest clocks synchronized.
>> KVM is not responsible for that, and it can't, because
>> Linux exposes a clock which is created in software
>> and fixed by NTP.
>
> I don't understand what you mean.
>
> Of course the guest can run its own NTP daemon or similar adjtimex
> caller and cause the guest to stop tracking the host. But if the host
> passed CLOCK_MONOTONIC through, then the guest would, by default,
> treat kvm-clock as an exactly 1GHz source and would then expose a
> disciplined NTP-tracking CLOCK_MONOTONIC through to its user apps even
> without an NTP client on the guest.
>
> If integration with the POSIX clock core were provided, the guest
> would learn to consume the host's CLOCK_REALTIME as well, as long as
> the host uses the tsc as its clocksource.

Your proposal, which I'd describe as a direct passthrough (to the extent
possible) of the host gettimeofday vDSO to a KVM guest, sounds like a much
better way to get clock frequency adjustments from the host to the guest.
But I'm not sure I can think of a reason to do this besides "hey, you
don't have to run ntp". Is there a situation you have in mind where this
helps?

>
>>
>> 2) NTP frequency correction being applied to kvmclock.
>>
>> This only means that the frequency of the pvclock reads
>> in the guest is NTP-corrected.
>
> If the host applied NTP frequency correction to the guest, then I
> would be happy. Some folks might want this to be optional.
>
> The guest can do additional correction on top if it wants regardless.
>
> --Andy
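Marcelo's point 2 (NTP frequency correction applied to kvmclock) comes down to which TSC frequency the host uses when it computes the multiplier and shift that the guest plugs into its read formula. The sketch below assumes the host simply recomputes that pair from an NTP-adjusted frequency; `pv_time_scale` and `pv_ticks_to_ns` are hypothetical names, and the normalization is a simplified stand-in for KVM's internal scaling helper, not its actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Picks (mul, shift) so that ns = ((ticks << shift) * mul) >> 32 for a TSC
 * running at tsc_hz, keeping mul in [2^31, 2^32) for maximum precision. */
static void pv_time_scale(uint64_t tsc_hz, uint32_t *mul, int8_t *shift)
{
    uint64_t num = 1000000000ull; /* nanoseconds per second */
    uint64_t den = tsc_hz;
    int s = 0;

    assert(tsc_hz > 0);
    /* Invariant: 1e9 / tsc_hz == (num / den) * 2^s.
     * Normalize num / den into [0.5, 1) by powers of two. */
    while (num >= den) { den <<= 1; s++; }
    while (num * 2 < den) { num <<= 1; s--; }

    /* 32-step binary long division: q = floor(num * 2^32 / den). */
    uint64_t rem = num;
    uint32_t q = 0;
    for (int i = 0; i < 32; i++) {
        rem <<= 1;
        q <<= 1;
        if (rem >= den) {
            rem -= den;
            q |= 1;
        }
    }
    *mul = q;
    *shift = (int8_t)s;
}

/* The scaling step the guest applies to a TSC delta. */
static uint64_t pv_ticks_to_ns(uint64_t ticks, uint32_t mul, int8_t shift)
{
    if (shift < 0)
        ticks >>= -shift;
    else
        ticks <<= shift;
    return (ticks >> 32) * mul + (((ticks & 0xffffffffu) * mul) >> 32);
}
```

For example, if NTP reports that a nominally 2 GHz TSC actually runs 50 ppm fast, the host would compute the scale from 2000100000 Hz instead of 2000000000 Hz, and one true second's worth of ticks would still map to roughly 1e9 ns. Under this sketch's assumptions, that is what "the frequency of the pvclock reads in the guest is NTP-corrected" amounts to.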