Re: What time is it kvm-clock?

Owen Hofmann <osh@xxxxxxxxxx> · Wed, 24 Feb 2016 19:50:08 -0800

On Wed, Feb 24, 2016 at 5:19 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Wed, Feb 24, 2016 at 3:35 PM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
>> On Wed, Feb 24, 2016 at 09:35:44AM -0800, Peter Hornyack wrote:
>>> On Tue, Feb 23, 2016 at 7:57 PM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
>>> > On Tue, Feb 23, 2016 at 06:31:59PM -0800, Owen Hofmann wrote:
>>> >> Specifically, what underlying source of time should be exposed through
>>> >> kvm-clock and other paravirtual ABIs like the HyperV reference tsc
>>> >> page?  Recently a couple of threads on kvm-list, along with attempts
>>> >> to produce reliable behavior from kvm-clock on our systems have
>>> >> highlighted a tension between the current implementation of kvm-clock
>>> >> and potentially diverging goals for paravirt time. Here are a few:
>>> >>
>>> >> 1) kvmclock doesn't work, help?: http://www.spinics.net/lists/kvm/msg125039.html
>>> >> 2) kvmclock: improve accuracy: http://www.spinics.net/lists/kvm/msg127215.html
>>> >> 3) KVM-clock: http://www.spinics.net/lists/kvm/msg127774.html
>>> >>
>>> >> This question is mostly in regards to kvm-clock in masterclock mode
>>> >> (with PVCLOCK_TSC_STABLE set). In this mode, is kvm-clock intended to
>>> >> expose a source of time that is more 'true' than the underlying TSC?
>>> >> For example, by passing through NTP correction from the host. For the
>>> >> current implementation, the answer seems to be... why not both? Once
>>> >> programmed, kvm-clock or the HyperV TSC page will advance with the TSC
>>> >> multiplied by the frequency specified by kvm. On the other hand,
>>> >> KVM_GET_CLOCK, KVM_SET_CLOCK, and the Windows reference counter MSR
>>> >> are measured against corrected time from the host. A guest reading its
>>> >> pvclock gets a very different result from a host KVM_GET_CLOCK if the
>>> >> guest has run long enough for the TSC to diverge from NTP time. A VMM
>>> >> using these ioctls to save and restore clock state can produce wild
>>> >> time jumps from the guest's perspective.
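To give a sense of scale for the divergence described above, the following C sketch computes how far a clock that keeps extrapolating from the TSC at a fixed rate drifts from an NTP-disciplined clock. It assumes a constant frequency error. The `drift_ns` helper and the 10 ppm figure used below are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: nanoseconds of divergence after elapsed_ns between a
 * clock that advances with the TSC at a fixed rate (the guest's pvclock) and
 * one that NTP has corrected (the host clock behind KVM_GET_CLOCK), assuming
 * a constant frequency error of error_ppm parts per million.
 */
static uint64_t drift_ns(uint64_t elapsed_ns, uint64_t error_ppm)
{
    /* Overflows only once elapsed_ns * error_ppm reaches 2^64 (decades at 10 ppm). */
    return elapsed_ns * error_ppm / 1000000;
}
```

With a 10 ppm error, `drift_ns(86400 * NSEC_PER_SEC, 10)` is 864000000, roughly 0.86 s per day of uptime, which gives an idea of the size of jump a save/restore through KVM_GET_CLOCK/KVM_SET_CLOCK could expose.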
>>> >>
>>> >> The patches in (2) address this mismatch by plumbing updates to clock
>>> >> frequency through kvm-clock to the guest. This seems like an important
>>> >> design choice for kvm-clock, and IMO deserves at least a clear
>>> >> statement of the goals for this interface, if not some more
>>> >> discussion.
>>> >
>>> > Design goals of what interface? KVM_GET_CLOCK / KVM_SET_CLOCK?
>>> >
>>> > The interfaces have been introduced to fix a bug.
>>> >
>>> >> The (later) thread in (3) claims that synchronizing with
>>> >> host time is *not* a goal of kvm-clock.
>>> >
>>> > It is not.
>>> >
>>> >> To me, kvm-clock and the HyperV TSC page are extremely effective as
>>> >> simply a more enlightened path to the host TSC. Maintaining a
>>> >> high-performance path to the TSC in the face of updates is tricky -
>>> >> see the extended comment in pvclock_update_vm_gtod_copy, or the
>>> >> discussion on the patchset in (2). Is the cost of auditing that the
>>> >> path from host gettimeofday update -> kvm -> guest pvclock -> guest
>>> >> gettimeofday both tracks host time correctly and does not produce any
>>> >> backwards warps worth the added value, if it exists? As an
>>> >> alternative, implementing KVM_GET_CLOCK or the reference time MSR as a
>>> >> function of the last update to kvm-clock or the reference TSC page,
>>> >> respectively, sounds very straightforward.
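A minimal sketch of that alternative: the host evaluates KVM_GET_CLOCK with the same formula and the same parameters the guest uses, so the two agree by construction. The struct mirrors a subset of the fields of `struct pvclock_vcpu_time_info`; the helper names and the plain `tsc_offset` conversion (no TSC scaling) are assumptions for illustration, not KVM's code.

```c
#include <stdint.h>

/* Hypothetical copy of a subset of struct pvclock_vcpu_time_info. */
struct pvclock_params {
    uint64_t tsc_timestamp;     /* guest TSC when the parameters were published */
    uint64_t system_time;       /* kvm-clock nanoseconds at tsc_timestamp */
    uint32_t tsc_to_system_mul; /* 32.32 fixed-point cycles-to-ns multiplier */
    int8_t   tsc_shift;         /* pre-shift applied to the TSC delta */
};

/* The guest-side read: system_time + ((delta shifted by tsc_shift) * mul) >> 32. */
static uint64_t pvclock_ns(const struct pvclock_params *p, uint64_t tsc)
{
    uint64_t delta = tsc - p->tsc_timestamp;

    if (p->tsc_shift < 0)
        delta >>= -p->tsc_shift;
    else
        delta <<= p->tsc_shift;
    /* (delta * mul) >> 32, split into halves so the product fits in 64 bits. */
    return p->system_time + (delta >> 32) * p->tsc_to_system_mul +
           (((delta & 0xffffffffULL) * p->tsc_to_system_mul) >> 32);
}

/*
 * Hypothetical host-side KVM_GET_CLOCK: instead of sampling the host's
 * NTP-corrected clock, evaluate the last published kvm-clock parameters at
 * the current guest TSC (host TSC + offset, assuming no TSC scaling).
 */
static uint64_t kvm_get_clock_from_pvclock(const struct pvclock_params *last,
                                           uint64_t host_tsc, uint64_t tsc_offset)
{
    return pvclock_ns(last, host_tsc + tsc_offset);
}
```

Because both sides run `pvclock_ns` on the same inputs, a save/restore built on this value cannot introduce the drift between TSC time and NTP time as a jump.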
>>> >>
>>> >> (Outside of masterclock mode, the requirement that the client
>>> >> synchronizes across CPUs for monotonicity smooths over a lot of
>>> >> complexity - periodically updating kvm-clock to the current time is
>>> >> simple and works.)
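Outside masterclock mode, that cross-CPU synchronization amounts to a shared high-water mark updated with compare-and-swap. The sketch below uses C11 atomics and is modeled loosely on the `last_value` logic in Linux's pvclock read path (which, in recent kernels, is skipped when the host sets the TSC-stable flag); the names here are illustrative.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Largest clock value any CPU has returned so far, shared by all readers. */
static _Atomic uint64_t last_value;

/*
 * Never return a value smaller than one already handed out on another CPU:
 * if raw lags behind, return the global maximum; otherwise try to publish raw.
 */
static uint64_t clamp_monotonic(uint64_t raw)
{
    uint64_t last = atomic_load(&last_value);

    do {
        if (raw <= last)
            return last;
        /* On failure, 'last' is reloaded with the current global value. */
    } while (!atomic_compare_exchange_weak(&last_value, &last, raw));
    return raw;
}
```

A CPU whose raw reading is behind what another CPU already returned gets the global maximum instead, so readers never see time go backwards, at the cost of a contended cache line on every read.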
>>> >>
>>> >> Regardless of my opinion, I think that a clear statement of the design
>>> >> goals for kvm-clock (and kvm's implementation of the reference TSC
>>> >> page) would be valuable.
>>> >
>>> > Documentation/virtual/kvm/timekeeping.txt
>>> >
>>>
>>> Hi Marcelo,
>>>
>>> While I appreciate all of the detail in timekeeping.txt, it is not a
>>> very good reference for what kvm-clock is or how it works. kvm-clock
>>> is only mentioned three times in different places throughout that
>>> document, and nowhere is there a very clear statement of what
>>> kvm-clock is supposed to do or how it does it.
>>>
>>> For somebody that does not already have a deep understanding of the
>>> core masterclock code, trying to understand how kvm-clock works is a
>>
>> There is no "deep understanding". There is one comment there about
>> why you can't update system_timestamp + tsc_offset (you have to read
>> the kvmclock clock read function to understand this sentence) in
>> parallel in multiple VCPUs, and that's all masterclock is about.
>>
>> It's called "master" because there must be only one system_timestamp
>> and not multiple (therefore that's the "master" copy of system_time).
>>
>>> real challenge.
>>>
>>> Thanks,
>>> Peter
>>
>> Design goals: provide a reliable clocksource device to Linux guests
>> so they are able to cope with virtualization problems, namely:
>>
>> 1. Migration to hosts with different TSC frequency.
>> 2. Support for hosts with TSCs that are not stable (whose
>> counting frequency changes across processor frequency changes).
>>
>> How: Expose a clock device which counts at 1 GHz to guests.
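Counting at 1 GHz means converting TSC cycles to nanoseconds with a shift and a 32.32 fixed-point multiplier. The following sketch shows one way to pick those two values for a given TSC frequency; `pvclock_scale_for` and `cycles_to_ns` are hypothetical names, and the normalization strategy is a simplification rather than KVM's own `kvm_get_time_scale`.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Pick (shift, mul) so that ((delta shifted by shift) * mul) >> 32 converts
 * cycles of a tsc_hz counter into nanoseconds. The frequency is normalized
 * into (1 GHz, 2 GHz] so that mul lands in [2^31, 2^32). tsc_hz must be > 0.
 */
static void pvclock_scale_for(uint64_t tsc_hz, int8_t *shift, uint32_t *mul)
{
    int8_t s = 0;
    uint64_t hz = tsc_hz;

    while (hz > 2 * NSEC_PER_SEC) { hz >>= 1; s--; } /* delta >>= 1 halves the rate */
    while (hz <= NSEC_PER_SEC)    { hz <<= 1; s++; } /* delta <<= 1 doubles it */

    *shift = s;
    *mul = (uint32_t)((NSEC_PER_SEC << 32) / hz);
}

/* Apply the scale: (shifted delta * mul) >> 32 without a 128-bit type. */
static uint64_t cycles_to_ns(uint64_t delta, int8_t shift, uint32_t mul)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    return (delta >> 32) * mul + (((delta & 0xffffffffULL) * mul) >> 32);
}
```

For a 2.5 GHz TSC this yields shift = -1 and a multiplier of about 0.8 x 2^32, so 2.5e9 cycles convert to 999999999 ns; the truncated multiplier costs under a nanosecond per second in this case.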
>
> This still doesn't define how closely it is intended to track 1 GHz or
> whether NTP slew is applied.
>
>> Evolution of masterclock scheme (bugs uncovered):
>>
>> Problem: time backwards as seen by guests.
>> Solution: Fix in guest with pvclock global variable (cmpxchg).
>
> I thought that was only for non-masterclock.
>
>>
>> Problem: gettimeofday() performance
>> Solution: Use the masterclock scheme (update pvclock areas in sync to avoid
>> time-backwards events being visible to guests; it's well documented in
>> x86.c. If something is unclear, please read the code or ask, and
>> we can improve the documentation there).
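To illustrate why a single master copy matters: if each vCPU samples its own (TSC, nanoseconds) pair from a host clock that NTP is slewing, the pairs need not lie on the same line, and two vCPUs can return different times for the same TSC value. The sample values below are hypothetical, and the sketch assumes a 1 GHz TSC so one cycle equals one nanosecond (multiplier 1, shift 0).

```c
#include <stdint.h>

/* One (TSC, nanoseconds) pair, as copied into a vCPU's pvclock area. */
struct sample {
    uint64_t tsc;
    uint64_t ns;
};

/* Extrapolate from a sample; assumes a 1 GHz TSC (1 cycle == 1 ns). */
static uint64_t read_clock(struct sample s, uint64_t tsc_now)
{
    return s.ns + (tsc_now - s.tsc);
}
```

With per-vCPU samples `{1000, 500}` and `{2000, 1499}` (the host clock was slewed by 1 ns between them), vCPU0 reports 2000 ns and vCPU1 reports 1999 ns at TSC 2500, so a task migrating from vCPU0 to vCPU1 can see time step backwards. Copying one master pair to every vCPU makes both return 2000.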
>
> The actual masterclock host code is long and very difficult to follow.
>
> In 4.5-rc, the vDSO guest code is IMO short and reasonably clear.
>
>>
>> Problem: get_kernel_ns and the TSC-based clock get out of sync, and
>> Hyper-V complains about the difference.
>>
>> Solution: expose the NTP-corrected TSC frequency so that guests
>> apply NTP frequency correction to their TSC-based kvmclock reads as well.
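One way to picture exposing an NTP-corrected TSC frequency: if NTP on the host estimates that the TSC runs `error_ppm` parts per million fast, each cycle is worth slightly fewer nanoseconds, and the 32.32 multiplier handed to the guest can shrink accordingly. This is a hypothetical sketch of the arithmetic only, not the patches referenced in (2).

```c
#include <stdint.h>

/*
 * Hypothetical: rescale a pvclock multiplier for an NTP-estimated frequency
 * error. Positive error_ppm means the TSC runs fast, so the multiplier
 * shrinks. Requires error_ppm > -1000000. The result may exceed 32 bits for
 * negative errors, in which case the caller would have to adjust the shift.
 */
static uint64_t ntp_corrected_mul(uint32_t mul, int64_t error_ppm)
{
    return (uint64_t)mul * 1000000 / (uint64_t)(1000000 + error_ppm);
}
```

The guest's read formula stays the same; only the published multiplier changes, so the guest's pvclock reads advance at the NTP-corrected rate.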
>>
>
> I don't understand what you mean.
>
>> ---
>>
>> About future: agree with Andy that kvmclock should be removed.
>> So there is a pending work item there: "verify TSC clocksource
>> is fine for exposing to guests, think about the implications for
>> management software".
>> I can write down a list of items that have been fixed
>> for kvmclock and would have to be checked for the TSC clocksource.
>>
>> Anyone willing to take that task?
>>

This would be a wonderful goal. But I think that you would want some
extra bits besides just "remove kvmclock":
- Force the guest to consider TSC a high quality clocksource.
- Provide the host's calibrated TSC frequency to the guest.
- Provide an alternative to hardware frequency scaling.
These sound to me like the requirements for pvclock/kvmclock.

>
> How?
>
> On very very new hosts (those that support TSC_ADJUST and TSC
> scaling), this should be possible.  The host would ideally tell the
> guest what frequency of clock it intends to provide (ideally 1 GHz
> exactly) and the guest would use it.  I'm not sure this hardware
> exists yet.
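With hardware TSC scaling, the guest's TSC is derived from the host's by a fixed-point ratio plus an offset, which is how a host could present a guest TSC of (very nearly) exactly 1 GHz. This sketch assumes 48 fractional bits, which matches the VMX TSC multiplier format (SVM uses 32), and relies on the GCC/Clang `unsigned __int128` extension; the helper names are hypothetical.

```c
#include <stdint.h>

/* Fractional bits of the TSC multiplier; 48 matches VMX (assumption). */
#define TSC_RATIO_FRAC_BITS 48

/* ratio = guest_hz / host_hz as unsigned fixed point with 48 fractional bits. */
static uint64_t tsc_ratio(uint64_t guest_hz, uint64_t host_hz)
{
    return (uint64_t)(((unsigned __int128)guest_hz << TSC_RATIO_FRAC_BITS) / host_hz);
}

/* What the guest would read from RDTSC: scaled host TSC plus the per-VM offset. */
static uint64_t guest_tsc(uint64_t host_tsc, uint64_t ratio, uint64_t offset)
{
    return (uint64_t)(((unsigned __int128)host_tsc * ratio) >> TSC_RATIO_FRAC_BITS) + offset;
}
```

For a 2.5 GHz host and a 1 GHz target, 2.5e9 host cycles map to 999999999 guest cycles: the truncated ratio leaves the guest a tiny fraction of a cycle per second short, so even with scaling the result is 1 GHz only to within the ratio's precision.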
>
> If you enable TSC scaling like this, you may need to supply an ART
> (always running timer) adjustment to the guest in case you intend to
> pass any ART consumers through to the guest.  Of course, no one
> outside Intel has *that* hardware either (AFAIK -- maybe there are
> some prototypes floating around).
>
>> ---
>>
>> About the complaint that "it's not well defined whether NTP correction
>> should be applied or not", there are two different things:
>>
>> 1) Host clock and guest clocks synchronized.
>> KVM is not responsible for that, and it can't be, because
>> Linux exposes a clock which is created in software
>> and fixed by NTP.
>
> I don't understand what you mean.
>
> Of course the guest can run its own NTP daemon or similar adjtimex
> caller and cause the guest to stop tracking the host.  But if the host
> passed CLOCK_MONOTONIC through, then the guest would, by default,
> treat kvm-clock as an exactly 1 GHz source and would then expose a
> disciplined NTP-tracking CLOCK_MONOTONIC through to its user apps even
> without an NTP client on the guest.
>
> If integration with the POSIX clock core were provided, the guest
> would learn to consume the host's CLOCK_REALTIME as well, as long as
> the host uses the tsc as its clocksource.

Your proposal, which I'd describe as a direct passthrough (to the
extent possible) of the host gettimeofday vDSO to a kvm guest, sounds
like a much better way to get clock frequency adjustments from the
host to the guest. But I don't know if I can think of a reason to do
this besides "hey you don't have to run ntp". Is there a situation you
have in mind that this helps out?

>
>>
>> 2) NTP frequency correction being applied to kvmclock.
>>
>> This only means that the frequency of the pvclock reads
>> in the guest is NTP corrected.
>
> If the host applied NTP frequency correction to the guest, then I
> would be happy.  Some folks might want this to be optional.
>
> The guest can do additional correction on top if it wants regardless.
>
> --Andy