Re: [PATCH RFC 1/1] KVM: x86: add param to update master clock periodically

David Woodhouse <dwmw2@xxxxxxxxxxxxx> · Mon, 16 Oct 2023 17:25:01 +0100

On Mon, 2023-10-16 at 08:47 -0700, Dongli Zhang wrote:
> Hi David and Sean,
> 
> On 10/14/23 02:49, David Woodhouse wrote:
> > 
> > 
> > On 14 October 2023 00:26:45 BST, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > > 2. Suppose the KVM host has been running for a long time: would the drift
> > > > between the two domains accumulate to something very large? (Even if it may
> > > > not introduce anything bad immediately.)
> > > 
> > > That already happens today, e.g. unless the host does vCPU hotplug or is using
> > > XEN's shared info page, masterclock updates effectively never happen.  And I'm
> > > not aware of a single bug report of someone complaining that kvmclock has drifted
> > > from the host clock.  The only bug reports we have are when KVM triggers an update
> > > and causes time to jump from the guest's perspective.
> > 
> > I've got reports about the Xen clock going backwards, and also
> > about it drifting over time w.r.t. the guest's TSC clocksource so
> > the watchdog in the guest declares its TSC clocksource unstable. 
> 
> I assume you meant Xen on KVM (not a Xen guest on the Xen hypervisor). From my
> brief review of the Xen hypervisor code, it appears to use the same algorithm
> to calculate the clock on the hypervisor side as in the Xen guest.

Right. It's *exactly* the same thing. Even the same pvclock ABI in the
way it's exposed to the guest (in the KVM case via the MSR, in the Xen
case it's in the vcpu_info or a separate vcpu_time_info set up by Xen
hypercalls).

> Fortunately, "tsc=reliable" may disable the watchdog, but I have no idea if
> it impacts Xen on KVM.

Right. I think Linux as a KVM guest automatically disables the
watchdog, or at least refuses to use the KVM clock as the watchdog for
the TSC clocksource?

Xen guests, on the other hand, aren't used to the Xen clock being as
unreliable as the KVM clock is, so they *do* use it as a watchdog for
the TSC clocksource.
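
To make the failure mode concrete, here is a deliberately simplified
model of a clocksource watchdog check. It is not the kernel's actual
watchdog code; the function name and the threshold parameter are made
up for the example. The point is only that if the reference clock (here
the Xen/KVM pvclock) drifts against the TSC, the measured skew over an
interval grows until the TSC gets blamed.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: compare how much time a clocksource (e.g. the TSC)
 * and a watchdog reference clock (e.g. the pvclock) each claim has elapsed
 * over the same interval. Returns true if they disagree by more than
 * threshold_ns, in which case a guest would mark the clocksource unstable.
 */
static bool watchdog_flags_unstable(uint64_t cs_delta_ns,
                                    uint64_t wd_delta_ns,
                                    uint64_t threshold_ns)
{
    uint64_t skew = cs_delta_ns > wd_delta_ns ? cs_delta_ns - wd_delta_ns
                                              : wd_delta_ns - cs_delta_ns;

    return skew > threshold_ns;
}
```

Note that the check cannot tell which of the two clocks is wrong, which
is why an unreliable reference clock ends up condemning a sane TSC.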

> > I don't understand *why* we update the master clock when we populate
> > the Xen shared info. Or add a vCPU, for that matter.

Still don't...

> > > > The idea is to never update master clock, if tsc is stable (and masterclock is
> > > > already used).
> > > 
> > > That's another option, but if there are no masterclock updates, then it suffers
> > > the exact same (theoretical) problem as #2.  And there are real downsides, e.g.
> > > defining when KVM would synchronize kvmclock with the host clock would be
> > > significantly harder...
> > 
> > I thought the definition of such an approach would be that we
> > *never* resync the kvmclock to anything. It's based purely on the
> > TSC value when the guest started, and the TSC frequency. The
> > pvclock we advertise to all vCPUs would be the same, and would
> > *never* change except on migration.
> > 
> > (I guess that for consistency we would scale first to the *guest*
> > TSC and from that to nanoseconds.)
> > 
> > If userspace does anything which makes that become invalid,
> > userspace gets to keep both pieces. That includes userspace having
> > to deal with host suspend like migration, etc.
> 
> Supposing we are discussing a non-permanent solution, I would suggest:
> 
> 1. Document something to accept that kvm-clock (or pvclock on KVM, including Xen
> on KVM) is not good enough in some cases, e.g., vCPU hotplug.

I still don't understand the vCPU hotplug case.

In the case where the TSC is actually sane, why would we need to reset
the masterclock on vCPU hotplug? 

The new vCPU gets its TSC synchronised to the others, and its kvmclock
parameters (mul/shift/offset based on the guest TSC) can be *precisely*
the same as the other vCPUs too, can't they? Why reset anything?
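
For illustration, the conversion can be sketched roughly as below. This
is a simplified userspace model, not the kernel code: the struct and
function names are made up for the example, and only the arithmetic
follows the pvclock ABI (shift the TSC delta, multiply by mul / 2^32,
add the base system time).

```c
#include <stdint.h>

/* Hypothetical container for the per-vCPU parameters discussed above. */
struct pv_params {
    uint64_t tsc_timestamp; /* guest TSC value at the reference point */
    uint64_t system_time;   /* nanoseconds at the reference point */
    uint32_t mul;           /* fixed-point multiplier, scaled by 2^32 */
    int8_t   shift;         /* pre-shift applied to the TSC delta */
};

/* Scale a TSC delta to nanoseconds: (delta shifted) * mul / 2^32. */
static uint64_t pv_scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;

    /* Split the 64x32 multiply so the product never needs 128 bits. */
    return (delta >> 32) * mul + (((delta & 0xffffffffULL) * mul) >> 32);
}

/* Time in nanoseconds for a given guest TSC reading. */
static uint64_t pv_read_ns(const struct pv_params *p, uint64_t guest_tsc)
{
    return p->system_time +
           pv_scale_delta(guest_tsc - p->tsc_timestamp, p->mul, p->shift);
}
```

The result depends only on the guest TSC reading and the four
parameters. If a hotplugged vCPU has its TSC synchronised and is handed
the same parameters, it computes the same time as every other vCPU.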

> 2. Do not rely on any userspace change, so that the solution can be easier to
> apply to existing environments running old KVM versions.
> 
> That is, to limit the change within KVM.
> 
> 3. The options would be to (1) stop updating masterclock in the ideal scenario
> (e.g., stable tsc), or to (2) refresh periodically to minimize the drift.

If the host TSC is sane, just *never* update the KVM masterclock. It
"drifts" w.r.t. the host CLOCK_MONOTONIC_RAW and nobody will ever care.

The only opt-in we need from userspace for that is to promise that the
host TSC will never get mangled, isn't it?

(We probably want to be able to export the pvclock information to
userspace (in terms of the mul/shift/offset from host TSC to guest TSC
and then the mul/shift/offset to kvmclock). Userspace may want to make
things like the PIT/HPET/PMtimer run on that clock.)
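
A sketch of what userspace could do with such an export, purely as an
assumption: no existing KVM interface is implied, and the struct layout
and names below are hypothetical. Each stage is modelled pvclock-style
with a 32-bit multiplier, which is simpler than the TSC scaling ratio
the hardware actually uses.

```c
#include <stdint.h>

/* Hypothetical description of one mul/shift/offset conversion stage. */
struct clk_stage {
    uint64_t base_in;  /* input value at the reference point */
    uint64_t base_out; /* output value at the reference point (offset) */
    uint32_t mul;      /* fixed-point multiplier, scaled by 2^32 */
    int8_t   shift;    /* pre-shift applied to the input delta */
};

/* out = base_out + ((in - base_in) shifted) * mul / 2^32 */
static uint64_t clk_stage_apply(const struct clk_stage *s, uint64_t in)
{
    uint64_t delta = in - s->base_in;

    if (s->shift < 0)
        delta >>= -s->shift;
    else
        delta <<= s->shift;

    /* Split the 64x32 multiply so the product never needs 128 bits. */
    return s->base_out + (delta >> 32) * s->mul +
           (((delta & 0xffffffffULL) * s->mul) >> 32);
}

/* Host TSC -> guest TSC -> kvmclock nanoseconds, in that order. */
static uint64_t host_tsc_to_kvmclock_ns(const struct clk_stage *to_guest_tsc,
                                        const struct clk_stage *to_kvmclock,
                                        uint64_t host_tsc)
{
    uint64_t guest_tsc = clk_stage_apply(to_guest_tsc, host_tsc);

    return clk_stage_apply(to_kvmclock, guest_tsc);
}
```

With both stages in hand, a userspace device model could timestamp its
emulated timers from a host TSC reading and stay on the same timeline
the guest sees through the kvmclock.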
