Re: [PATCH RFC 1/1] KVM: x86: add param to update master clock periodically

Dongli Zhang <dongli.zhang@xxxxxxxxxx> · Mon, 16 Oct 2023 08:47:15 -0700

Hi David and Sean,

On 10/14/23 02:49, David Woodhouse wrote:
> 
> 
> On 14 October 2023 00:26:45 BST, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>>> 2. Suppose the KVM host has been running for long time, and the drift between
>>> two domains would be accumulated to super large? (Even it may not introduce
>>> anything bad immediately)
>>
>> That already happens today, e.g. unless the host does vCPU hotplug or is using
>> XEN's shared info page, masterclock updates effectively never happen.  And I'm
>> not aware of a single bug report of someone complaining that kvmclock has drifted
>> from the host clock.  The only bug reports we have are when KVM triggers an update
>> and causes time to jump from the guest's perspective.
> 
> I've got reports about the Xen clock going backwards, and also about it drifting over time w.r.t. the guest's TSC clocksource so the watchdog in the guest declares its TSC clocksource unstable. 

I assume you meant Xen on KVM (not a Xen guest on the Xen hypervisor). According
to my brief review of the Xen hypervisor code, it appears to use the same
algorithm to calculate the clock on the hypervisor side as in the Xen guest.
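
For reference, here is a minimal sketch of the pvclock conversion as I
understand it from the pvclock ABI. The struct layout follows
pvclock_vcpu_time_info; the helper name pvclock_tsc_to_ns() is made up for
illustration, and the version (seqcount-style) retry loop that a real reader
needs is left out. It assumes a compiler with unsigned __int128 (GCC/Clang on
64-bit).

```c
#include <stdint.h>

/* Layout as in the pvclock ABI shared by KVM and Xen guests. */
struct pvclock_vcpu_time_info {
	uint32_t version;
	uint32_t pad0;
	uint64_t tsc_timestamp;     /* TSC value when this record was written */
	uint64_t system_time;       /* nanoseconds at tsc_timestamp */
	uint32_t tsc_to_system_mul; /* 32.32 fixed-point multiplier */
	int8_t   tsc_shift;         /* shift applied to the TSC delta first */
	uint8_t  flags;
	uint8_t  pad[2];
};

/*
 * Hypothetical helper (name is illustrative): convert a raw TSC reading to
 * nanoseconds. The version-check retry loop is intentionally omitted.
 */
static uint64_t pvclock_tsc_to_ns(const struct pvclock_vcpu_time_info *ti,
				  uint64_t tsc)
{
	uint64_t delta = tsc - ti->tsc_timestamp;

	if (ti->tsc_shift < 0)
		delta >>= -ti->tsc_shift;
	else
		delta <<= ti->tsc_shift;

	/* (delta * mul) >> 32 with a 128-bit intermediate to avoid overflow. */
	return ti->system_time +
	       (uint64_t)(((unsigned __int128)delta * ti->tsc_to_system_mul) >> 32);
}
```

Any rounding error in tsc_to_system_mul is multiplied by the TSC delta, so the
longer tsc_timestamp/system_time go without being refreshed, the larger the
difference from a clock computed with a different multiplier can become.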

Fortunately, "tsc=reliable" may disable the watchdog, but I have no idea whether
it affects Xen on KVM.

> 
> I don't understand *why* we update the master clock when we populate the Xen shared info. Or add a vCPU, for that matter. 
> 
>>> The idea is to never update master clock, if tsc is stable (and masterclock is
>>> already used).
>>
>> That's another option, but if there are no masterclock updates, then it suffers
>> the exact same (theoretical) problem as #2.  And there are real downsides, e.g.
>> defining when KVM would synchronize kvmclock with the host clock would be
>> significantly harder...
> 
> I thought the definition of such an approach would be that we *never* resync the kvmclock to anything. It's based purely on the TSC value when the guest started, and the TSC frequency. The pvclock we advertise to all vCPUs would be the same, and would *never* change except on migration.
> 
> (I guess that for consistency we would scale first to the *guest* TSC and from that to nanoseconds.)
> 
> If userspace does anything which makes that become invalid, userspace gets to keep both pieces. That includes userspace having to deal with host suspend like migration, etc.

Supposing we are discussing a non-permanent solution, I would suggest:

1. Document that kvm-clock (or pvclock on KVM, including Xen on KVM) is not good
enough in some cases, e.g., vCPU hotplug.

2. Do not rely on any userspace change, so that the solution is easier to
apply to existing environments running old KVM versions.

That is, to limit the change within KVM.

3. The options would be to (1) stop updating masterclock in the ideal scenario
(e.g., stable tsc), or to (2) refresh periodically to minimize the drift.

Or is there a better option ...
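
To put rough numbers on options (1) and (2), here is a back-of-the-envelope
sketch. It assumes a constant relative rate error between kvmclock and the host
clock, expressed in parts per billion; the function names and the 1 ppm figure
are made up for illustration, not measured values.

```c
#include <stdint.h>

/*
 * Hypothetical helper: drift (ns) accumulated over elapsed_ns when the two
 * clocks differ by a constant rate error of error_ppb parts per billion.
 * Uses a 128-bit intermediate (GCC/Clang extension) to avoid overflow.
 */
static int64_t drift_ns(uint64_t elapsed_ns, int64_t error_ppb)
{
	return (int64_t)(((__int128)elapsed_ns * error_ppb) / 1000000000);
}

/*
 * Hypothetical helper: with a refresh every period_ns, the drift restarts
 * from zero at each refresh, so the worst case is one period's worth.
 */
static int64_t max_drift_with_refresh_ns(uint64_t period_ns, int64_t error_ppb)
{
	return drift_ns(period_ns, error_ppb);
}
```

With an assumed 1 ppm error (1000 ppb), 30 days without an update accumulates
about 2.6 seconds, while a refresh every 300 seconds keeps it within about 300
microseconds. Each refresh then corrects the guest-visible clock by at most that
bound, which is still a small jump from the guest's perspective.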

Thank you very much!

Dongli Zhang