On Tue, 2023-10-10 at 17:20 -0700, Sean Christopherson wrote:
> On Wed, Oct 04, 2023, Dongli Zhang wrote:
> > > And because that's not enough, on pCPU migration or if the TSC is unstable,
> > > kvm_arch_vcpu_load() requests KVM_REQ_GLOBAL_CLOCK_UPDATE, which schedules
> > > kvmclock_update_fn() with a delay of 100ms. The large delay is to play nice
> > > with unstable TSCs. But if KVM is periodically doing clock updates on all
> > > vCPUs, scheduling another update with a *longer* delay is silly.
> >
> > We may need to add the above message at the places where
> > KVM_REQ_GLOBAL_CLOCK_UPDATE is replaced with KVM_REQ_CLOCK_UPDATE in the patch?
>
> Yeah, comments are most definitely needed; this was just intended to be a quick
> sketch to get the ball rolling.
>
> > > -static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
> > > -{
> > > -	struct kvm *kvm = v->kvm;
> > > -
> > > -	kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
> > > -	schedule_delayed_work(&kvm->arch.kvmclock_update_work,
> > > -			      KVMCLOCK_UPDATE_DELAY);
> > > -}
> > > -
> > >  #define KVMCLOCK_SYNC_PERIOD (300 * HZ)
> >
> > While David mentioned "maximum delta", how about turning the above into a
> > module param with the default 300 * HZ?
> >
> > BTW, 300 * HZ should be enough for the vCPU hotplug case, unless people
> > prefer 1 hour or 1 day.
>
> Hmm, I think I agree with David that it would be better if KVM can take care of
> the gory details and promise a certain level of accuracy. I'm usually a fan of
> punting complexity to userspace, but requiring every userspace to figure out the
> ideal sync frequency on every platform is more than a bit unfriendly. And it
> might not even be realistic unless userspace makes assumptions about how the
> kernel computes CLOCK_MONOTONIC_RAW from TSC cycles.

I think perhaps I would rather save up my persuasiveness on the topic of "let's
not make things too awful for userspace to cope with" for the live
update/migration mess.
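(As an aside, the module-parameter idea quoted above would be a small change. A rough sketch only — the parameter name and permissions below are invented, not from the patch:

```c
/* Hypothetical: expose the sync period as a module parameter instead of
 * the hard-coded KVMCLOCK_SYNC_PERIOD, defaulting to the current 300s. */
static unsigned int kvmclock_sync_period_secs = 300;
module_param(kvmclock_sync_period_secs, uint, 0444);

/* ... and the scheduling site in kvmclock_sync_fn() would then use: */
schedule_delayed_work(&kvm->arch.kvmclock_sync_work,
		      kvmclock_sync_period_secs * HZ);
```

Whether that's a good idea is a separate question, per Sean's point about not punting the choice of sync frequency to userspace.)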
I think I need to dust off that attempt at fixing our 'how to migrate with
clocks intact' documentation from
https://lore.kernel.org/kvm/13f256ad95de186e3b6bcfcc1f88da5d0ad0cb71.camel@xxxxxxxxxxxxx/
The changes we're discussing here obviously have an effect on migration too.

Where the host TSC is actually reliable, I would really prefer for the kvmclock
to just be a fixed function of the guest TSC and *not* to be arbitrarily yanked
back[1] to the host's CLOCK_MONOTONIC periodically. That makes the process of
migrating to another machine (potentially with a different host TSC frequency)
a whole lot simpler. Userspace just needs to know two things from the kernel:

 • The relationship between the guest's TSC and its kvmclock (which we
   helpfully advertise to the *guest* in a pvclock structure, and can give to
   userspace too).

 • The value of *either* the guest's TSC or its kvmclock at a given value of
   CLOCK_TAI (not CLOCK_REALTIME). Theoretically this can be either the TSC or
   the kvmclock, but I think for best precision it would need to be the TSC.

I am aware that I glibly said "guest TSC" as if there's only one, or they're
all in sync. I think we can substitute "the TSC of guest vCPU0" in the case of
migration. We don't want a guest to be able to change its kvmclock by writing
to vCPU0's TSC, though, so we may need to keep (and migrate) an additional
offset value to track that.

[1] Yes, I believe "back" does happen. I have test failures in my queue to look
at, where guests see the "Xen" clock going backwards.