On Tue, 2023-10-10 at 17:20 -0700, Sean Christopherson wrote:
> On Wed, Oct 04, 2023, Dongli Zhang wrote:
> > > And because that's not enough, on pCPU migration or if the TSC is unstable,
> > > kvm_arch_vcpu_load() requests KVM_REQ_GLOBAL_CLOCK_UPDATE, which schedules
> > > kvmclock_update_fn() with a delay of 100ms. The large delay is to play nice
> > > with unstable TSCs. But if KVM is periodically doing clock updates on all
> > > vCPUs, scheduling another update with a *longer* delay is silly.
> >
> > We may need to add the above message at the places where
> > KVM_REQ_GLOBAL_CLOCK_UPDATE is replaced with KVM_REQ_CLOCK_UPDATE in the patch?
>
> Yeah, comments are most definitely needed; this was just intended to be a quick
> sketch to get the ball rolling.
>
> > > -static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
> > > -{
> > > -	struct kvm *kvm = v->kvm;
> > > -
> > > -	kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
> > > -	schedule_delayed_work(&kvm->arch.kvmclock_update_work,
> > > -			      KVMCLOCK_UPDATE_DELAY);
> > > -}
> > > -
> > >  #define KVMCLOCK_SYNC_PERIOD (300 * HZ)
> >
> > While David mentioned "maximum delta", how about turning the above into a
> > module param with the default 300 * HZ?
> >
> > BTW, 300 * HZ should be enough for the vCPU hotplug case, unless people
> > prefer 1 hour or 1 day.
>
> Hmm, I think I agree with David that it would be better if KVM can take care of
> the gory details and promise a certain level of accuracy. I'm usually a fan of
> punting complexity to userspace, but requiring every userspace to figure out the
> ideal sync frequency on every platform is more than a bit unfriendly. And it
> might not even be realistic unless userspace makes assumptions about how the
> kernel computes CLOCK_MONOTONIC_RAW from TSC cycles.

I think perhaps I would rather save up my persuasiveness on the topic of "let's
not make things too awful for userspace to cope with" for the live
update/migration mess.
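(As an aside, the module-parameter idea quoted above would be a small change. A rough sketch only — the parameter name and permissions below are invented, not from the patch:

```c
/* Hypothetical: expose the sync period as a module parameter instead of
 * the hard-coded KVMCLOCK_SYNC_PERIOD, defaulting to the current 300s. */
static unsigned int kvmclock_sync_period_secs = 300;
module_param(kvmclock_sync_period_secs, uint, 0444);

/* ... and the scheduling site in kvmclock_sync_fn() would then use: */
schedule_delayed_work(&kvm->arch.kvmclock_sync_work,
		      kvmclock_sync_period_secs * HZ);
```

Whether that's a good idea is a separate question, per Sean's point about not punting the choice of sync frequency to userspace.)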
I think I need to dust off that attempt at fixing our 'how to migrate with
clocks intact' documentation from
https://lore.kernel.org/kvm/13f256ad95de186e3b6bcfcc1f88da5d0ad0cb71.camel@xxxxxxxxxxxxx/
The changes we're discussing here obviously have an effect on migration too.

Where the host TSC is actually reliable, I would really prefer for the kvmclock
to just be a fixed function of the guest TSC and *not* to be arbitrarily yanked
back[1] to the host's CLOCK_MONOTONIC periodically. That makes the process of
migrating to another machine (potentially with a different host TSC frequency)
a whole lot simpler. Userspace just needs to know two things from the kernel:

 • The relationship between the guest's TSC and its kvmclock (which we
   helpfully advertise to the *guest* in a pvclock structure, and can give to
   userspace too).

 • The value of *either* the guest's TSC or its kvmclock at a given value of
   CLOCK_TAI (not CLOCK_REALTIME). Theoretically this can be either the TSC or
   the kvmclock, but I think for best precision it would need to be the TSC.

I am aware that I glibly said "guest TSC" as if there's only one, or they're
all in sync. I think we can substitute "the TSC of guest vCPU0" in the case of
migration. We don't want a guest to be able to change its kvmclock by writing
to vCPU0's TSC, though, so we may need to keep (and migrate) an additional
offset value to track that.

[1] Yes, I believe "back" does happen. I have test failures in my queue to look
at, where guests see the "Xen" clock going backwards.