On Tue, Sep 26, 2023, Dongli Zhang wrote: > Hi Joe, > > On 9/26/23 17:29, Joe Jin wrote: > > On 9/26/23 4:06 PM, Dongli Zhang wrote: > >> This is to minimize the kvmclock drift during CPU hotplug (or when the > >> master clock and pvclock_vcpu_time_info are updated). Updated by who? > >> Since kvmclock and raw monotonic (clocksource_tsc) use different > >> equation/mult/shift to convert the tsc to nanosecond, there may be clock > >> drift issue during CPU hotplug (when the master clock is updated). Based on #4, I assume you mean "vCPU hotplug from the host", but from this and the above it's not clear if this means "vCPU hotplug from the host", "pCPU hotplug in the host", or "CPU hotplug in the guest". > >> 1. The guest boots and all vcpus have the same 'pvclock_vcpu_time_info' > >> (suppose the master clock is used). > >> > >> 2. Since the master clock is never updated, the periodic kvmclock_sync_work > >> does not update the values in 'pvclock_vcpu_time_info'. > >> > >> 3. Suppose a very long period has passed (e.g., 30-day). > >> > >> 4. The user adds another vcpu. Both master clock and > >> 'pvclock_vcpu_time_info' are updated, based on the raw monotonic. So why can't KVM simply force a KVM_REQ_MASTERCLOCK_UPDATE request when a vCPU is added? I'm missing why this needs a persistent, periodic refresh. > >> @@ -157,6 +157,9 @@ module_param(min_timer_period_us, uint, S_IRUGO | S_IWUSR); > >> static bool __read_mostly kvmclock_periodic_sync = true; > >> module_param(kvmclock_periodic_sync, bool, S_IRUGO); > >> > >> +unsigned int __read_mostly masterclock_sync_period; > >> +module_param(masterclock_sync_period, uint, 0444); > > > > Can the mode be 0644 and allow it be changed at runtime? > > It can be RW. > > So far I just copy from kvmclock_periodic_sync as most code are from the > mechanism of kvmclock_periodic_sync. > > static bool __read_mostly kvmclock_periodic_sync = true; > module_param(kvmclock_periodic_sync, bool, S_IRUGO); Unless there's a very good reason for making it writable, I vote to keep it RO to simplify the code. > >> /* tsc tolerance in parts per million - default to 1/2 of the NTP threshold */ > >> static u32 __read_mostly tsc_tolerance_ppm = 250; > >> module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR); > >> @@ -3298,6 +3301,31 @@ static void kvmclock_sync_fn(struct work_struct *work) > >> KVMCLOCK_SYNC_PERIOD); > >> } > >> > >> +static void masterclock_sync_fn(struct work_struct *work) > >> +{ > >> + unsigned long i; > >> + struct delayed_work *dwork = to_delayed_work(work); > >> + struct kvm_arch *ka = container_of(dwork, struct kvm_arch, > >> + masterclock_sync_work); > >> + struct kvm *kvm = container_of(ka, struct kvm, arch); > >> + struct kvm_vcpu *vcpu; > >> + > >> + if (!masterclock_sync_period) This function should never be called if masterclock_sync_period=0. The param is RO and so kvm_arch_vcpu_postcreate() shouldn't create the work in the first place. > >> + return; > >> + > >> + kvm_for_each_vcpu(i, vcpu, kvm) { > >> + /* > >> + * It is not required to kick the vcpu because it is not > >> + * expected to update the master clock immediately. > >> + */ This comment needs to explain *why* it's ok for vCPUs to lazily handle the masterclock update. Saying "it is not expected" doesn't help understand who/what expects anything, or why. > >> + kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu); > >> + } > >> + > >> + schedule_delayed_work(&ka->masterclock_sync_work, > >> + masterclock_sync_period * HZ); > >> +}