On Thu, 2021-09-16 at 18:15 +0000, Oliver Upton wrote: > > @@ -5878,11 +5888,21 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp) > * is slightly ahead) here we risk going negative on unsigned > * 'system_time' when 'data.clock' is very small. > */ > - if (kvm->arch.use_master_clock) > - now_ns = ka->master_kernel_ns; > + if (data.flags & KVM_CLOCK_REALTIME) { > + u64 now_real_ns = ktime_get_real_ns(); > + > + /* > + * Avoid stepping the kvmclock backwards. > + */ > + if (now_real_ns > data.realtime) > + data.clock += now_real_ns - data.realtime; > + } > + > + if (ka->use_master_clock) > + now_raw_ns = ka->master_kernel_ns; This looks wrong to me. > else > - now_ns = get_kvmclock_base_ns(); > - ka->kvmclock_offset = data.clock - now_ns; > + now_raw_ns = get_kvmclock_base_ns(); > + ka->kvmclock_offset = data.clock - now_raw_ns; > kvm_end_pvclock_update(kvm); > return 0; > } We use the host CLOCK_MONOTONIC_RAW plus the boot offset, as a 'kvmclock base clock', and get_kvmclock_base_ns() returns that. The KVM clocks for each VMs are based on this 'kvmclock base clock', each offset by a ka->kvmclock_offset which represents the time at which that VM was started — so each VM's clock starts from zero. The values of ka->master_kernel_ns and ka->master_cycle_now represent a single point in time, the former being the value of get_kvmclock_base_ns() at that moment and the latter being the host TSC value. In pvclock_update_vm_gtod_copy(), kvm_get_time_and_clockread() is used to return both values at precisely the same moment, from the *same* rdtsc(). This allows the current 'kvmclock base clock' to be calculated at any moment by reading the TSC, calculating a delta to that reading from ka->master_cycle_now to determine how much time has elapsed since ka->master_kernel_ns. We can then add ka->kvmclock_offset to get the kvmclock for this particular VM. Now, looking at the code quoted above. It's given a kvm_clock_data struct which contains a value of the KVM clock which is to be set as the time "now", and all it does is adjust ka->kvmclock_offset accordingly. Which is really simple: now_raw_ns = get_kvmclock_base_ns(); ka->kvmclock_offset = data.clock - now_raw_ns; Et voilà, now get_kvmclock_base_ns() + ka->kvmclock_offset at any given moment in time will result in a kvmclock value according to what was just set. Yay! Except... in the case where the TSC is constant, we actually set 'now_raw_ns' to a value that doesn't represent *now*. Instead, we set it to ka->master_kernel_ns which represents some point in the *past*. We should add the number of TSC ticks since ka->master_cycle_now if we're going to use that, surely?
Attachment:
smime.p7s
Description: S/MIME cryptographic signature