On 28/01/2016 17:22, Roman Kagan wrote: > On Thu, Jan 28, 2016 at 03:04:58PM +0100, Paolo Bonzini wrote: >> The test checks the relative precision of the reference TSC page >> and the time reference counter. >> >> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx> >> --- >> Andrey, the test has a relative error of approximately 3 parts >> per million on my machine. In other words it drifts by 3 >> microseconds every second, which I don't think is acceptable. >> The problem is that virtual_tsc_khz is 2593993 while the actual >> frequency is more like 2594001 kHz. > > Hmm, how come? I thought virtual_tsc_khz could only diverge from > tsc_khz on migration. > > [Or maybe I just misunderstand what you mean by "the actual frequency".] Talking to Marcelo helped me understanding it. :) The issue is that the TSC kHz is not the correct value to use for the the TSC page. Instead, we want to pass to the guest a scale value corresponding to the kernel's timekeeping multiplier. This is what both kvmclock and the time reference counter use. The bit that I was missing last week was how the guest could perceive updates to the TSC page parameters as atomic. We actually _can_ emulate seqlock-like behavior in Hyper-V by writing 0 to seq during an update. Instead of looping like seqlocks do, the guest will simply use the time reference counter and take a hypervisor exit. The result however is still valid, because we want the time reference counter to be perfectly synchronized with the Hyper-V clock and lets you handle any kvmclock update scenario safely. Therefore the algorithm would be like patch 2/2 I sent, but with important differences. You'd still if (hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE) kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu); on MSR writes, but then hook into kvm_guest_time_update rather than kvm_gen_update_masterclock, like: if (v == kvm_get_vcpu(v->kvm, 0)) kvm_write_hyperv_tsc_page(v->kvm, &vcpu->hv_clock); and in kvm_write_hyperv_tsc_page the calculation would be based on the kvmclock parameters: { write 0 to seq; if (!(hv_clock->flags & PVCLOCK_TSC_STABLE_BIT)) return; compute scale and offset from hv_clock mul and shift; write scale and offset; write sequence } all of scale, offset and sequence can be computed from kvmclock parameters. For sequence we have to convert "even values between 0 and 0xFFFFFFFE" into "values between "1 and 0xFFFFFFFE". For example: hyper-v sequence = (kvmclock sequence >> 1) + 1 will do it. Computing the scale and offset is something like: // KVM formula: // nsec = (ticks - tsc_timestamp) * tsc_to_system_mul << tsc_shift + system_time // // hyper-v formula: // nsec/100 = ticks * scale / 2^32 + offset // // when tsc_timestamp, offset is zero so remove them both: // ticks * tsc_to_system_mul << tsc_shift / 100 = ticks * scale / 2^32 // // multiply both sides by 2^32 / ticks and you get scale: // scale = tsc_to_system_mul << (32 + tsc_shift) / 100 // // check if it would overflow, and then use the time ref counter // tsc_to_system_mul << (32 + tsc_shift) / 100 >= 2^32 // tsc_to_system_mul << 32 >= 100 * 2^32 << -tsc_shift // tsc_to_system_mul >= 100 << -tsc_shift if (shift < 0) rhs = 100 << -shift; else rhs = 100 >> shift; if (tsc_to_system_mul >= rhs) return; scale = mul_u64_u32_div(1ULL << (32 + tsc_shift), tsc_to_system_mul, 100); // now expand the kvmclock formula: // nsec = ticks * tsc_to_system_mul << tsc_shift - (tsc_timestamp * tsc_to_system_mul << tsc_shift) + system_time // divide by 100: // nsec/100 = ticks * tsc_to_system_mul << tsc_shift /100 - (tsc_timestamp * tsc_to_system_mul << tsc_shift) /100 + system_time /100 // replace tsc_to_system_mul << tsc_shift /100 by scale / 2^32: // nsec/100 = ticks * scale / 2^32 - (tsc_timestamp * scale / 2^32) + system_time / 100 // comparing with the Hyper-V formula: // offset = system_time / 100 - (tsc_timestamp * scale / 2^32) offset = system_time; do_div(offset, 100); timestamp_offset = tsc_timestamp; do_div32_shl32(timestamp_offset, scale); offset = offset - timestamp_offset; The remaining part is _further_ adjusting the offset to Would you like to finish implementing this and test it with the unit test? kvm/queue already has the patch to introduce do_div32_shl32. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html