On Wed, Feb 03, 2016 at 05:37:08PM +0100, Paolo Bonzini wrote: > On 28/01/2016 17:22, Roman Kagan wrote: > > On Thu, Jan 28, 2016 at 03:04:58PM +0100, Paolo Bonzini wrote: > >> The test checks the relative precision of the reference TSC page > >> and the time reference counter. > >> > >> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx> > >> --- > >> Andrey, the test has a relative error of approximately 3 parts > >> per million on my machine. In other words it drifts by 3 > >> microseconds every second, which I don't think is acceptable. > >> The problem is that virtual_tsc_khz is 2593993 while the actual > >> frequency is more like 2594001 kHz. > > > > Hmm, how come? I thought virtual_tsc_khz could only diverge from > > tsc_khz on migration. > > > > [Or maybe I just misunderstand what you mean by "the actual frequency".] > > Talking to Marcelo helped me understanding it. :) > > The issue is that the TSC kHz is not the correct value to use for the > the TSC page. Instead, we want to pass to the guest a scale value > corresponding to the kernel's timekeeping multiplier. This is what > both kvmclock and the time reference counter use. > > The bit that I was missing last week was how the guest could perceive > updates to the TSC page parameters as atomic. We actually _can_ > emulate seqlock-like behavior in Hyper-V by writing 0 to seq during > an update. Instead of looping like seqlocks do, the guest will simply > use the time reference counter and take a hypervisor exit. The result > however is still valid, because we want the time reference counter to > be perfectly synchronized with the Hyper-V clock and lets you handle > any kvmclock update scenario safely. > > Therefore the algorithm would be like patch 2/2 I sent, but with > important differences. You'd still > > if (hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE) > kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu); > > on MSR writes, but then hook into kvm_guest_time_update rather than > kvm_gen_update_masterclock, like: > > if (v == kvm_get_vcpu(v->kvm, 0)) > kvm_write_hyperv_tsc_page(v->kvm, &vcpu->hv_clock); > > and in kvm_write_hyperv_tsc_page the calculation would be based on > the kvmclock parameters: > > { > write 0 to seq; > if (!(hv_clock->flags & PVCLOCK_TSC_STABLE_BIT)) > return; > > compute scale and offset from hv_clock mul and shift; > write scale and offset; > write sequence > } > > all of scale, offset and sequence can be computed from kvmclock parameters. > > For sequence we have to convert "even values between 0 and 0xFFFFFFFE" > into "values between "1 and 0xFFFFFFFE". For example: > > hyper-v sequence = (kvmclock sequence >> 1) + 1 > > will do it. Computing the scale and offset is something like: > > // KVM formula: > // nsec = (ticks - tsc_timestamp) * tsc_to_system_mul << tsc_shift + system_time > // > // hyper-v formula: > // nsec/100 = ticks * scale / 2^32 + offset > // > // when tsc_timestamp, offset is zero so remove them both: > // ticks * tsc_to_system_mul << tsc_shift / 100 = ticks * scale / 2^32 > // > // multiply both sides by 2^32 / ticks and you get scale: > // scale = tsc_to_system_mul << (32 + tsc_shift) / 100 > // > // check if it would overflow, and then use the time ref counter > // tsc_to_system_mul << (32 + tsc_shift) / 100 >= 2^32 > // tsc_to_system_mul << 32 >= 100 * 2^32 << -tsc_shift > // tsc_to_system_mul >= 100 << -tsc_shift > > if (shift < 0) > rhs = 100 << -shift; > else > rhs = 100 >> shift; > > if (tsc_to_system_mul >= rhs) > return; > > scale = mul_u64_u32_div(1ULL << (32 + tsc_shift), tsc_to_system_mul, 100); > > // now expand the kvmclock formula: > // nsec = ticks * tsc_to_system_mul << tsc_shift - (tsc_timestamp * tsc_to_system_mul << tsc_shift) + system_time > // divide by 100: > // nsec/100 = ticks * tsc_to_system_mul << tsc_shift /100 - (tsc_timestamp * tsc_to_system_mul << tsc_shift) /100 + system_time /100 > // replace tsc_to_system_mul << tsc_shift /100 by scale / 2^32: > // nsec/100 = ticks * scale / 2^32 - (tsc_timestamp * scale / 2^32) + system_time / 100 > // comparing with the Hyper-V formula: > // offset = system_time / 100 - (tsc_timestamp * scale / 2^32) > > offset = system_time; > do_div(offset, 100); > > timestamp_offset = tsc_timestamp; > do_div32_shl32(timestamp_offset, scale); > offset = offset - timestamp_offset; > > The remaining part is _further_ adjusting the offset to > Would you like to finish implementing this and test it with the unit test? > kvm/queue already has the patch to introduce do_div32_shl32. At last I've got some time to tackle this. I cooked up a patch to do approximately this (fixing the math and the logic in a few places), and adjusted the test to build, and now I'm about to submit both of them. As to the drift problem, I reliably reproduce it, too, but it's a bug somewhere in kvm_clock, and the new hyperv ref tsc page now demonstrates it by virtue of relying on kvm_clock's values. On my machine I observe as much as +14 ppm and I'll try to chase it down, but this is independent of the hyperv ref tsc page patchset. Roman. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html