Re: [PATCH kvm-unit-tests] KVM: x86: add hyperv clock test case

Roman Kagan <rkagan@xxxxxxxxxxxxx> · Thu, 4 Feb 2016 12:33:26 +0300

On Wed, Feb 03, 2016 at 05:37:08PM +0100, Paolo Bonzini wrote:
> On 28/01/2016 17:22, Roman Kagan wrote:
> > On Thu, Jan 28, 2016 at 03:04:58PM +0100, Paolo Bonzini wrote:
> >> The test checks the relative precision of the reference TSC page
> >> and the time reference counter.
> >>
> >> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> >> ---
> >> 	Andrey, the test has a relative error of approximately 3 parts
> >> 	per million on my machine.  In other words it drifts by 3
> >> 	microseconds every second, which I don't think is acceptable.
> >> 	The problem is that virtual_tsc_khz is 2593993 while the actual
> >> 	frequency is more like 2594001 kHz.
> > 
> > Hmm, how come?  I thought virtual_tsc_khz could only diverge from
> > tsc_khz on migration.
> > 
> > [Or maybe I just misunderstand what you mean by "the actual frequency".]
> 
> Talking to Marcelo helped me understanding it. :)
> 
> The issue is that the TSC kHz is not the correct value to use for the  
> the TSC page.  Instead, we want to pass to the guest a scale value 
> corresponding to the kernel's timekeeping multiplier.  This is what 
> both kvmclock and the time reference counter use.

Indeed.

> The bit that I was missing last week was how the guest could perceive
> updates to the TSC page parameters as atomic.  We actually _can_
> emulate seqlock-like behavior in Hyper-V by writing 0 to seq during
> an update.  Instead of looping like seqlocks do, the guest will simply
> use the time reference counter and take a hypervisor exit.  The result
> however is still valid, because we want the time reference counter to
> be perfectly synchronized with the Hyper-V clock and lets you handle
> any kvmclock update scenario safely.

That's smart, thanks for the idea!  We'll only need to check that
Windows goes through all these steps on every clock read, rather than
remembering the failure and sticking with the MSR for good.

> Therefore the algorithm would be like patch 2/2 I sent, but with
> important differences.  You'd still
> 
> 	if (hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE)
> 		kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
> 
> on MSR writes, but then hook into kvm_guest_time_update rather than
> kvm_gen_update_masterclock, like:
> 
> 	if (v == kvm_get_vcpu(v->kvm, 0))
> 	        kvm_write_hyperv_tsc_page(v->kvm, &vcpu->hv_clock);
> 
> and in kvm_write_hyperv_tsc_page the calculation would be based on
> the kvmclock parameters:
> 
>         {
>                 write 0 to seq;
>                 if (!(hv_clock->flags & PVCLOCK_TSC_STABLE_BIT))
>                         return;
> 
>                 compute scale and offset from hv_clock mul and shift;
>                 write scale and offset;
> 		write sequence
>         }
> 
> all of scale, offset and sequence can be computed from kvmclock parameters.
> 
> For sequence we have to convert "even values between 0 and 0xFFFFFFFE"
> into "values between "1 and 0xFFFFFFFE".  For example:
> 
> 	hyper-v sequence = (kvmclock sequence >> 1) + 1
> 
> will do it.  Computing the scale and offset is something like:
> 
>         // KVM formula:
> 	//    nsec = (ticks - tsc_timestamp) * tsc_to_system_mul << tsc_shift + system_time
> 	//
> 	// hyper-v formula:
> 	//    nsec/100 = ticks * scale / 2^32 + offset
> 	//
> 	// when tsc_timestamp, offset is zero so remove them both:
>         //    ticks * tsc_to_system_mul << tsc_shift / 100 = ticks * scale / 2^32
> 	//
> 	// multiply both sides by 2^32 / ticks and you get scale:
> 	//    scale = tsc_to_system_mul << (32 + tsc_shift) / 100
> 	//
> 	// check if it would overflow, and then use the time ref counter
> 	//    tsc_to_system_mul << (32 + tsc_shift) / 100 >= 2^32
> 	//    tsc_to_system_mul << 32 >= 100 * 2^32 << -tsc_shift
> 	//    tsc_to_system_mul >= 100 << -tsc_shift
> 
> 	if (shift < 0)
> 		rhs = 100 << -shift;
> 	else
> 		rhs = 100 >> shift;
> 
> 	if (tsc_to_system_mul >= rhs)
> 		return;
> 
>         scale = mul_u64_u32_div(1ULL << (32 + tsc_shift), tsc_to_system_mul, 100);
> 
>         // now expand the kvmclock formula:
>         //    nsec = ticks * tsc_to_system_mul << tsc_shift - (tsc_timestamp * tsc_to_system_mul << tsc_shift) + system_time 
> 	// divide by 100:
> 	//    nsec/100 = ticks * tsc_to_system_mul << tsc_shift /100 - (tsc_timestamp * tsc_to_system_mul << tsc_shift) /100 + system_time /100
> 	// replace tsc_to_system_mul << tsc_shift /100 by scale / 2^32:
>         //    nsec/100 = ticks * scale / 2^32 - (tsc_timestamp * scale / 2^32) + system_time / 100
> 	// comparing with the Hyper-V formula:
> 	//    offset = system_time / 100 - (tsc_timestamp * scale / 2^32)
> 
> 	offset = system_time;
> 	do_div(offset, 100);
> 
> 	timestamp_offset = tsc_timestamp;
> 	do_div32_shl32(timestamp_offset, scale);
>         offset = offset - timestamp_offset;
> 
> The remaining part is _further_ adjusting the offset to

...?

> Would you like to finish implementing this and test it with the unit test?
> kvm/queue already has the patch to introduce do_div32_shl32.

We will but probably not at the top priority; our main focus ATM is
getting Hyper-V VMBus working.

Thanks,
Roman.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html