Re: [PATCH kvm-unit-tests] KVM: x86: add hyperv clock test case

Paolo Bonzini <pbonzini@xxxxxxxxxx> · Wed, 3 Feb 2016 17:37:08 +0100

On 28/01/2016 17:22, Roman Kagan wrote:
> On Thu, Jan 28, 2016 at 03:04:58PM +0100, Paolo Bonzini wrote:
>> The test checks the relative precision of the reference TSC page
>> and the time reference counter.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>> ---
>> 	Andrey, the test has a relative error of approximately 3 parts
>> 	per million on my machine.  In other words it drifts by 3
>> 	microseconds every second, which I don't think is acceptable.
>> 	The problem is that virtual_tsc_khz is 2593993 while the actual
>> 	frequency is more like 2594001 kHz.
> 
> Hmm, how come?  I thought virtual_tsc_khz could only diverge from
> tsc_khz on migration.
> 
> [Or maybe I just misunderstand what you mean by "the actual frequency".]

Talking to Marcelo helped me understanding it. :)

The issue is that the TSC kHz is not the correct value to use for the  
the TSC page.  Instead, we want to pass to the guest a scale value 
corresponding to the kernel's timekeeping multiplier.  This is what 
both kvmclock and the time reference counter use.

The bit that I was missing last week was how the guest could perceive
updates to the TSC page parameters as atomic.  We actually _can_
emulate seqlock-like behavior in Hyper-V by writing 0 to seq during
an update.  Instead of looping like seqlocks do, the guest will simply
use the time reference counter and take a hypervisor exit.  The result
however is still valid, because we want the time reference counter to
be perfectly synchronized with the Hyper-V clock and lets you handle
any kvmclock update scenario safely.

Therefore the algorithm would be like patch 2/2 I sent, but with
important differences.  You'd still

	if (hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE)
		kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);

on MSR writes, but then hook into kvm_guest_time_update rather than
kvm_gen_update_masterclock, like:

	if (v == kvm_get_vcpu(v->kvm, 0))
	        kvm_write_hyperv_tsc_page(v->kvm, &vcpu->hv_clock);

and in kvm_write_hyperv_tsc_page the calculation would be based on
the kvmclock parameters:

        {
                write 0 to seq;
                if (!(hv_clock->flags & PVCLOCK_TSC_STABLE_BIT))
                        return;

                compute scale and offset from hv_clock mul and shift;
                write scale and offset;
		write sequence
        }

all of scale, offset and sequence can be computed from kvmclock parameters.

For sequence we have to convert "even values between 0 and 0xFFFFFFFE"
into "values between "1 and 0xFFFFFFFE".  For example:

	hyper-v sequence = (kvmclock sequence >> 1) + 1

will do it.  Computing the scale and offset is something like:

        // KVM formula:
	//    nsec = (ticks - tsc_timestamp) * tsc_to_system_mul << tsc_shift + system_time
	//
	// hyper-v formula:
	//    nsec/100 = ticks * scale / 2^32 + offset
	//
	// when tsc_timestamp, offset is zero so remove them both:
        //    ticks * tsc_to_system_mul << tsc_shift / 100 = ticks * scale / 2^32
	//
	// multiply both sides by 2^32 / ticks and you get scale:
	//    scale = tsc_to_system_mul << (32 + tsc_shift) / 100
	//
	// check if it would overflow, and then use the time ref counter
	//    tsc_to_system_mul << (32 + tsc_shift) / 100 >= 2^32
	//    tsc_to_system_mul << 32 >= 100 * 2^32 << -tsc_shift
	//    tsc_to_system_mul >= 100 << -tsc_shift

	if (shift < 0)
		rhs = 100 << -shift;
	else
		rhs = 100 >> shift;

	if (tsc_to_system_mul >= rhs)
		return;

        scale = mul_u64_u32_div(1ULL << (32 + tsc_shift), tsc_to_system_mul, 100);

        // now expand the kvmclock formula:
        //    nsec = ticks * tsc_to_system_mul << tsc_shift - (tsc_timestamp * tsc_to_system_mul << tsc_shift) + system_time 
	// divide by 100:
	//    nsec/100 = ticks * tsc_to_system_mul << tsc_shift /100 - (tsc_timestamp * tsc_to_system_mul << tsc_shift) /100 + system_time /100
	// replace tsc_to_system_mul << tsc_shift /100 by scale / 2^32:
        //    nsec/100 = ticks * scale / 2^32 - (tsc_timestamp * scale / 2^32) + system_time / 100
	// comparing with the Hyper-V formula:
	//    offset = system_time / 100 - (tsc_timestamp * scale / 2^32)

	offset = system_time;
	do_div(offset, 100);

	timestamp_offset = tsc_timestamp;
	do_div32_shl32(timestamp_offset, scale);
        offset = offset - timestamp_offset;

The remaining part is _further_ adjusting the offset to
Would you like to finish implementing this and test it with the unit test?
kvm/queue already has the patch to introduce do_div32_shl32.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html