Re: [PATCH 3/3] x86/kvm: implement Hyper-V reference TSC page clock

Roman Kagan <rkagan@xxxxxxxxxxxxx> · Tue, 26 Apr 2016 12:02:23 +0300

On Mon, Apr 25, 2016 at 10:54:12PM +0200, Radim Krčmář wrote:
> 2016-04-21 20:11+0300, Roman Kagan:
> > diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> > @@ -797,23 +798,11 @@ static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
> >  		mark_page_dirty(kvm, gfn);
> >  		break;
> >  	}
> > +	case HV_X64_MSR_REFERENCE_TSC:
> 
> (Would be nicer to check for HV_X64_MSR_REFERENCE_TSC_AVAILABLE.)

Hmm, interesting point.  This is a jugdement call, whether we should
refuse processing this MSR if we didn't announce its support to the
guest in the respective cpuid leaf (I personally don't think so).  We
don't do it for a number of other MSRs, if we should then it probably
has to be a separate patch fixing all of them.

> >  		hv->hv_tsc_page = data;
> > +		if (hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE)
> > +			kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
> 
> The MSR value is global and will be seen by other VCPUs before we write
> the page for the first time, which means there is an extremely unlikely
> race that could read random data from a guest page and interpret it as
> time.  Initialization before setting hv_tsc_page would be fine.

KVM_REQ_MASTERCLOCK_UPDATE will make sure the page has valid contents
before returning to the guest.  As for other VCPUs it's up to the guest
to synchronize access to the page with this VCPU; we can't prevent them
from reading it before we return to the guest.

> (Also, TLFS 4.0b says that the guest can pick any frame in the GPA
>  space.  The guest could specify a frame that wouldn't be mapped in KVM
>  and the guest would fail for no good reason.  HyperV's "overlay pages"
>  likely don't read or overwrite content of mapped frames either.
>  I think it would be safer to create a new mapping for the page ...)

I've never seen this happen; if this is really possible we'll have to do
more (e.g. the migration of the contents of this page won't happen
automatically).  I'll double-check with the spec, thanks.

> > @@ -1143,3 +1132,107 @@ set_result:
> > +static int pvclock_to_tscpage(struct pvclock_vcpu_time_info *hv_clock,
> > +			       HV_REFERENCE_TSC_PAGE *tsc_ref)
> > +{
> | [...]
> > +	 * tsc_scale = (tsc_to_system_mul << (tsc_shift + 32)) / 100
> | [...]
> > +	 * Note that although tsc_to_system_mul is 32 bit, we may need 128 bit
> > +	 * division to calculate tsc_scale
> 
> Linux has a function that handles u96/u64 with 64 and even 32 bit
> arithmetics, mul_u64_u32_div().  Do you think this would be ok?
> 
>   tsc_ref->tsc_scale = mul_u64_u32_div(1ULL << hv_clock->tsc_shift + 32,
>                                        tsc_to_system_mul, 100);

Indeed, thanks (shame on me for missing it in math64.h).

Roman.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html