On Wed, 2025-02-05 at 07:06 -0800, Sean Christopherson wrote:
> On Wed, Feb 05, 2025, David Woodhouse wrote:
> > On Fri, 2025-01-31 at 17:13 -0800, Sean Christopherson wrote:
> > > --- a/arch/x86/kvm/xen.c
> > > +++ b/arch/x86/kvm/xen.c
> > > @@ -1324,6 +1324,14 @@ int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc)
> > >  	    xhc->blob_size_32 || xhc->blob_size_64))
> > >  		return -EINVAL;
> > >  
> > > +	/*
> > > +	 * Restrict the MSR to the range that is unofficially reserved for
> > > +	 * synthetic, virtualization-defined MSRs, e.g. to prevent confusing
> > > +	 * KVM by colliding with a real MSR that requires special handling.
> > > +	 */
> > > +	if (xhc->msr && (xhc->msr < 0x40000000 || xhc->msr > 0x4fffffff))
> > > +		return -EINVAL;
> > > +
> > >  	mutex_lock(&kvm->arch.xen.xen_lock);
> > >  
> > >  	if (xhc->msr && !kvm->arch.xen_hvm_config.msr)
> > 
> > I'd prefer to see #defines for those magic values.
> 
> Can do.  Hmm, and since this would be visible to userspace, arguably the
> #defines should go in arch/x86/include/uapi/asm/kvm.h

Thanks.

> > Especially as there is a corresponding requirement that they never be set
> > from host context (which is where the potential locking issues come in).
> > Which train of thought leads me to ponder this as an alternative (or
> > additional) solution:
> > 
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3733,7 +3733,13 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >  	u32 msr = msr_info->index;
> >  	u64 data = msr_info->data;
> >  
> > -	if (msr && msr == vcpu->kvm->arch.xen_hvm_config.msr)
> > +	/*
> > +	 * Do not allow host-initiated writes to trigger the Xen hypercall
> > +	 * page setup; it could incur locking paths which are not expected
> > +	 * if userspace sets the MSR in an unusual location.
> 
> That's just as likely to break userspace.
> Doing a save/restore on the MSR doesn't make a whole lot of sense since it's
> effectively a "command" MSR, but IMO it's not any less likely than userspace
> putting the MSR index outside of the synthetic range.

Save/restore on the MSR makes no sense. It's a write-only MSR; writing to it
has no effect *other* than populating the target page. KVM doesn't implement
reading from it at all, and I don't think Xen does either. And even if it were
readable and rather pointlessly returned the last value written to it,
save/restore arguably still shouldn't trigger the guest memory to be
overwritten again. The hypercall page should only be populated when the
*guest* writes the MSR.

With the recent elimination of the hypercall page from Linux Xen guests, we've
suggested that Linux should still set up the hypercall page early (as doing so
*does* have the side effect of letting Xen know that the guest is 64-bit), and
then just free the page without ever using it. We absolutely would not want a
save/restore to scribble on that page again.

I'm absolutely not worried about breaking userspace by making the hypercall
page MSR work only when !host_initiated. In fact I think it's probably the
right thing to do *anyway*. If userspace wants to write to guest memory, it
can do that directly; it doesn't need to ask the *kernel* to do it.

> Side topic, upstream QEMU doesn't even appear to put the MSR at the Hyper-V
> address.  It tells the guest that's where the MSR is located, but the config
> passed to KVM still uses the default.
> 
>     /* Hypercall MSR base address */
>     if (hyperv_enabled(cpu)) {
>         c->ebx = XEN_HYPERCALL_MSR_HYPERV;
>         kvm_xen_init(cs->kvm_state, c->ebx);
>     } else {
>         c->ebx = XEN_HYPERCALL_MSR;
>     }
> 
>     ...
> 
>     /* hyperv_enabled() doesn't work yet. */
>     uint32_t msr = XEN_HYPERCALL_MSR;
>     ret = kvm_xen_init(s, msr);
>     if (ret < 0) {
>         return ret;
>     }

Those two happen in reverse chronological order, don't they?
And in the lower one, the comment tells you that hyperv_enabled() doesn't work
yet. When the higher one is called later, it calls kvm_xen_init() *again* to
put the MSR in the right place. It could be prettier, but I don't think it's
broken, is it?

> Userspace breakage aside, disallowing host writes would fix the immediate issue,
> and I think would mitigate all concerns with putting the host at risk.  But it's
> not enough to actually make an overlapping MSR index work.  E.g. if the MSR is
> passed through to the guest, the write will go through to the hardware MSR, unless
> the WRMSR happens to be emulated.
> 
> I really don't want to broadly support redirecting any MSR, because to truly go
> down that path we'd need to deal with x2APIC, EFER, and other MSRs that have
> special treatment and meaning.
> 
> While KVM's stance is usually that a misconfigured vCPU model is userspace's
> problem, in this case I don't see any value in letting userspace be stupid.  It
> can't work generally, it creates unique ABI for KVM_SET_MSRS, and unless there's
> a crazy use case I'm overlooking, there's no sane reason for userspace to put the
> index outside of the synthetic range (whereas defining seemingly nonsensical
> CPUID feature bits is useful for testing purposes, implementing support in
> userspace, etc).

Right, I think we should do *both*. Blocking host writes solves the locking
problem with the hypercall page setup. All it would take for that issue to
recur is for us (or Microsoft) to invent a new MSR in the synthetic range
which is also written on vCPU init/reset; then the sanity check on where the
VMM puts the Xen MSR wouldn't save us. But yes, we should *also* do that
sanity check.