On Wed, 2025-02-05 at 07:06 -0800, Sean Christopherson wrote:
> On Wed, Feb 05, 2025, David Woodhouse wrote:
> > On Fri, 2025-01-31 at 17:13 -0800, Sean Christopherson wrote:
> > > --- a/arch/x86/kvm/xen.c
> > > +++ b/arch/x86/kvm/xen.c
> > > @@ -1324,6 +1324,14 @@ int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc)
> > >  	    xhc->blob_size_32 || xhc->blob_size_64))
> > >  		return -EINVAL;
> > >  
> > > +	/*
> > > +	 * Restrict the MSR to the range that is unofficially reserved for
> > > +	 * synthetic, virtualization-defined MSRs, e.g. to prevent confusing
> > > +	 * KVM by colliding with a real MSR that requires special handling.
> > > +	 */
> > > +	if (xhc->msr && (xhc->msr < 0x40000000 || xhc->msr > 0x4fffffff))
> > > +		return -EINVAL;
> > > +
> > >  	mutex_lock(&kvm->arch.xen.xen_lock);
> > >  
> > >  	if (xhc->msr && !kvm->arch.xen_hvm_config.msr)
> > 
> > I'd prefer to see #defines for those magic values.
> 
> Can do.  Hmm, and since this would be visible to userspace, arguably the
> #defines should go in arch/x86/include/uapi/asm/kvm.h

Thanks.

> > Especially as there is a corresponding requirement that they never be set
> > from host context (which is where the potential locking issues come in).
> > Which train of thought leads me to ponder this as an alternative (or
> > additional) solution:
> > 
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3733,7 +3733,13 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >  	u32 msr = msr_info->index;
> >  	u64 data = msr_info->data;
> >  
> > -	if (msr && msr == vcpu->kvm->arch.xen_hvm_config.msr)
> > +	/*
> > +	 * Do not allow host-initiated writes to trigger the Xen hypercall
> > +	 * page setup; it could incur locking paths which are not expected
> > +	 * if userspace sets the MSR in an unusual location.
> 
> That's just as likely to break userspace.
> Doing a save/restore on the MSR doesn't make a whole lot of sense since it's
> effectively a "command" MSR, but IMO it's not any less likely than userspace
> putting the MSR index outside of the synthetic range.

Save/restore on the MSR makes no sense. It's a write-only MSR; writing to it
has no effect *other* than populating the target page. KVM doesn't implement
reading from it at all, and I don't think Xen does either. And even if it were
readable and rather pointlessly returned the last value written to it,
save/restore arguably still shouldn't trigger the guest memory to be
overwritten again. The hypercall page should only be populated when the
*guest* writes the MSR.

With the recent elimination of the hypercall page from Linux Xen guests, we've
suggested that Linux should still set up the hypercall page early (as doing so
*does* have the side effect of letting Xen know that the guest is 64-bit), and
then just free the page without ever using it. We absolutely would not want a
save/restore to scribble on that page again.

I'm absolutely not worried about breaking userspace by making the hypercall
page MSR work only when !host_initiated. In fact I think it's probably the
right thing to do *anyway*. If userspace wants to write to guest memory, it
can do that directly; it doesn't need to ask the *kernel* to do it.

> Side topic, upstream QEMU doesn't even appear to put the MSR at the Hyper-V
> address.  It tells the guest that's where the MSR is located, but the config
> passed to KVM still uses the default.
> 
>     /* Hypercall MSR base address */
>     if (hyperv_enabled(cpu)) {
>         c->ebx = XEN_HYPERCALL_MSR_HYPERV;
>         kvm_xen_init(cs->kvm_state, c->ebx);
>     } else {
>         c->ebx = XEN_HYPERCALL_MSR;
>     }
> 
>     ...
> 
>     /* hyperv_enabled() doesn't work yet. */
>     uint32_t msr = XEN_HYPERCALL_MSR;
>     ret = kvm_xen_init(s, msr);
>     if (ret < 0) {
>         return ret;
>     }

Those two happen in reverse chronological order, don't they?
And in the lower one, the comment tells you that hyperv_enabled() doesn't work
yet. When the higher one is called later, it calls kvm_xen_init() *again* to
put the MSR in the right place. It could be prettier, but I don't think it's
broken, is it?

> Userspace breakage aside, disallowing host writes would fix the immediate issue,
> and I think would mitigate all concerns with putting the host at risk.  But it's
> not enough to actually make an overlapping MSR index work.  E.g. if the MSR is
> passed through to the guest, the write will go through to the hardware MSR, unless
> the WRMSR happens to be emulated.
> 
> I really don't want to broadly support redirecting any MSR, because to truly go
> down that path we'd need to deal with x2APIC, EFER, and other MSRs that have
> special treatment and meaning.
> 
> While KVM's stance is usually that a misconfigured vCPU model is userspace's
> problem, in this case I don't see any value in letting userspace be stupid.  It
> can't work generally, it creates unique ABI for KVM_SET_MSRS, and unless there's
> a crazy use case I'm overlooking, there's no sane reason for userspace to put the
> index outside of the synthetic range (whereas defining seemingly nonsensical
> CPUID feature bits is useful for testing purposes, implementing support in
> userspace, etc).

Right, I think we should do *both*. Blocking host writes solves the locking
problem with the hypercall page setup. All it would take for that issue to
recur is for us (or Microsoft) to invent a new MSR in the synthetic range
which is also written on vCPU init/reset; then the sanity check on where the
VMM puts the Xen MSR wouldn't save us. But yes, we should *also* do that
sanity check.