On Fri, Nov 22, 2019 at 5:49 PM Liran Alon <liran.alon@xxxxxxxxxx> wrote: > > > > > On 23 Nov 2019, at 2:22, Jim Mattson <jmattson@xxxxxxxxxx> wrote: > > > > On Fri, Nov 22, 2019 at 3:57 PM Liran Alon <liran.alon@xxxxxxxxxx> wrote: > >> > >> > >>> On 23 Nov 2019, at 1:43, Jim Mattson <jmattson@xxxxxxxxxx> wrote: > >>> > >>> Commit 37e4c997dadf ("KVM: VMX: validate individual bits of guest > >>> MSR_IA32_FEATURE_CONTROL") broke the KVM_SET_MSRS ABI by instituting > >>> new constraints on the data values that kvm would accept for the guest > >>> MSR, IA32_FEATURE_CONTROL. Perhaps these constraints should have been > >>> opt-in via a new KVM capability, but they were applied > >>> indiscriminately, breaking at least one existing hypervisor. > >>> > >>> Relax the constraints to allow either or both of > >>> FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX and > >>> FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX to be set when nVMX is > >>> enabled. This change is sufficient to fix the aforementioned breakage. > >>> > >>> Fixes: 37e4c997dadf ("KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL") > >>> Signed-off-by: Jim Mattson <jmattson@xxxxxxxxxx> > >> > >> I suggest to also add a comment in code to clarify why we allow setting > >> FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX even though we expose a vCPU that doesn’t support Intel TXT. > >> (I think the compatibility to existing workloads that sets this blindly on boot is a legit reason. Just recommend documenting it.) > >> > >> In addition, if the nested hypervisor which relies on this is public, please also mention it in commit message for reference. > > > > It's not an L1 hypervisor that's the problem. It's Google's L0 > > hypervisor. We've been incorrectly reporting IA32_FEATURE_CONTROL as 7 > > to nested guests for years, and now we have thousands of running VMs > > with the bogus value. I've thought about just changing it to 5 on the > > fly (on real hardware, one could almost blame it on SMM, but the MSR > > is *locked*, after all). > > If I understand correctly, you are talking about the case a VM is already running and you will > perform Live-Migration on it to a new host with a new kernel that it’s KVM don’t allow to > set FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX. > > If we haven’t encountered yet some guest workload that blindly sets this bit in IA32_FEATURE_CONTROL, > then it should be sufficient for you to modify vmx_set_msr() to allow setting this bit in case msr_info->host_initiated > (i.e. During restore of VM state on destination) but disallow it when WRMSR is initiated from guest. > The behaviour of whether vmx_set_msr() allows host to set this MSR to unsupported can even be gated by an opt-in KVM cap > that will be set by Google’s userspace VMM. I would call that an opt-out cap, rather than an opt-in cap. That is, we'd like to opt-out from the ABI change. I was going to go ahead and do it, but it looks like Paolo has accepted the change as it is. Thanks, Paolo! > That way, you won’t change change guest runtime semantics (it’s original locked MSR value will be preserved during Live-Migration), > and you will disallow newly provisioned guests from setting IA32_FEATURE_CONTROL to an unsupported value. > > What do you think? > > -Liran > > > > >