On Tue, Jul 23, 2024 at 07:59:22AM +0800, Sean Christopherson wrote:
> Synthesize a consistency check VM-Exit (VM-Enter) or VM-Abort (VM-Exit) if
> L1 attempts to load/store an MSR via the VMCS MSR lists that userspace has
> disallowed access to via an MSR filter.  Intel already disallows including
> a handful of "special" MSRs in the VMCS lists, so denying access isn't
> completely without precedent.
> 
> More importantly, the behavior is well-defined _and_ can be communicated
> to the end user, e.g. to the customer that owns a VM running as L1 on top
> of KVM.  On the other hand, ignoring userspace MSR filters is all but
> guaranteed to result in unexpected behavior as the access will hit KVM's
> internal state, which is likely not up-to-date.
> 
> Unlike KVM-internal accesses, instruction emulation, and dedicated VMCS
> fields, the MSRs in the VMCS load/store lists are 100% guest controlled,
> thus making it all but impossible to reason about the correctness of
> ignoring the MSR filter.  And if userspace *really* wants to deny access
> to MSRs via the aforementioned scenarios, userspace can hide the
> associated feature from the guest, e.g. by disabling the PMU to prevent
> accessing PERF_GLOBAL_CTRL via its VMCS field.  But for the MSR lists,
> KVM is blindly processing MSRs; the MSR filters are the _only_ way for
> userspace to deny access.
> 
> This partially reverts commit ac8d6cad3c7b ("KVM: x86: Only do MSR
> filtering when access MSR by rdmsr/wrmsr").
> 
> Cc: Hou Wenlong <houwenlong.hwl@xxxxxxxxxxxx>
> Cc: Jim Mattson <jmattson@xxxxxxxxxx>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
> 
> I found this by inspection when backporting Hou's change to an internal
> kernel.  I don't love piggybacking Intel's "you can't touch these special
> MSRs" behavior, but ignoring the userspace MSR filters is far worse IMO.
> E.g. if userspace is denying access to an MSR in order to reduce KVM's
> attack surface, letting L1 sneak in reads/writes through VM-Enter/VM-Exit
> completely circumvents the filters.
> 
>  Documentation/virt/kvm/api.rst  | 19 ++++++++++++++++---
>  arch/x86/include/asm/kvm_host.h |  2 ++
>  arch/x86/kvm/vmx/nested.c       | 12 ++++++------
>  arch/x86/kvm/x86.c              |  6 ++++--
>  4 files changed, 28 insertions(+), 11 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 8e5dad80b337..e6b1e42186f3 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4226,9 +4226,22 @@ filtering. In that mode, ``KVM_MSR_FILTER_DEFAULT_DENY`` is invalid and causes
>  an error.
>  
>  .. warning::
> -   MSR accesses as part of nested VM-Enter/VM-Exit are not filtered.
> -   This includes both writes to individual VMCS fields and reads/writes
> -   through the MSR lists pointed to by the VMCS.
> +   MSR accesses that are side effects of instruction execution (emulated or
> +   native) are not filtered as hardware does not honor MSR bitmaps outside of
> +   RDMSR and WRMSR, and KVM mimics that behavior when emulating instructions
> +   to avoid pointless divergence from hardware.  E.g. RDPID reads MSR_TSC_AUX,
> +   SYSENTER reads the SYSENTER MSRs, etc.
> +
> +   MSRs that are loaded/stored via dedicated VMCS fields are not filtered as
> +   part of VM-Enter/VM-Exit emulation.
> +
> +   MSRs that are loaded/stored via VMX's load/store lists _are_ filtered as
> +   part of VM-Enter/VM-Exit emulation.  If an MSR access is denied on VM-Enter,
> +   KVM synthesizes a consistency check VM-Exit (EXIT_REASON_MSR_LOAD_FAIL).  If
> +   an MSR access is denied on VM-Exit, KVM synthesizes a VM-Abort.  In short,
> +   KVM extends Intel's architectural list of MSRs that cannot be loaded/saved
> +   via the VM-Enter/VM-Exit MSR lists.  It is the platform owner's
> +   responsibility to communicate any such restrictions to their end users.

Do we also need to modify the statement before this warning, since the
behaviour now differs from the RDMSR/WRMSR emulation case?

```
if an MSR access is denied by userspace the resulting KVM behavior depends on
whether or not KVM_CAP_X86_USER_SPACE_MSR's KVM_MSR_EXIT_REASON_FILTER is
enabled. If KVM_MSR_EXIT_REASON_FILTER is enabled, KVM will exit to userspace
on denied accesses, i.e. userspace effectively intercepts the MSR access.
```
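
For anyone following along, here is a minimal userspace sketch of the two
knobs being discussed, written against the uapi definitions in <linux/kvm.h>
(error handling and the rest of VM setup elided; vm_fd is assumed to be an
open KVM VM fd).  If I'm reading the patch right, the second knob no longer
matters for list-based accesses: a denied list entry becomes a consistency
check VM-Exit/VM-Abort rather than a userspace exit.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Deny all guest reads/writes of a single MSR.  With this patch, the denial
 * also covers MSRs that L1 puts in the vmcs12 VM-Enter/VM-Exit MSR lists,
 * not just RDMSR/WRMSR. */
static int deny_msr(int vm_fd, __u32 msr_index)
{
        __u8 deny = 0;  /* a 0 bit in the bitmap denies the ops in .flags */
        struct kvm_msr_filter filter;

        memset(&filter, 0, sizeof(filter));
        filter.flags = KVM_MSR_FILTER_DEFAULT_ALLOW;
        filter.ranges[0].flags  = KVM_MSR_FILTER_READ | KVM_MSR_FILTER_WRITE;
        filter.ranges[0].base   = msr_index;
        filter.ranges[0].nmsrs  = 1;    /* bitmap covers exactly one MSR */
        filter.ranges[0].bitmap = &deny;

        return ioctl(vm_fd, KVM_X86_SET_MSR_FILTER, &filter);
}

/* Ask KVM to exit to userspace on filtered RDMSR/WRMSR instead of injecting
 * #GP.  Per the patch, filtered VMCS-list accesses do NOT take this path. */
static int exit_on_filtered_msrs(int vm_fd)
{
        struct kvm_enable_cap cap = {
                .cap = KVM_CAP_X86_USER_SPACE_MSR,
                .args = { KVM_MSR_EXIT_REASON_FILTER },
        };

        return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}
```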
>  x2APIC MSR accesses cannot be filtered (KVM silently ignores filters that
>  cover any x2APIC MSRs).
>  
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 950a03e0181e..94d0bedc42ee 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2059,6 +2059,8 @@ void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu);
>  
>  void kvm_enable_efer_bits(u64);
>  bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
> +int kvm_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data);
> +int kvm_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 data);
>  int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data, bool host_initiated);
>  int kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data);
>  int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data);
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 2392a7ef254d..674f7089cc44 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -981,7 +981,7 @@ static u32 nested_vmx_load_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
>                                  __func__, i, e.index, e.reserved);
>                          goto fail;
>                  }
> -               if (kvm_set_msr(vcpu, e.index, e.value)) {
> +               if (kvm_set_msr_with_filter(vcpu, e.index, e.value)) {
>                          pr_debug_ratelimited(
>                                  "%s cannot write MSR (%u, 0x%x, 0x%llx)\n",
>                                  __func__, i, e.index, e.value);
> @@ -1017,7 +1017,7 @@ static bool nested_vmx_get_vmexit_msr_value(struct kvm_vcpu *vcpu,
>                  }
>          }
>  
> -       if (kvm_get_msr(vcpu, msr_index, data)) {
> +       if (kvm_get_msr_with_filter(vcpu, msr_index, data)) {
>                  pr_debug_ratelimited("%s cannot read MSR (0x%x)\n", __func__,
>                                       msr_index);
>                  return false;
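
As background for why these lists are "100% guest controlled": the code above
walks a plain array in L1 memory, one 16-byte record per MSR.  A sketch of the
layout, mirroring the vmx_msr_entry definition KVM already has (exact SDM
section numbers omitted):

```c
/*
 * One entry in a VM-Enter/VM-Exit MSR load/store list (SDM, "VM-Entry/
 * VM-Exit Controls for MSRs").  L1 hands KVM a guest-physical pointer to
 * an array of these plus a count, so the MSR indices are entirely L1's
 * choice -- there is no per-MSR VMCS field that userspace could hide.
 */
struct vmx_msr_entry {
        u32 index;      /* MSR number, fully under L1's control */
        u32 reserved;   /* must be zero; nonzero already fails the checks */
        u64 value;      /* value to load, or slot to store into */
};
```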
> @@ -1112,9 +1112,9 @@ static void prepare_vmx_msr_autostore_list(struct kvm_vcpu *vcpu,
>                  /*
>                   * Emulated VMEntry does not fail here. Instead a less
>                   * accurate value will be returned by
> -                * nested_vmx_get_vmexit_msr_value() using kvm_get_msr()
> -                * instead of reading the value from the vmcs02 VMExit
> -                * MSR-store area.
> +                * nested_vmx_get_vmexit_msr_value() by reading KVM's
> +                * internal MSR state instead of reading the value from
> +                * the vmcs02 VMExit MSR-store area.
>                   */
>                  pr_warn_ratelimited(
>                          "Not enough msr entries in msr_autostore.  Can't add msr %x\n",
> @@ -4806,7 +4806,7 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
>                          goto vmabort;
>                  }
>  
> -               if (kvm_set_msr(vcpu, h.index, h.value)) {
> +               if (kvm_set_msr_with_filter(vcpu, h.index, h.value)) {
>                          pr_debug_ratelimited(
>                                  "%s WRMSR failed (%u, 0x%x, 0x%llx)\n",
>                                  __func__, j, h.index, h.value);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index af6c8cf6a37a..7b3659a05c27 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1942,19 +1942,21 @@ static int kvm_get_msr_ignored_check(struct kvm_vcpu *vcpu,
>          return ret;
>  }
>  
> -static int kvm_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data)
> +int kvm_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data)
>  {
>          if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ))
>                  return KVM_MSR_RET_FILTERED;
>          return kvm_get_msr_ignored_check(vcpu, index, data, false);
>  }
> +EXPORT_SYMBOL_GPL(kvm_get_msr_with_filter);
>  
> -static int kvm_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 data)
> +int kvm_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 data)
>  {
>          if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_WRITE))
>                  return KVM_MSR_RET_FILTERED;
>          return kvm_set_msr_ignored_check(vcpu, index, data, false);
>  }
> +EXPORT_SYMBOL_GPL(kvm_set_msr_with_filter);
>  
>  int kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data)
>  {
> 
> base-commit: 332d2c1d713e232e163386c35a3ba0c1b90df83f
> -- 
> 2.45.2.1089.g2a221341d9-goog
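
FWIW, the L1-side view I used to convince myself of the new behaviour, as a
hypothetical guest snippet in the style of the KVM selftests' VMX helpers
(vmwrite() and an identity-mapped guest address space are assumed, and it
reuses the vmx_msr_entry layout sketched above; VMCS field encodings are from
the SDM):

```c
#define MSR_IA32_SPEC_CTRL      0x00000048
#define VM_ENTRY_MSR_LOAD_ADDR  0x0000201aull   /* VMCS field encodings */
#define VM_ENTRY_MSR_LOAD_COUNT 0x00004014ull

static struct vmx_msr_entry spec_ctrl_entry = {
        .index = MSR_IA32_SPEC_CTRL,
        .value = 0,
};

/* Ask for IA32_SPEC_CTRL to be loaded on emulated VM-Enter to L2.  If
 * userspace has filtered MSR 0x48, VMLAUNCH now fails with
 * EXIT_REASON_MSR_LOAD_FAIL instead of the load bypassing the filter. */
static void add_spec_ctrl_to_entry_load_list(void)
{
        vmwrite(VM_ENTRY_MSR_LOAD_ADDR, (u64)&spec_ctrl_entry);
        vmwrite(VM_ENTRY_MSR_LOAD_COUNT, 1);
}
```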