On Mon, Jun 26, 2023, Aaron Lewis wrote:
> As a separate issue, shouldn't we restrict the MSR filter from being
> able to intercept MSRs handled by the fast path?  I see that we do
> that for the APIC MSRs, but if MSR_IA32_TSC_DEADLINE is handled by the
> fast path, I don't see a way for userspace to override that behavior.
> So maybe it shouldn't?  E.g.
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 439312e04384..dd0a314da0a3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1787,7 +1787,7 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32
> index, u32 type)
>         u32 i;
>
>         /* x2APIC MSRs do not support filtering. */
> -       if (index >= 0x800 && index <= 0x8ff)
> +       if (index >= 0x800 && index <= 0x8ff || index == MSR_IA32_TSC_DEADLINE)
>                 return true;
>
>         idx = srcu_read_lock(&kvm->srcu);

Yeah, I saw that flaw too :-/

I'm not entirely sure what to do about MSRs that can be handled in the
fastpath.

On one hand, intercepting those MSRs probably doesn't make much sense.
On the other hand, the MSR filter needs to be uABI, i.e. we can't make
the statement "MSRs handled in KVM's fastpath can't be filtered", because
either every new fastpath MSR will potentially break userspace, or KVM
will be severely limited with respect to what can be handled in the
fastpath.

From an ABI perspective, the easiest thing is to fix the bug and enforce
any filter that affects MSR_IA32_TSC_DEADLINE.  If we ignore performance,
the fix is trivial.  E.g.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5f220c04624e..3ef903bb78ce 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2174,6 +2174,9 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
 
 	kvm_vcpu_srcu_read_lock(vcpu);
 
+	if (!kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE))
+		goto out;
+
 	switch (msr) {
 	case APIC_BASE_MSR + (APIC_ICR >> 4):
 		data = kvm_read_edx_eax(vcpu);
@@ -2196,6 +2199,7 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
 	if (ret != EXIT_FASTPATH_NONE)
 		trace_kvm_msr_write(msr, data);
 
+out:
 	kvm_vcpu_srcu_read_unlock(vcpu);
 
 	return ret;

But I don't love the idea of searching through the filters for an MSR
that is pretty much guaranteed to be allowed.  Since x2APIC MSRs can't be
filtered, we could add a per-vCPU flag to track if writes to TSC_DEADLINE
are allowed, i.e. if TSC_DEADLINE can be handled in the fastpath.

However, at some point Intel and/or AMD will (hopefully) add support for
full virtualization of TSC_DEADLINE, and then TSC_DEADLINE will be in the
same boat as the x2APIC MSRs, i.e. allowing userspace to filter
TSC_DEADLINE when it's fully virtualized would be nonsensical.  And
depending on how hardware behaves, i.e. how a virtual TSC_DEADLINE
interacts with the MSR bitmaps, *enforcing* userspace's filtering might
require a small amount of additional complexity.

And any MSR that is performance sensitive enough to be handled in the
fastpath is probably worth virtualizing in hardware, i.e. we'll end up
revisiting this topic every time we add an MSR to the fastpath :-(

I'm struggling to come up with an idea that won't create an ABI
nightmare, won't be subject to the whims of AMD and Intel, and won't
saddle KVM with complexity to support behavior that in all likelihood no
one wants.

I'm leaning toward enforcing the filter for TSC_DEADLINE, and crossing my
fingers that neither AMD nor Intel implements TSC_DEADLINE virtualization
in such a way that it changes the behavior of WRMSR interception.
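
For illustration only, the per-vCPU flag idea mentioned above could look
roughly like the sketch below.  It is a standalone userspace model, not
KVM code: every model_* name and the struct layouts are hypothetical, and
only the MSR indices (0x6e0 for IA32_TSC_DEADLINE, 0x800-0x8ff for
x2APIC, 0x830 for the x2APIC ICR) are architectural.  The filter is
searched once when userspace installs it, and the fastpath then tests a
cached bool instead of walking the ranges on every WRMSR.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model; names and layouts are NOT KVM's. */
#define MODEL_MSR_TSC_DEADLINE	0x6e0u		/* IA32_TSC_DEADLINE */
#define MODEL_MSR_X2APIC_ICR	0x830u		/* 0x800 + (0x300 >> 4) */
#define MODEL_FILTER_WRITE	(1u << 1)

struct model_range {
	uint32_t base;		/* first MSR index covered */
	uint32_t nmsrs;		/* number of MSRs covered */
	uint32_t flags;		/* access types this range governs */
	const uint8_t *bitmap;	/* bit set = access allowed */
};

struct model_filter {
	bool default_allow;	/* verdict when no range matches */
	size_t count;
	struct model_range ranges[16];
};

struct model_vcpu {
	const struct model_filter *filter;	/* NULL = no filter */
	bool tsc_deadline_write_allowed;	/* cached for the fastpath */
};

/* Slow path: linear search through the installed ranges. */
static bool model_msr_allowed(const struct model_filter *f, uint32_t index,
			      uint32_t type)
{
	size_t i;

	if (!f)
		return true;

	/* x2APIC MSRs do not support filtering. */
	if (index >= 0x800 && index <= 0x8ff)
		return true;

	for (i = 0; i < f->count; i++) {
		const struct model_range *r = &f->ranges[i];

		if ((r->flags & type) && index >= r->base &&
		    index - r->base < r->nmsrs) {
			uint32_t bit = index - r->base;

			return (r->bitmap[bit / 8] & (1u << (bit % 8))) != 0;
		}
	}

	return f->default_allow;
}

/* Recompute the cached verdict whenever the filter changes. */
static void model_set_filter(struct model_vcpu *v,
			     const struct model_filter *f)
{
	v->filter = f;
	v->tsc_deadline_write_allowed =
		model_msr_allowed(f, MODEL_MSR_TSC_DEADLINE, MODEL_FILTER_WRITE);
}

/* Fastpath check: no filter search, just the cached flag. */
static bool model_fastpath_can_handle(const struct model_vcpu *v, uint32_t msr)
{
	if (msr == MODEL_MSR_TSC_DEADLINE)
		return v->tsc_deadline_write_allowed;

	return msr == MODEL_MSR_X2APIC_ICR;
}
```

The tradeoff is the one described above: the flag is cheap today, but it
is one more piece of state to keep coherent with the filter, and it would
need revisiting if hardware ever virtualizes TSC_DEADLINE.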