On Sat, 2023-01-07 at 01:10 +0000, Sean Christopherson wrote: > Don't clear the "read" bits for x2APIC registers above SELF_IPI (APIC regs > 0x400 - 0xff0, MSRs 0x840 - 0x8ff). KVM doesn't emulate registers in that > space (there are a smattering of AMD-only extensions) and so should > intercept reads in order to inject #GP. When APICv is fully enabled, > Intel hardware doesn't validate the registers on RDMSR and instead blindly > retrieves data from the vAPIC page, i.e. it's software's responsibility to > intercept reads to non-existent MSRs. > > Fixes: 8d14695f9542 ("x86, apicv: add virtual x2apic support") > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx> > --- > arch/x86/kvm/vmx/vmx.c | 38 ++++++++++++++++++++------------------ > 1 file changed, 20 insertions(+), 18 deletions(-) > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index c788aa382611..82c61c16f8f5 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -4018,26 +4018,17 @@ void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type) > vmx_set_msr_bitmap_write(msr_bitmap, msr); > } > > -static void vmx_reset_x2apic_msrs(struct kvm_vcpu *vcpu, u8 mode) > -{ > - unsigned long *msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap; > - unsigned long read_intercept; > - int msr; > - > - read_intercept = (mode & MSR_BITMAP_MODE_X2APIC_APICV) ? 0 : ~0; > - > - for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) { > - unsigned int read_idx = msr / BITS_PER_LONG; > - unsigned int write_idx = read_idx + (0x800 / sizeof(long)); > - > - msr_bitmap[read_idx] = read_intercept; > - msr_bitmap[write_idx] = ~0ul; > - } > -} > - > static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu) > { > + /* > + * x2APIC indices for 64-bit accesses into the RDMSR and WRMSR halves > + * of the MSR bitmap. KVM emulates APIC registers up through 0x3f0, > + * i.e. MSR 0x83f, and so only needs to dynamically manipulate 64 bits. > + */ The above comment is better to be placed down below, near the actual write, otherwise it is confusing. > + const int read_idx = APIC_BASE_MSR / BITS_PER_LONG_LONG; > + const int write_idx = read_idx + (0x800 / sizeof(u64)); > struct vcpu_vmx *vmx = to_vmx(vcpu); > + u64 *msr_bitmap = (u64 *)vmx->vmcs01.msr_bitmap; > u8 mode; > > if (!cpu_has_vmx_msr_bitmap()) > @@ -4058,7 +4049,18 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu) > > vmx->x2apic_msr_bitmap_mode = mode; > > - vmx_reset_x2apic_msrs(vcpu, mode); > + /* > + * Reset the bitmap for MSRs 0x800 - 0x83f. Leave AMD's uber-extended > + * registers (0x840 and above) intercepted, KVM doesn't support them. I don't think AMD calls them uber-extended. Just extended. >From a quick glance, these could have beeing very useful for VFIO passthrough of INT-X interrupts, removing the need to mask the interrupt on per PCI device basis - instead you can just leave the IRQ pending in ISR, while using SEOI and IER to ignore this pending bit for host. I understand that the days of INT-X are long gone (and especially days of shared IRQ lines...) and every sane device uses MSI/-X instead, but still. > + * Intercept all writes by default and poke holes as needed. Pass > + * through all reads by default in x2APIC+APICv mode, as all registers > + * except the current timer count are passed through for read. > + */ > + if (mode & MSR_BITMAP_MODE_X2APIC_APICV) > + msr_bitmap[read_idx] = 0; > + else > + msr_bitmap[read_idx] = ~0ull; > + msr_bitmap[write_idx] = ~0ull; > > /* > * TPR reads and writes can be virtualized even if virtual interrupt Other than the note about the comment, Reviewed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx> Best regards, Maxim Levitsky