On Wed, Oct 09, 2024 at 02:48:28PM +0800, Like Xu wrote: >On 12/12/23 2:55 AM, Jim Mattson wrote: >> I reported recently that commit 26844fee6ade ("KVM: x86: never write to >> memory from kvm_vcpu_check_block()") broke delivery of a virtualized posted >> interrupt from an L1 vCPU to a halted L2 vCPU (see >> https://lore.kernel.org/all/20231207010302.2240506-1-jmattson@xxxxxxxxxx/). >> The test that exposed the regression is the final patch of this series. The >> others are prerequisites. >> >> It would make sense to add "vmx_posted_interrupts_test" to the set of tests >> to be run under the unit test name, "vmx_apicv_test," but that is >> non-trivial. The vmx_posted_interrupts_test requires "smp = 2," but I find >> that adding that to the vmx_apicv_tests causes virt_x2apic_mode_test to >> fail with: >> >> FAIL: x2apic - reading 0x310: x86/vmx_tests.c:2151: Assertion failed: (expected) == (actual) >> LHS: 0x0000000000000012 - 0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0001'0010 - 18 >> RHS: 0x0000000000000001 - 0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0001 - 1 >> Expected VMX_VMCALL, got VMX_EXTINT. >> STACK: 406ef8 40725a 41299f 402036 403f59 4001bd >> >> I haven't investigated. > >This vmx_apicv_test test still fails when 'ept=N' (SPR + v6.12-rc2): > >--- Virtualize APIC accesses + Use TPR shadow test --- >FAIL: xapic - reading 0x080: read 0x0, expected 0x70. >FAIL: xapic - writing 0x12345678 to 0x080: exitless write; val is 0x0, want >0x70 > >--- APIC-register virtualization test --- >FAIL: xapic - reading 0x020: read 0x0, expected 0x12345678. >FAIL: xapic - writing 0x12345678 to 0x020: x86/vmx_tests.c:2164: Assertion >failed: (expected) == (actual) > LHS: 0x0000000000000038 - 0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0011'1000 >- 56 > RHS: 0x0000000000000012 - 0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0001'0010 >- 18 >Expected VMX_APIC_WRITE, got VMX_VMCALL. > STACK: 406f7f 40d178 40202f 403f54 4001bd These failures occur because KVM flushes TLB with the wrong VPID, causing TLB for the 'APIC-access page' to be retained across nested transitions. This TLB entry exists because L1 writes to that page before entering the guest, see test_xapic_rd(): /* Setup virtual APIC page */ if (!expectation->virtualize_apic_accesses) { apic_access_address[apic_reg_index(reg)] = val; virtual_apic_page[apic_reg_index(reg)] = 0; } else if (exit_reason_want == VMX_VMCALL) { apic_access_address[apic_reg_index(reg)] = 0; virtual_apic_page[apic_reg_index(reg)] = val; } Specifically, in the failing scenario, EPT is disabled, and VPID is enabled in L0 but disabled in L1. As a result, vmcs01 and vmcs02 share the same VPID (vmx->vpid, see prepare_vmcs02_early_rare()), and vmx->nested.vpid02 is never used. But during nested transitions, KVM incorrectly flushes TLB using vmx->nested.vpid02. The sequence is as follows: nested_vmx_transition_tlb_flush -> kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu) -> kvm_vcpu_flush_tlb_guest -> vmx_flush_tlb_guest -> vmx_get_current_vpid -> With the diff below applied, these failures disappear. diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h index cce4e2aa30fb..246d9c6e20d0 100644 --- a/arch/x86/kvm/vmx/nested.h +++ b/arch/x86/kvm/vmx/nested.h @@ -61,13 +61,6 @@ static inline int vmx_has_valid_vmcs12(struct kvm_vcpu *vcpu) nested_vmx_is_evmptr12_set(vmx); } -static inline u16 nested_get_vpid02(struct kvm_vcpu *vcpu) -{ - struct vcpu_vmx *vmx = to_vmx(vcpu); - - return vmx->nested.vpid02 ? vmx->nested.vpid02 : vmx->vpid; -} - static inline unsigned long nested_ept_get_eptp(struct kvm_vcpu *vcpu) { /* return the page table to be shadowed - in our case, EPT12 */ @@ -187,6 +180,16 @@ static inline bool nested_cpu_has_vpid(struct vmcs12 *vmcs12) return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_VPID); } +static inline u16 nested_get_vpid02(struct kvm_vcpu *vcpu) +{ + struct vcpu_vmx *vmx = to_vmx(vcpu); + + if (nested_cpu_has_vpid(get_vmcs12(vcpu)) && vmx->nested.vpid02) + return vmx->nested.vpid02; + + return vmx->vpid; +} + static inline bool nested_cpu_has_apic_reg_virt(struct vmcs12 *vmcs12) { return nested_cpu_has2(vmcs12, SECONDARY_EXEC_APIC_REGISTER_VIRT);