Re: [kvm-unit-tests PATCH 0/5] nVMX: Simple posted interrupts test

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Oct 14, 2024, Chao Gao wrote:
> On Wed, Oct 09, 2024 at 02:48:28PM +0800, Like Xu wrote:
> >On 12/12/23 2:55 AM, Jim Mattson wrote:
> >> I reported recently that commit 26844fee6ade ("KVM: x86: never write to
> >> memory from kvm_vcpu_check_block()") broke delivery of a virtualized posted
> >> interrupt from an L1 vCPU to a halted L2 vCPU (see
> >> https://lore.kernel.org/all/20231207010302.2240506-1-jmattson@xxxxxxxxxx/).
> >> The test that exposed the regression is the final patch of this series. The
> >> others are prerequisites.
> >> 
> >> It would make sense to add "vmx_posted_interrupts_test" to the set of tests
> >> to be run under the unit test name, "vmx_apicv_test," but that is
> >> non-trivial. The vmx_posted_interrupts_test requires "smp = 2," but I find
> >> that adding that to the vmx_apicv_tests causes virt_x2apic_mode_test to
> >> fail with:
> >> 
> >> FAIL: x2apic - reading 0x310: x86/vmx_tests.c:2151: Assertion failed: (expected) == (actual)
> >> 	LHS: 0x0000000000000012 - 0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0001'0010 - 18
> >> 	RHS: 0x0000000000000001 - 0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0001 - 1
> >> Expected VMX_VMCALL, got VMX_EXTINT.
> >> 	STACK: 406ef8 40725a 41299f 402036 403f59 4001bd
> >> 
> >> I haven't investigated.
> >
> >This vmx_apicv_test test still fails when 'ept=N' (SPR + v6.12-rc2):
> >
> >--- Virtualize APIC accesses + Use TPR shadow test ---
> >FAIL: xapic - reading 0x080: read 0x0, expected 0x70.
> >FAIL: xapic - writing 0x12345678 to 0x080: exitless write; val is 0x0, want
> >0x70
> >
> >--- APIC-register virtualization test ---
> >FAIL: xapic - reading 0x020: read 0x0, expected 0x12345678.
> >FAIL: xapic - writing 0x12345678 to 0x020: x86/vmx_tests.c:2164: Assertion
> >failed: (expected) == (actual)
> >	LHS: 0x0000000000000038 - 0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0011'1000
> >- 56
> >	RHS: 0x0000000000000012 - 0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0001'0010
> >- 18
> >Expected VMX_APIC_WRITE, got VMX_VMCALL.
> >	STACK: 406f7f 40d178 40202f 403f54 4001bd
> 
> These failures occur because KVM flushes TLB with the wrong VPID, causing TLB
> for the 'APIC-access page' to be retained across nested transitions. This TLB
> entry exists because L1 writes to that page before entering the guest, see
> test_xapic_rd():
> 
>         /* Setup virtual APIC page */
>         if (!expectation->virtualize_apic_accesses) {
>                 apic_access_address[apic_reg_index(reg)] = val;
>                 virtual_apic_page[apic_reg_index(reg)] = 0;
>         } else if (exit_reason_want == VMX_VMCALL) {
>                 apic_access_address[apic_reg_index(reg)] = 0;
>                 virtual_apic_page[apic_reg_index(reg)] = val;
>         }
> 
> 
> Specifically, in the failing scenario, EPT is disabled, and VPID is enabled in
> L0 but disabled in L1. As a result, vmcs01 and vmcs02 share the same VPID
> (vmx->vpid, see prepare_vmcs02_early_rare()), and vmx->nested.vpid02 is never
> used. But during nested transitions, KVM incorrectly flushes TLB using
> vmx->nested.vpid02. The sequence is as follows:
> 
> 	nested_vmx_transition_tlb_flush ->
> 	  kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu) ->
> 	    kvm_vcpu_flush_tlb_guest ->	
> 	      vmx_flush_tlb_guest ->
> 		vmx_get_current_vpid ->
> 
> With the diff below applied, these failures disappear.
> 
> diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
> index cce4e2aa30fb..246d9c6e20d0 100644
> --- a/arch/x86/kvm/vmx/nested.h
> +++ b/arch/x86/kvm/vmx/nested.h
> @@ -61,13 +61,6 @@ static inline int vmx_has_valid_vmcs12(struct kvm_vcpu *vcpu)
>  		nested_vmx_is_evmptr12_set(vmx);
>  }
>  
> -static inline u16 nested_get_vpid02(struct kvm_vcpu *vcpu)
> -{
> -	struct vcpu_vmx *vmx = to_vmx(vcpu);
> -
> -	return vmx->nested.vpid02 ? vmx->nested.vpid02 : vmx->vpid;
> -}
> -
>  static inline unsigned long nested_ept_get_eptp(struct kvm_vcpu *vcpu)
>  {
>  	/* return the page table to be shadowed - in our case, EPT12 */
> @@ -187,6 +180,16 @@ static inline bool nested_cpu_has_vpid(struct vmcs12 *vmcs12)
>  	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_VPID);
>  }
>  
> +static inline u16 nested_get_vpid02(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +	if (nested_cpu_has_vpid(get_vmcs12(vcpu)) && vmx->nested.vpid02)

This isn't quite right.  When KVM emulates INVVPID for L1, there is no way to
know which vmcs12 will be used and thus no way to know if KVM should invalidate
vpid02 or vpid01.  So I'm fairly certain the correct fix is:

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1a4438358c5e..896f0fea0306 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3216,7 +3216,7 @@ void vmx_flush_tlb_all(struct kvm_vcpu *vcpu)
 
 static inline int vmx_get_current_vpid(struct kvm_vcpu *vcpu)
 {
-       if (is_guest_mode(vcpu))
+       if (is_guest_mode(vcpu) && nested_cpu_has_vpid(get_vmcs12(vcpu)))
                return nested_get_vpid02(vcpu);
        return to_vmx(vcpu)->vpid;
 }

Note, it's tempting, but subtly wrong (though not a violation of the architecture),
to use vpid02 when VPID is disabled in vmcs12, i.e. this would also resolve the
issue, but would result in over-flushing.

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index a8e7bc04d9bf..ce89e2e27681 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2316,7 +2316,7 @@ static void prepare_vmcs02_early_rare(struct vcpu_vmx *vmx,
        vmcs_write64(VMCS_LINK_POINTER, INVALID_GPA);
 
        if (enable_vpid) {
-               if (nested_cpu_has_vpid(vmcs12) && vmx->nested.vpid02)
+               if (vmx->nested.vpid02)
                        vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->nested.vpid02);
                else
                        vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);

And it's wrong because the comment in nested_vmx_transition_tlb_flush() is wrong:

	 * If vmcs12 doesn't use VPID, L1 expects linear and combined mappings
	 * for *all* contexts to be flushed on VM-Enter/VM-Exit, i.e. it's a
	 * full TLB flush from the guest's perspective

Per the SDM, only VPID=0 is flushed:

  If the “enable VPID” VM-execution control is 0, VM entries and VM exits invalidate
  linear mappings and combined mappings associated with VPID 0000H (for all PCIDs).
  Combined mappings for VPID 0000H are invalidated for all EPTRTAs.

so using vpid01, which mimics using L1's host VPID (of '0'), is correct.  As above,
the fallout is simply that KVM will over-flush, e.g. if L1 runs L2 X with VPID=1,
then L2 Y with VPID disabled, and then runs X again, it's legal to retain TLB
entries for VPID=1 even though there was a VMX transition with VPID disabled.

I'll post a proper patch with lots of comments.

Thanks Chao!





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux