Re: Problem with vmrun in an interrupt shadow

On Fri, Feb 07, 2025, Doug Covelli wrote:
> To test support for nested virtualization I was running a VM (L2) on a
> debug build of ESX (L1) on VMware Workstation/KVM (L0).  This
> consistently resulted in an ASSERT in L1 firing as the interrupt
> shadow bit in the VMCB was set on an #NPF exit that occurred when
> vectoring through the IDT to deliver an interrupt to L2.
> 
> Some details from our exit recorder are below.  Basically what
> happened is that L1 resumed L2 after handling an I/O exit and
> attempted to inject an internal interrupt with vector 0x68.  This
> resulted in a #NPF exit when vectoring through the IDT to deliver the
> interrupt to the guest, with the interrupt shadow bit set, which our
> code is not expecting.  There is no reason for the interrupt shadow
> bit to be set, and neither L1 nor L0 was setting it.
> 
> This turns out to be due to a quirk where on AMD 'vmrun' after an
> 'sti' will cause the interrupt shadow bit to leak into the guest state
> in the VMCB. Jim Mattson discovered this back when he was with VMware
> and checked in a fix to make sure that our 'vmrun' is not immediately
> after an 'sti':
> 
>         sti             /* Enable interrupts during guest execution */
>         mov             svmPhysCurrentVMCB(%rip), %rax
>         vmrun           /* Must not immediately follow STI. See PR 150935 */
> 
> PR 150935 describes exactly the same problem I am seeing with KVM.
> For KVM the 'vmrun' is immediately after a 'sti' though:
> 
>         /* Enter guest mode */
>         sti
> 
> 1:      vmrun %rax
> 
> I confirmed that moving the 'sti' after the mov instruction in the
> VMware code causes the same exact ASSERT to fire.  I discussed this
> with Jim and Sean and they suggested sending an e-mail to this list.
> Jim also mentioned that this was introduced by [1] a few years back.
> It would be hard to argue that this isn't an AMD bug, but it seems best
> to work around it in SW.  It would be great if someone could fix this
> but if folks are too busy I can ask Zach to include it in the patches
> he is working on.

I'll post a patch and a regression test.  It took me ~15 minutes to realize the
key is taking an exit while injecting an event, i.e. before executing anything
in the guest.  ~3 minutes to re-learn nested_exceptions_test.c, and 2 seconds
to add a testcase:

diff --git a/tools/testing/selftests/kvm/x86/nested_exceptions_test.c b/tools/testing/selftests/kvm/x86/nested_exceptions_test.c
index 3eb0313ffa39..3641a42934ac 100644
--- a/tools/testing/selftests/kvm/x86/nested_exceptions_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_exceptions_test.c
@@ -85,6 +85,7 @@ static void svm_run_l2(struct svm_test_data *svm, void *l2_code, int vector,
 
        GUEST_ASSERT_EQ(ctrl->exit_code, (SVM_EXIT_EXCP_BASE + vector));
        GUEST_ASSERT_EQ(ctrl->exit_info_1, error_code);
+       GUEST_ASSERT(!ctrl->int_state);
 }
 
 static void l1_svm_code(struct svm_test_data *svm)
@@ -122,6 +123,7 @@ static void vmx_run_l2(void *l2_code, int vector, uint32_t error_code)
        GUEST_ASSERT_EQ(vmreadz(VM_EXIT_REASON), EXIT_REASON_EXCEPTION_NMI);
        GUEST_ASSERT_EQ((vmreadz(VM_EXIT_INTR_INFO) & 0xff), vector);
        GUEST_ASSERT_EQ(vmreadz(VM_EXIT_INTR_ERROR_CODE), error_code);
+       GUEST_ASSERT(!vmreadz(GUEST_INTERRUPTIBILITY_INFO));
 }
 
 static void l1_vmx_code(struct vmx_pages *vmx)
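
For anyone who wants the failure condition spelled out, here is a rough toy
model in plain C of the behavior described in this thread.  It is only a
sketch under stated assumptions, not KVM or hardware code: every toy_* name
is hypothetical, and the "CPU" is reduced to a single flag.  It assumes that
STI arms a one-instruction shadow, that VMRUN immediately after STI carries
that shadow into the guest state, and that the shadow is only visible in the
saved int_state (bit 0) if the exit happens before the guest retires any
instruction, e.g. while an injected event is being delivered.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit 0 of the VMCB int_state field is the interrupt shadow. */
#define TOY_INT_SHADOW 0x1u

/* Hypothetical, heavily simplified host CPU state. */
struct toy_cpu {
	bool sti_shadow;	/* armed by STI, dropped by the next instruction */
};

static void toy_sti(struct toy_cpu *cpu)
{
	cpu->sti_shadow = true;
}

/* Any ordinary instruction after STI ends the shadow. */
static void toy_mov(struct toy_cpu *cpu)
{
	cpu->sti_shadow = false;
}

/*
 * Model of the quirk: a shadow that is still armed at VMRUN leaks into
 * the guest.  It survives into the saved int_state only if the guest
 * exits before retiring a single instruction.
 */
static unsigned int toy_vmrun(struct toy_cpu *cpu,
			      unsigned int guest_insns_before_exit)
{
	bool leaked = cpu->sti_shadow;

	cpu->sti_shadow = false;
	if (guest_insns_before_exit > 0)
		leaked = false;

	return leaked ? TOY_INT_SHADOW : 0;
}

/* "sti; vmrun" ordering, as in the KVM snippet quoted above. */
static unsigned int toy_kvm_style_entry(unsigned int guest_insns)
{
	struct toy_cpu cpu = { .sti_shadow = false };

	toy_sti(&cpu);
	return toy_vmrun(&cpu, guest_insns);
}

/* "sti; mov; vmrun" ordering, as in the VMware snippet quoted above. */
static unsigned int toy_vmware_style_entry(unsigned int guest_insns)
{
	struct toy_cpu cpu = { .sti_shadow = false };

	toy_sti(&cpu);
	toy_mov(&cpu);
	return toy_vmrun(&cpu, guest_insns);
}

/* Small self-check of the three cases discussed in the thread. */
static void toy_self_check(void)
{
	assert(toy_kvm_style_entry(0) == TOY_INT_SHADOW);
	assert(toy_kvm_style_entry(1) == 0);
	assert(toy_vmware_style_entry(0) == 0);
}
```

In this model only the "sti; vmrun" ordering combined with an exit before the
first guest instruction reports the shadow bit, which is the case the new
int_state assertion in the selftest is meant to catch.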




