On Mon, Dec 05, 2022, Jon Kohler wrote:
> 
> > On Dec 1, 2022, at 2:17 PM, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >> Changing vcpu->mode away from IN_GUEST_MODE as early as possible
> > 
> > Except this isn't as early as possible. If we're going to bother doing something
> > like this, my vote is to move it into assembly.
> 
> In vmenter.S, tacking on to call vmx_spec_ctrl_restore_host seemed like the
> most logical place after handling all of the state saves and RSB work. Are you
> saying put it even closer to the exit, meaning before the FILL_RETURN_BUFFER?

Yes, assuming that's safe, in which case it's truly the "as early as possible"
location.

> >> gives IPI senders as much runway as possible to avoid ringing
> >> doorbell or sending posted interrupt IPI in AMD and Intel,
> >> respectively. Since this is done without an explicit memory
> >> barrier, the worst case is that the IPI sender sees IN_GUEST_MODE
> >> still and sends a spurious event, which is the behavior prior
> >> to this patch.
> > 
> > No, the worst case scenario is that kvm_vcpu_exiting_guest_mode() sees EXITING_GUEST_MODE
> > and doesn't kick the vCPU. For "kicks" that set a request, kvm_vcpu_exit_request()
> > will punt the vCPU out of the tight run loop, though there might be ordering issues.
> > 
> > But whether or not there are ordering issues is a moot point since there are uses
> > of kvm_vcpu_kick() that aren't accompanied by a request, e.g. to purge the PML
> > buffers. In other words, kvm_vcpu_kick() absolutely cannot have false negatives.
> > We could modify KVM to require a request when using kvm_vcpu_kick(), but that's
> > a bit of a hack, and all of the possible ordering problems are still a pile of
> > complexity I'd rather avoid.
> > 
> > No small part of me thinks we'd be better off adding a dedicated flag to very
> > precisely track whether or not the vCPU is truly "in the guest" for the purposes
> > of sending IPIs. Things like kicks have different requirements around IN_GUEST_MODE
> > than sending interrupts, e.g. KVM manually processes the IRR on every VM-Enter and
> > so lack of an IPI is a non-issue, whereas missing an IPI for a kick is problematic.
> > In other words, EXITING_GUEST_MODE really needs to mean "exiting the run loop".
> 
> Do you mean:
> “one small part” (as in give this a shot, maybe),
> or
> “no small part” (as in good-god-don’t-do-this!)

Neither. "No small part" as in "Most of my brain", i.e. "I haven't completely
thought things through, but I think we'd be better off adding a dedicated flag".

> I’m assuming you meant one small part :) sure, how about something like:
> 
> To my earlier comment about when to do this within a few instructions, I don’t want
> to clobber other stuff happening as part of the enter/exit. What if we repurposed/renamed
> vmx_update_host_rsp and vmx_spec_ctrl_restore_host to make them “do stuff before
> entry” and “do stuff right after entry returns” functions? That way we wouldn’t have to
> add any other function calls or change the existing control flow all that much.

I'd prefer not to wrap vmx_update_host_rsp(), that thing is a very special
snowflake.

I don't see why we'd have to add function calls or change the existing control
flow anyways. The asm flows for VMX and SVM both take the vCPU in the form of
@vmx and @svm, so accessing the proposed execution mode field is just a matter
of adding an entry in arch/x86/kvm/kvm-asm-offsets.c.
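Purely as an illustration of the asm-offsets side (the execution_mode field,
its home in kvm_vcpu_arch, and the exact VCPU_EXECUTION_MODE name are all made
up for this sketch; SVM would want an equivalent entry relative to @svm), the
new offset boils down to something like:

	/* kvm-asm-offsets.c, sketch only: 'execution_mode' doesn't exist today. */
	if (IS_ENABLED(CONFIG_KVM_INTEL)) {
		...
		/* Byte offset of the hypothetical flag, relative to @vmx. */
		OFFSET(VCPU_EXECUTION_MODE, vcpu_vmx, vcpu.arch.execution_mode);
	}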
And now that kvm-asm-offsets.c exists, I think it makes sense to drop the @regs
parameter for __vmx_vcpu_run(), e.g. to roughly match __svm_vcpu_run(). With
that done as prep, accessing the vCPU immediately before/after VM-Enter and
VM-Exit is easy.

As a rough, incomplete sketch for VMX:

diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 766c6b3ef5ed..f80553e34f26 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -102,7 +102,7 @@ SYM_FUNC_START(__vmx_vcpu_run)
 	 * an LFENCE to stop speculation from skipping the wrmsr.
 	 */
 
-	/* Load @regs to RAX. */
+	/* Load @vmx to RAX. */
 	mov (%_ASM_SP), %_ASM_AX
 
 	/* Check if vmlaunch or vmresume is needed */
@@ -125,7 +125,11 @@ SYM_FUNC_START(__vmx_vcpu_run)
 	mov VCPU_R14(%_ASM_AX), %r14
 	mov VCPU_R15(%_ASM_AX), %r15
 #endif
-	/* Load guest RAX. This kills the @regs pointer! */
+
+	/* Mark the vCPU as executing in the guest! */
+	movb $IN_GUEST_MODE, VCPU_EXECUTION_MODE(%_ASM_AX)
+
+	/* Load guest RAX. This kills the @vmx pointer! */
 	mov VCPU_RAX(%_ASM_AX), %_ASM_AX
 
 	/* Check EFLAGS.ZF from 'testb' above */
@@ -163,9 +167,11 @@ SYM_INNER_LABEL(vmx_vmexit, SYM_L_GLOBAL)
 	/* Temporarily save guest's RAX. */
 	push %_ASM_AX
 
-	/* Reload @regs to RAX. */
+	/* Reload @vmx to RAX. */
 	mov WORD_SIZE(%_ASM_SP), %_ASM_AX
 
+	movb $IN_HOST_MODE, VCPU_EXECUTION_MODE(%_ASM_AX)
+
 	/* Save all guest registers, including RAX from the stack */
 	pop VCPU_RAX(%_ASM_AX)
 	mov %_ASM_CX, VCPU_RCX(%_ASM_AX)
@@ -189,9 +195,12 @@ SYM_INNER_LABEL(vmx_vmexit, SYM_L_GLOBAL)
 	xor %ebx, %ebx
 
 .Lclear_regs:
-	/* Discard @regs. The register is irrelevant, it just can't be RBX. */
+	/* Pop @vmx. The register can be anything except RBX. */
 	pop %_ASM_AX
 
+	/* Set the execution mode (again) in case of VM-Fail. */
+	movb $IN_HOST_MODE, VCPU_EXECUTION_MODE(%_ASM_AX)
+
 	/*
 	 * Clear all general purpose registers except RSP and RBX to prevent
 	 * speculative use of the guest's values, even those that are reloaded

> In “do before” we could set something like vcpu->non_root_mode = true
> In “do after” we could set vcpu->non_root_mode = false
> 
> Or perhaps something like (respectively)
> vcpu->operational_state = NON_ROOT_MODE
> vcpu->operational_state = ROOT_MODE
> 
> Using the root/non-root moniker would precisely track whether the vCPU is in
> the guest, and aligns with the language used to describe such a state in the
> SDM.
> 
> Thoughts?

No, we should use KVM-defined names, not VMX-defined names, because we'll want
the same logic for SVM. If we first rename GUEST_MODE => RUN_LOOP, then we can
use IN_GUEST_MODE and IN_HOST_MODE or whatever.
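To put a bit more color on the naming, I'm thinking something along these lines
(pure sketch, all names up for bikeshedding, and the IN_HOST_MODE/IN_GUEST_MODE
values would need to be consumable from asm, e.g. via #defines or asm-offsets,
for the vmenter.S usage above):

	/* Today's vcpu->mode values, renamed so "guest mode" stops meaning "run loop". */
	enum {
		OUTSIDE_RUN_LOOP,
		IN_RUN_LOOP,
		EXITING_RUN_LOOP,
		READING_SHADOW_PAGE_TABLES,
	};

	/*
	 * Dedicated flag, toggled in asm immediately around VM-Enter/VM-Exit,
	 * i.e. precise enough for deciding whether or not to send posted
	 * interrupt IPIs/doorbells.  Kicks keep keying off vcpu->mode.
	 */
	enum {
		IN_HOST_MODE,
		IN_GUEST_MODE,
	};

That way kvm_vcpu_kick() and friends keep their current semantics against the
run-loop state, and only the interrupt delivery paths need to learn about the
new, more precise flag.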