> On Nov 30, 2022, at 1:29 AM, Chao Gao <chao.gao@xxxxxxxxx> wrote:
>

Chao, while I’ve got you here: I was inspired to tune up the software side based on the VTD suppress-notifications change we had been talking about. Any chance we could get the v4 of that? Seemed like it was almost done, yea? Would love to get our hands on it to help accelerate the VTD path.

> On Tue, Nov 29, 2022 at 01:22:25PM -0500, Jon Kohler wrote:
>> @@ -7031,6 +7042,18 @@ void noinstr vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp)
>>  void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
>>  					unsigned int flags)
>>  {
>> +	struct kvm_vcpu *vcpu = &vmx->vcpu;
>> +
>> +	/* Optimize IPI reduction by setting mode immediately after vmexit
>> +	 * without a memory barrier, as this is not paired anywhere. vcpu->mode
>> +	 * will be set to OUTSIDE_GUEST_MODE in x86 common code with a memory
>> +	 * barrier, after the host is done fully restoring various host states.
>> +	 * Since the rdmsr and wrmsr below are expensive, this must be done
>> +	 * first, so that the IPI suppression window covers the time spent
>> +	 * fixing up SPEC_CTRL.
>> +	 */
>> +	vcpu->mode = EXITING_GUEST_MODE;
>
> Does this break kvm_vcpu_kick()? IIUC, kvm_vcpu_kick() does nothing if
> vcpu->mode is already EXITING_GUEST_MODE, expecting the vCPU will exit
> guest mode. But ...

IIRC that’d only be a problem for fast-path exits that reenter the guest (like TSC deadline); everything else *will* eventually exit out to kernel mode to pick up whatever other requests may be pending. In that sense, this patch is actually even better for kick, because we will send incrementally fewer spurious kicks. Even then, for fast-path reentry exits, the guest is likely to exit all the way out for something else soon enough, so worst case a request gets delayed a wee bit longer than it should. Small price to pay for clawing back cycles on the IPI send side, I think.

>
>> +
>>  	u64 hostval = this_cpu_read(x86_spec_ctrl_current);
>>
>>  	if (!cpu_feature_enabled(X86_FEATURE_MSR_SPEC_CTRL))
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 2835bd796639..0e0d228f3fa5 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -2160,6 +2160,14 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
>>  		data = kvm_read_edx_eax(vcpu);
>>  		if (!handle_fastpath_set_tscdeadline(vcpu, data)) {
>>  			kvm_skip_emulated_instruction(vcpu);
>> +			/* Reset to IN_GUEST_MODE since we're going to reenter
>> +			 * the guest as part of this fast path. This is done as
>> +			 * an optimization without a memory barrier, since
>> +			 * EXITING_GUEST_MODE is also set without a memory
>> +			 * barrier.
>> +			 */
>> +			if (vcpu->mode == EXITING_GUEST_MODE)
>> +				vcpu->mode = IN_GUEST_MODE;
>
> ... the vCPU enters guest mode again.
>
> I believe mode transition from EXITING_GUEST_MODE to IN_GUEST_MODE
> directly isn't valid for current KVM.

You’re correct that we do not have this pattern in KVM today; however, I couldn’t find anything in the vcpu-requests documentation that specifically says it isn’t allowed. Furthermore, this really is the spirit of what we’re doing: we are indeed exiting guest mode, but because there is a fast path, we aren’t exiting anymore, so we need to flip the state back to in-guest mode. This change just makes that sentiment explicit, and it happens to further suppress IPI sends during that small window as well.
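
To make the kick interaction concrete, here’s a rough userspace model of the vcpu->mode handshake as I understand it. To be clear, this is a sketch, not kernel code: C11 atomics stand in for the kernel’s cmpxchg() and smp_mb(), and should_kick() is my own stand-in for the mode check kvm_vcpu_kick() does via kvm_arch_vcpu_should_kick().

/*
 * Rough userspace model of the vcpu->mode handshake discussed above.
 * NOT kernel code: C11 atomics stand in for cmpxchg()/smp_mb(), and
 * should_kick() is a stand-in for the mode check inside
 * kvm_vcpu_kick()/kvm_arch_vcpu_should_kick().
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

enum vcpu_mode { OUTSIDE_GUEST_MODE, IN_GUEST_MODE, EXITING_GUEST_MODE };

struct vcpu { _Atomic int mode; };

/*
 * Kicker side: only send the (expensive) IPI if we are the one who moved
 * the vCPU out of IN_GUEST_MODE. If mode already reads EXITING_GUEST_MODE,
 * the exit path has claimed the transition and the IPI is suppressed.
 */
static bool should_kick(struct vcpu *v)
{
	int expected = IN_GUEST_MODE;

	return atomic_compare_exchange_strong(&v->mode, &expected,
					      EXITING_GUEST_MODE);
}

int main(void)
{
	struct vcpu v = { .mode = IN_GUEST_MODE };

	/* vmexit side, per the patch: flag the exit before the SPEC_CTRL
	 * rdmsr/wrmsr work so the suppression window covers it. */
	atomic_store_explicit(&v.mode, EXITING_GUEST_MODE,
			      memory_order_relaxed);

	/* A concurrent kick now sees EXITING_GUEST_MODE and skips the IPI. */
	printf("kick sends IPI? %s\n", should_kick(&v) ? "yes" : "no");

	/* Fastpath reentry, per the patch: plain compare and store back to
	 * IN_GUEST_MODE, reopening the kick window before vmentry. */
	if (atomic_load_explicit(&v.mode, memory_order_relaxed) ==
	    EXITING_GUEST_MODE)
		atomic_store_explicit(&v.mode, IN_GUEST_MODE,
				      memory_order_relaxed);

	printf("kick sends IPI? %s\n", should_kick(&v) ? "yes" : "no");
	return 0;
}

The net of the model: the early store only widens the window in which a would-be kicker sees EXITING_GUEST_MODE and skips the IPI, and the fastpath flip reopens the kick window before reentry. Anything that lands inside that window waits for the next full exit, which is the small delay I was describing above.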