2018-07-10 11:27+0200, KarimAllah Ahmed:
> From: Jim Mattson <jmattson@xxxxxxxxxx>
>
> For nested virtualization, L0 KVM manages a bit of state for L2 guests
> that cannot be captured through the currently available IOCTLs. In fact,
> the state captured through all of these IOCTLs is usually a mix of L1
> and L2 state. It is also dependent on whether the L2 guest was running
> at the moment the process was interrupted to save its state.
>
> With this capability, there are two new vcpu ioctls: KVM_GET_NESTED_STATE
> and KVM_SET_NESTED_STATE. These can be used for saving and restoring a VM
> that is in VMX operation.
>
> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: H. Peter Anvin <hpa@xxxxxxxxx>
> Cc: x86@xxxxxxxxxx
> Cc: kvm@xxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Signed-off-by: Jim Mattson <jmattson@xxxxxxxxxx>
> [karahmed@ - rename structs and functions and make them ready for AMD
>             and address previous comments.
>           - handle nested.smm state.
>           - rebase & a bit of refactoring.
>           - Merge 7/8 and 8/8 into one patch. ]
> Signed-off-by: KarimAllah Ahmed <karahmed@xxxxxxxxx>
> ---
> v4 -> v5:
> - Drop the update to KVM_REQUEST_ARCH_BASE in favor of a patch to switch
>   to u64 instead.
> - Fix commit message.
> - Handle nested.smm state as well.
> - Rebase.
>
> v3 -> v4:
> - Rename functions to have _nested.
>
> v2 -> v3:
> - Remove the forced VMExit from L2 after reading the kvm_state. The
>   actual problem is solved.
> - Rebase again!
> - Set nested_run_pending during restore (not sure if it makes sense yet
>   or not).
> - Reduce KVM_REQUEST_ARCH_BASE to 7 instead of 8 (the other alternative
>   is to switch everything to u64).
>
> v1 -> v2:
> - Rename structs and functions and make them ready for AMD and address
>   previous comments.
> - Rebase & a bit of refactoring.
> - Merge 7/8 and 8/8 into one patch.
> - Force a VMExit from L2 after reading the kvm_state to avoid mixed
>   state between L1 and L2 on resurrecting the instance.
> ---
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -12976,6 +12977,197 @@ static int enable_smi_window(struct kvm_vcpu *vcpu)
> +static int set_vmcs_cache(struct kvm_vcpu *vcpu,
> +			  struct kvm_nested_state __user *user_kvm_nested_state,
> +			  struct kvm_nested_state *kvm_state)
> +{
> [...]
> +
> +	if (kvm_state->flags & KVM_STATE_NESTED_RUN_PENDING)
> +		vmx->nested.nested_run_pending = 1;
> +
> +	if (check_vmentry_prereqs(vcpu, vmcs12) ||
> +	    check_vmentry_postreqs(vcpu, vmcs12, &exit_qual))
> +		return -EINVAL;
> +
> +	ret = enter_vmx_non_root_mode(vcpu);
> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * The MMU is not yet initialized to point at the right entities,
> +	 * and "get pages" would need to read data from the guest (i.e. we
> +	 * would need to perform gpa to hpa translation). This request will
> +	 * therefore result in a call to nested_get_vmcs12_pages before the
> +	 * next VM entry.
> +	 */
> +	kvm_make_request(KVM_REQ_GET_VMCS12_PAGES, vcpu);
> +
> +	vmx->nested.nested_run_pending = 1;

This is not necessary. We are only copying state and do not add anything
that would be lost on a nested VM exit without a prior VM entry.

> +

Halting the VCPU should probably be done here, just like at the end of
nested_vmx_run().

> +	return 0;
> +}
> +
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> @@ -963,6 +963,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_GET_MSR_FEATURES 153
>  #define KVM_CAP_HYPERV_EVENTFD 154
>  #define KVM_CAP_HYPERV_TLBFLUSH 155
> +#define KVM_CAP_STATE 156

Please call it KVM_CAP_NESTED_STATE (good documentation makes code
worse. :])