2018-07-10 11:27+0200, KarimAllah Ahmed:
> From: Jim Mattson <jmattson@xxxxxxxxxx>
>
> For nested virtualization, L0 KVM manages a bit of state for L2 guests
> that cannot be captured through the currently available IOCTLs. In fact,
> the state captured through all of these IOCTLs is usually a mix of L1
> and L2 state. It is also dependent on whether the L2 guest was running
> at the moment the process was interrupted to save its state.
>
> With this capability, there are two new vcpu ioctls: KVM_GET_NESTED_STATE
> and KVM_SET_NESTED_STATE. These can be used for saving and restoring a VM
> that is in VMX operation.
>
> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: H. Peter Anvin <hpa@xxxxxxxxx>
> Cc: x86@xxxxxxxxxx
> Cc: kvm@xxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Signed-off-by: Jim Mattson <jmattson@xxxxxxxxxx>
> [karahmed@ - rename structs and functions and make them ready for AMD
>             and address previous comments.
>           - handle nested.smm state.
>           - rebase & a bit of refactoring.
>           - Merge 7/8 and 8/8 into one patch. ]
> Signed-off-by: KarimAllah Ahmed <karahmed@xxxxxxxxx>
> ---
> v4 -> v5:
> - Drop the update to KVM_REQUEST_ARCH_BASE in favor of a patch to switch
>   to u64 instead.
> - Fix commit message.
> - Handle nested.smm state as well.
> - Rebase.
>
> v3 -> v4:
> - Rename functions to have _nested.
>
> v2 -> v3:
> - Remove the forced VMExit from L2 after reading the kvm_state. The
>   actual problem is solved.
> - Rebase again!
> - Set nested_run_pending during restore (not sure if it makes sense yet
>   or not).
> - Reduce KVM_REQUEST_ARCH_BASE to 7 instead of 8 (the other alternative
>   is to switch everything to u64).
>
> v1 -> v2:
> - Rename structs and functions and make them ready for AMD and address
>   previous comments.
> - Rebase & a bit of refactoring.
> - Merge 7/8 and 8/8 into one patch.
> - Force a VMExit from L2 after reading the kvm_state to avoid mixed
>   state between L1 and L2 on resurrecting the instance.
> ---
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -12976,6 +12977,197 @@ static int enable_smi_window(struct kvm_vcpu *vcpu)
> +static int set_vmcs_cache(struct kvm_vcpu *vcpu,
> +			  struct kvm_nested_state __user *user_kvm_nested_state,
> +			  struct kvm_nested_state *kvm_state)
> +{
> [...]
> +
> +	if (kvm_state->flags & KVM_STATE_NESTED_RUN_PENDING)
> +		vmx->nested.nested_run_pending = 1;
> +
> +	if (check_vmentry_prereqs(vcpu, vmcs12) ||
> +	    check_vmentry_postreqs(vcpu, vmcs12, &exit_qual))
> +		return -EINVAL;
> +
> +	ret = enter_vmx_non_root_mode(vcpu);
> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * The MMU is not yet initialized to point at the right entities,
> +	 * and "get pages" would need to read data from the guest (i.e. we
> +	 * would need to perform gpa to hpa translation). This request will
> +	 * therefore result in a call to nested_get_vmcs12_pages before the
> +	 * next VM entry.
> +	 */
> +	kvm_make_request(KVM_REQ_GET_VMCS12_PAGES, vcpu);
> +
> +	vmx->nested.nested_run_pending = 1;

This is not necessary. We are only copying state and do not add anything
that would be lost on a nested VM exit without a prior VM entry.

> +

Halting the VCPU should probably be done here, just like at the end of
nested_vmx_run().

> +	return 0;
> +}
> +
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> @@ -963,6 +963,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_GET_MSR_FEATURES 153
>  #define KVM_CAP_HYPERV_EVENTFD 154
>  #define KVM_CAP_HYPERV_TLBFLUSH 155
> +#define KVM_CAP_STATE 156

Please call it KVM_CAP_NESTED_STATE (good documentation makes code
worse. :])