On Tue, Jul 02, 2013 at 03:01:24AM +0000, Zhang, Yang Z wrote: > Since this series is pending in mail list for long time. And it's really a big feature for Nested. Also, I doubt the original authors(Jun and Nahav)should not have enough time to continue it. So I will pick it up. :) > > See comments below: > > Paolo Bonzini wrote on 2013-05-20: > > Il 19/05/2013 06:52, Jun Nakajima ha scritto: > > > From: Nadav Har'El <nyh@xxxxxxxxxx> > > > > > > Recent KVM, since > > http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577 > > > switch the EFER MSR when EPT is used and the host and guest have different > > > NX bits. So if we add support for nested EPT (L1 guest using EPT to run L2) > > > and want to be able to run recent KVM as L1, we need to allow L1 to use this > > > EFER switching feature. > > > > > > To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if > > available, > > > and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD. This patch adds > > > support for the former (the latter is still unsupported). > > > > > > Nested entry and exit emulation (prepare_vmcs_02 and > > load_vmcs12_host_state, > > > respectively) already handled VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So > > all > > > that's left to do in this patch is to properly advertise this feature to L1. > > > > > > Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by > > using > > > vmx_set_efer (which itself sets one of several vmcs02 fields), so we always > > > support this feature, regardless of whether the host supports it. > > > > > > Signed-off-by: Nadav Har'El <nyh@xxxxxxxxxx> > > > Signed-off-by: Jun Nakajima <jun.nakajima@xxxxxxxxx> > > > Signed-off-by: Xinhao Xu <xinhao.xu@xxxxxxxxx> > > > --- > > > arch/x86/kvm/vmx.c | 23 ++++++++++++++++------- > > > 1 file changed, 16 insertions(+), 7 deletions(-) > > > > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > > > index 260a919..fb9cae5 100644 > > > --- a/arch/x86/kvm/vmx.c > > > +++ b/arch/x86/kvm/vmx.c > > > @@ -2192,7 +2192,8 @@ static __init void nested_vmx_setup_ctls_msrs(void) > > > #else > > > nested_vmx_exit_ctls_high = 0; > > > #endif > > > - nested_vmx_exit_ctls_high |= > > VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR; > > > + nested_vmx_exit_ctls_high |= > > (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | > > > + VM_EXIT_LOAD_IA32_EFER); > > > > > > /* entry controls */ > > > rdmsr(MSR_IA32_VMX_ENTRY_CTLS, > > > @@ -2201,8 +2202,8 @@ static __init void nested_vmx_setup_ctls_msrs(void) > > > nested_vmx_entry_ctls_low = > > VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR; > > > nested_vmx_entry_ctls_high &= > > > VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_IA32E_MODE; > > > - nested_vmx_entry_ctls_high |= > > VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR; > > > - > > > + nested_vmx_entry_ctls_high |= > > (VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | > > > + VM_ENTRY_LOAD_IA32_EFER); > > > /* cpu-based controls */ > > > rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, > > > nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high); > > > @@ -7492,10 +7493,18 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, > > struct vmcs12 *vmcs12) > > > vcpu->arch.cr0_guest_owned_bits &= ~vmcs12->cr0_guest_host_mask; > > > vmcs_writel(CR0_GUEST_HOST_MASK, > > ~vcpu->arch.cr0_guest_owned_bits); > > > > > > - /* Note: IA32_MODE, LOAD_IA32_EFER are modified by vmx_set_efer > > below */ > > > - vmcs_write32(VM_EXIT_CONTROLS, > > > - vmcs12->vm_exit_controls | vmcs_config.vmexit_ctrl); > > > - vmcs_write32(VM_ENTRY_CONTROLS, vmcs12->vm_entry_controls | > > > + /* L2->L1 exit controls are emulated - the hardware exit is to L0 so > > > + * we should use its exit controls. Note that IA32_MODE, LOAD_IA32_EFER > > > + * bits are further modified by vmx_set_efer() below. > > > + */ > > > + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); > This is wrong. We cannot use L0 exit control directly. > LOAD_PERF_GLOBAL_CTRL, LOAD_HOST_EFE, LOAD_HOST_PAT, ACK_INTR_ON_EXIT should use host's exit control. But others, still need use (vmcs12|host). > I do not see why. We always intercept DR7/PAT/EFER, so save is emulated too. Host address space size always come from L0 and preemption timer is not supported for nested IIRC and when it will be host will have to save it on exit anyway for correct emulation. > > > + > > > + /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE > > are > > > + * emulated by vmx_set_efer(), below. > > > > VM_ENTRY_LOAD_IA32_EFER is not emulated by vmx_set_efer, so: > VM_ENTRY_LOAD_IA32_EFER is hanlded in setup_msrs(), and vmx_set_efer already call it. > > > > > /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE > > * are emulated below. VM_ENTRY_IA32E_MODE is handled in > > * vmx_set_efer(). */ > > > > Paolo > > > > > + */ > > > + vmcs_write32(VM_ENTRY_CONTROLS, > > > + (vmcs12->vm_entry_controls & ~VM_ENTRY_LOAD_IA32_EFER & > > > + ~VM_ENTRY_IA32E_MODE) | > > > (vmcs_config.vmentry_ctrl & ~VM_ENTRY_IA32E_MODE)); > > > > > > if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) > > > > > > > -- > > To unsubscribe from this list: send the line "unsubscribe kvm" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > Best regards, > Yang > -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html