On Tue, Jul 02, 2013 at 05:34:56PM +0200, Jan Kiszka wrote:
> On 2013-07-02 17:15, Gleb Natapov wrote:
> > On Tue, Jul 02, 2013 at 04:28:56PM +0200, Jan Kiszka wrote:
> >> On 2013-07-02 15:59, Gleb Natapov wrote:
> >>> On Tue, Jul 02, 2013 at 03:01:24AM +0000, Zhang, Yang Z wrote:
> >>>> This series has been pending on the mailing list for a long time,
> >>>> and it is a really big feature for nested virtualization. I also
> >>>> suspect the original authors (Jun and Nadav) do not have enough
> >>>> time to continue it, so I will pick it up. :)
> >>>>
> >>>> See comments below:
> >>>>
> >>>> Paolo Bonzini wrote on 2013-05-20:
> >>>>> On 19/05/2013 06:52, Jun Nakajima wrote:
> >>>>>> From: Nadav Har'El <nyh@xxxxxxxxxx>
> >>>>>>
> >>>>>> Recent KVM, since
> >>>>>> http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577
> >>>>>> switches the EFER MSR when EPT is used and the host and guest have different
> >>>>>> NX bits. So if we add support for nested EPT (L1 guest using EPT to run L2)
> >>>>>> and want to be able to run recent KVM as L1, we need to allow L1 to use this
> >>>>>> EFER switching feature.
> >>>>>>
> >>>>>> To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if available,
> >>>>>> and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD. This patch adds
> >>>>>> support for the former (the latter is still unsupported).
> >>>>>>
> >>>>>> Nested entry and exit emulation (prepare_vmcs02 and load_vmcs12_host_state,
> >>>>>> respectively) already handled VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all
> >>>>>> that's left to do in this patch is to properly advertise this feature to L1.
> >>>>>>
> >>>>>> Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by using
> >>>>>> vmx_set_efer (which itself sets one of several vmcs02 fields), so we always
> >>>>>> support this feature, regardless of whether the host supports it.
> >>>>>>
> >>>>>> Signed-off-by: Nadav Har'El <nyh@xxxxxxxxxx>
> >>>>>> Signed-off-by: Jun Nakajima <jun.nakajima@xxxxxxxxx>
> >>>>>> Signed-off-by: Xinhao Xu <xinhao.xu@xxxxxxxxx>
> >>>>>> ---
> >>>>>>  arch/x86/kvm/vmx.c | 23 ++++++++++++++++-------
> >>>>>>  1 file changed, 16 insertions(+), 7 deletions(-)
> >>>>>>
> >>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>>>>> index 260a919..fb9cae5 100644
> >>>>>> --- a/arch/x86/kvm/vmx.c
> >>>>>> +++ b/arch/x86/kvm/vmx.c
> >>>>>> @@ -2192,7 +2192,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
> >>>>>>  #else
> >>>>>>  	nested_vmx_exit_ctls_high = 0;
> >>>>>>  #endif
> >>>>>> -	nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
> >>>>>> +	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
> >>>>>> +				      VM_EXIT_LOAD_IA32_EFER);
> >>>>>>
> >>>>>>  	/* entry controls */
> >>>>>>  	rdmsr(MSR_IA32_VMX_ENTRY_CTLS,
> >>>>>> @@ -2201,8 +2202,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
> >>>>>>  	nested_vmx_entry_ctls_low = VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
> >>>>>>  	nested_vmx_entry_ctls_high &=
> >>>>>>  		VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_IA32E_MODE;
> >>>>>> -	nested_vmx_entry_ctls_high |= VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
> >>>>>> -
> >>>>>> +	nested_vmx_entry_ctls_high |= (VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR |
> >>>>>> +				       VM_ENTRY_LOAD_IA32_EFER);
> >>>>>>  	/* cpu-based controls */
> >>>>>>  	rdmsr(MSR_IA32_VMX_PROCBASED_CTLS,
> >>>>>>  		nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high);
> >>>>>> @@ -7492,10 +7493,18 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
> >>>>>>  	vcpu->arch.cr0_guest_owned_bits &= ~vmcs12->cr0_guest_host_mask;
> >>>>>>  	vmcs_writel(CR0_GUEST_HOST_MASK, ~vcpu->arch.cr0_guest_owned_bits);
> >>>>>>
> >>>>>> -	/* Note: IA32_MODE, LOAD_IA32_EFER are modified by vmx_set_efer below */
> >>>>>> -	vmcs_write32(VM_EXIT_CONTROLS,
> >>>>>> -		vmcs12->vm_exit_controls | vmcs_config.vmexit_ctrl);
> >>>>>> -	vmcs_write32(VM_ENTRY_CONTROLS, vmcs12->vm_entry_controls |
> >>>>>> +	/* L2->L1 exit controls are emulated - the hardware exit is to L0 so
> >>>>>> +	 * we should use its exit controls. Note that IA32_MODE, LOAD_IA32_EFER
> >>>>>> +	 * bits are further modified by vmx_set_efer() below.
> >>>>>> +	 */
> >>>>>> +	vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
> >>>> This is wrong. We cannot use L0's exit controls directly.
> >>>> LOAD_PERF_GLOBAL_CTRL, LOAD_HOST_EFER, LOAD_HOST_PAT and
> >>>> ACK_INTR_ON_EXIT should use the host's exit controls, but the others
> >>>> still need to use (vmcs12 | host).
> >>>>
> >>> I do not see why. We always intercept DR7/PAT/EFER, so save is emulated
> >>> too. Host address space size always comes from L0, and the preemption
> >>> timer is not supported for nested IIRC; when it is, the host will have
> >>> to save it on exit anyway for correct emulation.
> >>
> >> Preemption timer is already supported and works fine as far as I tested.
> >> KVM doesn't use it for L1, so we do not need to save/restore it - IIRC.
> >>
> > So what happens if L1 configures it to value X, an L0 exit happens after
> > X/2 ticks, and L0 gets back to L2 directly? The counter will be X again
> > instead of X/2.
>
> Likely. Yes, we need to improve our emulation by setting "Save
> VMX-preemption timer value" or emulating this in software if the hardware
> lacks support for it (was this flag introduced after the preemption
> timer itself?).
>
Not sure, but my point was that for correct emulation the host needs to
set "save preempt timer on vmexit" anyway, so all VM_EXIT_CONTROLS are
indeed emulated as far as I can see.

--
			Gleb.
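
To make the X versus X/2 point from the exchange above concrete, here is a
toy model in plain C. It is not KVM code: the function name and its
parameters are made up for illustration, and it only assumes what the
thread describes, namely that L1 programs the timer to some value, one exit
handled entirely by L0 happens partway through, and L0 then resumes L2
directly.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model (hypothetical helper, not part of KVM): how many ticks does
 * L2 run in total before the preemption timer programmed by L1 expires,
 * given one exit that L0 handles itself before resuming L2?
 *
 * timer_value  - value L1 configured (X in the thread)
 * l0_exit_at   - ticks L2 had already run when the L0-handled exit hit
 * save_on_exit - nonzero if the remaining count is saved on the exit and
 *                used on re-entry; zero if the original value is reloaded
 */
static uint32_t l2_ticks_until_expiry(uint32_t timer_value,
				      uint32_t l0_exit_at,
				      int save_on_exit)
{
	/* The model only covers an exit that happens before expiry. */
	assert(l0_exit_at < timer_value);

	uint32_t remaining = timer_value - l0_exit_at;

	/* Without saving, re-entry starts counting from the full value. */
	uint32_t reload = save_on_exit ? remaining : timer_value;

	return l0_exit_at + reload;
}
```

With X = 100 and an L0-handled exit after 50 ticks, the model yields 150
ticks when the count is not saved and 100 ticks when it is, which is the
discrepancy being discussed.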