On Wed, Sep 25, 2013 at 02:08:09PM +0200, Paolo Bonzini wrote: > Il 25/09/2013 13:51, Gleb Natapov ha scritto: > > On Wed, Sep 25, 2013 at 01:24:49PM +0200, Paolo Bonzini wrote: > >> Il 25/09/2013 11:51, Gleb Natapov ha scritto: > >>> @@ -7773,6 +7787,9 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) > >>> kvm_set_cr3(vcpu, vmcs12->guest_cr3); > >>> kvm_mmu_reset_context(vcpu); > >>> > >>> + if (!enable_ept) > >>> + vcpu->arch.walk_mmu->inject_page_fault = vmx_inject_page_fault_nested; > >>> + > >>> /* > >>> * L1 may access the L2's PDPTR, so save them to construct vmcs12 > >>> */ > >>> @@ -8232,6 +8249,9 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu, > >>> kvm_set_cr3(vcpu, vmcs12->host_cr3); > >>> kvm_mmu_reset_context(vcpu); > >>> > >>> + if (!enable_ept) > >>> + vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault; > >> > >> This is strictly speaking not needed, because kvm_mmu_reset_context > >> takes care of it. > >> > > Yeah, but better make it explicit, it does not hurt but make it more > > clear what is going on. Or at least add comment above > > kvm_mmu_reset_context() about this side effect. > > Yes, I agree the code is cleaner like you wrote it. > > >> But I wonder if it is cleaner to not touch the struct here, and instead > >> add a new member to kvm_x86_ops---used directly in init_kvm_softmmu like > >> kvm_x86_ops->set_cr3. The new member can do something like > >> > >> if (is_guest_mode(vcpu)) { > >> struct vmcs12 *vmcs12 = get_vmcs12(vcpu); > >> if (vmcs12->exception_bitmap & (1u << PF_VECTOR)) { > >> nested_vmx_vmexit(vcpu); > >> return; > >> } > >> } > >> > >> kvm_inject_page_fault(vcpu, fault); > > > > I do not quite understand what you mean here. inject_page_fault() is > > called from the depth of page table walking. How the code will not to > > call new member in some circumstances? > > IIUC the new function is called if and only if is_guest_mode(vcpu) && > !enable_ept. So what I'm suggesting is something like this: > Ah I see, so you propose to check for guest mode and enable_ept in the function instead of switching to another function, but switching to another function is how code was designed to be. Nested NPT/EPT provide their own function too, but there is nothing that stops you from checking on what MMU you are now in the function itself. > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -735,6 +735,8 @@ struct kvm_x86_ops { > void (*adjust_tsc_offset)(struct kvm_vcpu *vcpu, s64 adjustment, bool host); > > void (*set_tdp_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3); > + void (*inject_softmmu_page_fault)(struct kvm_vcpu *vcpu, > + struct x86_exception *fault); > > void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry); > > --- a/arch/x86/kvm/mmu.c > +++ b/arch/x86/kvm/mmu.c > @@ -3805,7 +3805,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu) > vcpu->arch.walk_mmu->set_cr3 = kvm_x86_ops->set_cr3; > vcpu->arch.walk_mmu->get_cr3 = get_cr3; > vcpu->arch.walk_mmu->get_pdptr = kvm_pdptr_read; > - vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault; > + vcpu->arch.walk_mmu->inject_page_fault = kvm_x86_ops->inject_softmmu_page_fault; > > return r; > } > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -7499,6 +7499,20 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu, > vmcs12->guest_physical_address = fault->address; > } > > +static void vmx_inject_softmmu_page_fault(struct kvm_vcpu *vcpu, > + struct x86_exception *fault) > +{ > + if (is_guest_mode(vcpu)) { is_guest_mode(vcpu) && !enable_ept > + struct vmcs12 *vmcs12 = get_vmcs12(vcpu); > + if (vmcs12->exception_bitmap & (1u << PF_VECTOR)) { > + nested_vmx_vmexit(vcpu); > + return; > + } > + } > + > + kvm_inject_page_fault(vcpu, fault); > +} > + > /* Callbacks for nested_ept_init_mmu_context: */ > > static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu) > @@ -8490,6 +8504,7 @@ static struct kvm_x86_ops vmx_x86_ops = { > .read_l1_tsc = vmx_read_l1_tsc, > > .set_tdp_cr3 = vmx_set_cr3, > + .inject_nested_tdp_pagefault = vmx_set_cr3, > > .check_intercept = vmx_check_intercept, > .handle_external_intr = vmx_handle_external_intr, > --- a/arch/x86/kvm/svm.c > +++ b/arch/x86/kvm/svm.c > @@ -4347,6 +4347,7 @@ static struct kvm_x86_ops svm_x86_ops = { > .read_l1_tsc = svm_read_l1_tsc, > > .set_tdp_cr3 = set_tdp_cr3, > + .inject_nested_tdp_pagefault = kvm_inject_page_fault, /*FIXME*/ > > .check_intercept = svm_check_intercept, > .handle_external_intr = svm_handle_external_intr, > > >> Alex (or Gleb :)), do you have any idea why SVM does not need this? > > > > It's probably needed there too. At least I fail to see why it does > > not. Without that patch guest is actually booting (most of the times), > > but sometimes random processes crash with double fault exception. > > Sounds indeed like the same bug. > I described what I saw with VMX, I am not saying the same happens with SVM :) I just do not see why it should not and the non fatality of the BUG can explain why it was missed. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html