Re: [PATCH 4/4] KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2

Gleb Natapov <gleb@xxxxxxxxxx> · Wed, 25 Sep 2013 15:21:45 +0300



On Wed, Sep 25, 2013 at 02:08:09PM +0200, Paolo Bonzini wrote:
> On 25/09/2013 13:51, Gleb Natapov wrote:
> > On Wed, Sep 25, 2013 at 01:24:49PM +0200, Paolo Bonzini wrote:
> >> On 25/09/2013 11:51, Gleb Natapov wrote:
> >>> @@ -7773,6 +7787,9 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
> >>>  	kvm_set_cr3(vcpu, vmcs12->guest_cr3);
> >>>  	kvm_mmu_reset_context(vcpu);
> >>>  
> >>> +	if (!enable_ept)
> >>> +		vcpu->arch.walk_mmu->inject_page_fault = vmx_inject_page_fault_nested;
> >>> +
> >>>  	/*
> >>>  	 * L1 may access the L2's PDPTR, so save them to construct vmcs12
> >>>  	 */
> >>> @@ -8232,6 +8249,9 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
> >>>  	kvm_set_cr3(vcpu, vmcs12->host_cr3);
> >>>  	kvm_mmu_reset_context(vcpu);
> >>>  
> >>> +	if (!enable_ept)
> >>> +		vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
> >>
> >> This is strictly speaking not needed, because kvm_mmu_reset_context
> >> takes care of it.
> >>
> > Yeah, but it is better to make it explicit. It does not hurt, and it
> > makes clearer what is going on. Or at least add a comment above
> > kvm_mmu_reset_context() about this side effect.
> 
> Yes, I agree the code is cleaner like you wrote it.
> 
> >> But I wonder if it is cleaner to not touch the struct here, and instead
> >> add a new member to kvm_x86_ops---used directly in init_kvm_softmmu like
> >> kvm_x86_ops->set_cr3.  The new member can do something like
> >>
> >> 	if (is_guest_mode(vcpu)) {
> >> 		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> >> 		if (vmcs12->exception_bitmap & (1u << PF_VECTOR)) {
> >> 			nested_vmx_vmexit(vcpu);
> >> 			return;
> >> 		}
> >> 	}
> >>
> >> 	kvm_inject_page_fault(vcpu, fault);
> > 
> > I do not quite understand what you mean here. inject_page_fault() is
> > called from the depths of the page table walk. How would the code avoid
> > calling the new member in some circumstances?
> 
> IIUC the new function is called if and only if is_guest_mode(vcpu) && 
> !enable_ept.  So what I'm suggesting is something like this:
> 
Ah, I see. So you propose checking for guest mode and enable_ept inside the
function instead of switching to another function. Switching to another
function is how the code was designed to work, though. Nested NPT/EPT
provide their own functions too, but nothing stops you from checking inside
the function itself which MMU you are currently in.

> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -735,6 +735,8 @@ struct kvm_x86_ops {
>  	void (*adjust_tsc_offset)(struct kvm_vcpu *vcpu, s64 adjustment, bool host);
>  
>  	void (*set_tdp_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
> +	void (*inject_softmmu_page_fault)(struct kvm_vcpu *vcpu,
> +					  struct x86_exception *fault);
>  
>  	void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry);
>  
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3805,7 +3805,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
>  	vcpu->arch.walk_mmu->set_cr3           = kvm_x86_ops->set_cr3;
>  	vcpu->arch.walk_mmu->get_cr3           = get_cr3;
>  	vcpu->arch.walk_mmu->get_pdptr         = kvm_pdptr_read;
> -	vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
> +	vcpu->arch.walk_mmu->inject_page_fault = kvm_x86_ops->inject_softmmu_page_fault;
>  
>  	return r;
>  }
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -7499,6 +7499,20 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
>  	vmcs12->guest_physical_address = fault->address;
>  }
>  
> +static void vmx_inject_softmmu_page_fault(struct kvm_vcpu *vcpu,
> +		struct x86_exception *fault)
> +{
> +	if (is_guest_mode(vcpu)) {
This should be is_guest_mode(vcpu) && !enable_ept.

> +		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> +		if (vmcs12->exception_bitmap & (1u << PF_VECTOR)) {
> +			nested_vmx_vmexit(vcpu);
> +			return;
> +		}
> +	}
> +
> +	kvm_inject_page_fault(vcpu, fault);
> +}
> +
>  /* Callbacks for nested_ept_init_mmu_context: */
>  
>  static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
> @@ -8490,6 +8504,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
>  	.read_l1_tsc = vmx_read_l1_tsc,
>  
>  	.set_tdp_cr3 = vmx_set_cr3,
> +	.inject_softmmu_page_fault = vmx_inject_softmmu_page_fault,
>  
>  	.check_intercept = vmx_check_intercept,
>  	.handle_external_intr = vmx_handle_external_intr,
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -4347,6 +4347,7 @@ static struct kvm_x86_ops svm_x86_ops = {
>  	.read_l1_tsc = svm_read_l1_tsc,
>  
>  	.set_tdp_cr3 = set_tdp_cr3,
> +	.inject_softmmu_page_fault = kvm_inject_page_fault, /* FIXME */
>  
>  	.check_intercept = svm_check_intercept,
>  	.handle_external_intr = svm_handle_external_intr,
> 
> >> Alex (or Gleb :)), do you have any idea why SVM does not need this?
> >
> > It's probably needed there too. At least I fail to see why it would
> > not be. Without that patch the guest actually boots (most of the time),
> > but sometimes random processes crash with a double fault exception.
> 
> Sounds indeed like the same bug.
> 
I described what I saw with VMX; I am not saying the same happens with
SVM :) I just do not see why it should not, and the fact that the bug is
not fatal can explain why it was missed.

--
			Gleb.