Re: [PATCH] x86: mmu: report failed memory access to the userspace

Peter Xu <peterx@xxxxxxxxxx> · Thu, 19 Jul 2018 16:44:44 +0800

On Wed, Jul 18, 2018 at 09:59:34AM -0700, Sean Christopherson wrote:
> On Wed, Jul 18, 2018 at 01:27:56PM +0800, Peter Xu wrote:
> > On Tue, Jul 17, 2018 at 05:15:28PM +0300, Denis Plotnikov wrote:
> > > The patch enables userspace to be notified about failed memeory accesses.
> > > 
> > > This can be employed by userspace, say QEMU, to build the memory access
> > > tracker while userfaultfd doesn't support detection of protected memeory
> > > writes.
> > > 
> > > In case of QEMU, it would allow to make a background snapshot mode.
> > > 
> > > Signed-off-by: Denis Plotnikov <dplotnikov@xxxxxxxxxxxxx>
> > > ---
> > >  arch/x86/kvm/vmx.c       | 17 ++++++++++++++++-
> > >  include/uapi/linux/kvm.h |  6 ++++++
> > >  2 files changed, 22 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > index 1689f433f3a0..9c9b840537de 100644
> > > --- a/arch/x86/kvm/vmx.c
> > > +++ b/arch/x86/kvm/vmx.c
> > > @@ -7269,6 +7269,7 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
> > >  	unsigned long exit_qualification;
> > >  	gpa_t gpa;
> > >  	u64 error_code;
> > > +	int r;
> > >  
> > >  	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
> > >  
> > > @@ -7305,7 +7306,21 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
> > >  	       PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;
> > >  
> > >  	vcpu->arch.exit_qualification = exit_qualification;
> > > -	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
> > > +
> > > +	r = kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
> > > +
> > > +	if (r == -EFAULT) {
> > > +		unsigned long hva;
> > > +
> > > +		hva = kvm_vcpu_gfn_to_hva(vcpu, gpa >> PAGE_SHIFT);
> > > +
> > > +		vcpu->run->exit_reason = KVM_EXIT_FAIL_MEM_ACCESS;
> > > +		vcpu->run->hw.hardware_exit_reason = EXIT_REASON_EPT_VIOLATION;
> > > +		vcpu->run->fail_mem_access.hva = hva | (gpa & (PAGE_SIZE-1));
> > > +		r = 0;
> > > +	}
> > > +
> > > +	return r;
> > 
> > What I feel like is that we should do this inside kvm_mmu_page_fault()
> > or even deeper inside otherwise this will only work for EPT but not
> > the other use cases as I mentioned in my reply in your QEMU series
> > (e.g., no paging, or shadow pagings).
> 
> This wouldn't even be consistent for all EPT cases, e.g. the misconfig
> path can also return -EFAULT.

Yes.

>  
> > Another thing that I am not sure is that whether we should capture all
> > the -EFAULT into FAIL_MEM_ACCESS event.  AFAIU what we really care
> > about is that when kvm vcpus try to write read-only pages setup by
> > mprotect() before hand.  So not sure we should only trap this at the
> > specific point where we found that we're writting to a read-only page.
> > IIUC that's somewhere inside hva_to_pfn() and when hva_to_pfn_slow()
> > failed into:
> > 
> >         ...
> > 	} else {
> > 		if (async && vma_is_valid(vma, write_fault))
> > 			*async = true;
> > 		pfn = KVM_PFN_ERR_FAULT;   <===== the -EFAULT should be translated from this
> > 	}
> >         ...
> > 
> > So maybe we should only trap here instead of all the -EFAULT errors?
> > And this is natually servicing all paging modes, and possibly AMD as
> > well.
> 
> What about making this a more generic reporting mechanism as opposed
> to being specific to a single use case, e.g. by exposing GPA and fault
> type (fetch, read, write, etc...) in addition to HVA?  IMO this would
> be useful even if Qemu did nothing more than print the failing GPA for
> cases where Qemu can't do anything beyond die.  And I have a potential
> use case (SGX EPC allocation failure) for reporting a non-fatal error
> to userspace in the VM_PFNMAP path (above the code snippet you pasted).

Makes sense to me.  Actually it sounds even useful to debug some KVM
problems with those extra lines dumped.

Actually this also reminded me about whether we can do this in a more
general fashion instead of MMU-only to report error details to
userspace when some KVM error occurs.  Now what we have is only an
errno, e.g., when we encountered any kind of failure during a KVM_RUN
procedure we only get a integer as fault reason.  Maybe we can provide
meaningful fault informations, then we return them to userspace when
capable (e.g., when KVM_CAP_XXX is set).  Then these kinds of MMU
error can be the first candidate of such rich error information.  Not
sure whether this would be anywhere close to useful to other KVM
components.

Regards,

-- 
Peter Xu