RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

"Tian, Kevin" <kevin.tian@xxxxxxxxx> · Fri, 2 Apr 2021 07:30:23 +0000

> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Friday, April 2, 2021 12:04 AM
> 
> On Thu, Apr 01, 2021 at 02:08:17PM +0000, Liu, Yi L wrote:
> 
> > DMA page faults are delivered to root-complex via page request message and
> > it is per-device according to PCIe spec. Page request handling flow is:
> >
> > 1) iommu driver receives a page request from device
> > 2) iommu driver parses the page request message. Get the RID,PASID, faulted
> >    page and requested permissions etc.
> > 3) iommu driver triggers fault handler registered by device driver with
> >    iommu_report_device_fault()
> 
> This seems confused.
> 
> The PASID should define how to handle the page fault, not the driver.
> 
> I don't remember any device specific actions in ATS, so what is the
> driver supposed to do?
> 
> > 4) device driver's fault handler signals an event FD to notify userspace to
> >    fetch the information about the page fault. If it's VM case, inject the
> >    page fault to VM and let guest to solve it.
> 
> If the PASID is set to 'report page fault to userspace' then some
> event should come out of /dev/ioasid, or be reported to a linked
> eventfd, or whatever.
> 
> If the PASID is set to 'SVM' then the fault should be passed to
> handle_mm_fault
> 
> And so on.
> 
> Userspace chooses what happens based on how they configure the PASID
> through /dev/ioasid.
> 
> Why would a device driver get involved here?
> 
> > Eric has sent below series for the page fault reporting for VM with passthru
> > device.
> > https://lore.kernel.org/kvm/20210223210625.604517-5-eric.auger@xxxxxxxxxx/
> 
> It certainly should not be in vfio pci. Everything using a PASID needs
> this infrastructure, VDPA, mdev, PCI, CXL, etc.
> 
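A minimal sketch of the PASID-centric dispatch Jason describes, written as standalone userspace C so it compiles on its own. The enum values, struct layouts, and dispatch_fault() are hypothetical illustrations, not an existing kernel or /dev/ioasid interface. In a real kernel the SVM branch would end in handle_mm_fault(), and the userspace branch would queue a fault record and signal an eventfd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fault-handling modes a PASID could be configured with
 * through /dev/ioasid; illustrative names only, not a real uAPI. */
enum pasid_fault_mode {
    PASID_FAULT_REPORT_USER, /* queue the fault for userspace (e.g. via eventfd) */
    PASID_FAULT_SVM,         /* resolve against the bound mm (handle_mm_fault) */
};

/* Fields the IOMMU driver parses out of a PCIe page request message. */
struct page_request {
    uint16_t rid;
    uint32_t pasid;
    uint64_t addr;
    int write;
};

/* Per-PASID configuration chosen by userspace. */
struct pasid_entry {
    uint32_t pasid;
    enum pasid_fault_mode mode;
};

/* Where the fault ends up; returned so callers can observe the routing. */
enum fault_route { ROUTE_USERSPACE, ROUTE_MM_FAULT, ROUTE_INVALID };

/* Route a fault purely by how its PASID was configured; no device
 * driver callback is involved. */
static enum fault_route dispatch_fault(const struct pasid_entry *table, size_t n,
                                       const struct page_request *req)
{
    assert(req != NULL && (table != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        if (table[i].pasid != req->pasid)
            continue;
        switch (table[i].mode) {
        case PASID_FAULT_REPORT_USER:
            return ROUTE_USERSPACE;
        case PASID_FAULT_SVM:
            return ROUTE_MM_FAULT;
        }
    }
    /* Unknown PASID: the request would be completed with a failure response. */
    return ROUTE_INVALID;
}
```

Because the routing decision depends only on the PASID table, the same path would serve VFIO PCI, mdev, vDPA, or CXL users alike, which is the point being argued.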

This touches an interesting fact:

The fault may be triggered in either the 1st-level or the 2nd-level page
table when nested translation is enabled (in the vSVA case). The 1st-level
table is bound by userspace, which therefore needs to receive the fault
event. The 2nd-level table is managed by VFIO (or vDPA), which needs to fix
the fault in the kernel (e.g. find the HVA for the faulting GPA, call
handle_mm_fault, and map GPA->HPA in the IOMMU). Yi's current proposal lets
VFIO register the device fault handler, which then forwards the event
through /dev/ioasid to userspace only if it is a 1st-level fault. Are you
suggesting a pgtable-centric fault reporting mechanism with a separate
handler for each level, i.e. letting VFIO register a handler only for
2nd-level faults and /dev/ioasid register a handler for 1st-level faults?
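The pgtable-centric alternative asked about above might look roughly like the following standalone C sketch. It assumes the IOMMU driver can tell which level of the nested walk faulted; register_level_handler(), report_nested_fault(), and both stand-in handlers are hypothetical names, not existing iommu, VFIO, or /dev/ioasid functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical translation levels in nested (vSVA) mode. */
enum pgtable_level { LEVEL_1ST = 0, LEVEL_2ND = 1, NR_LEVELS };

struct nested_fault {
    uint32_t pasid;
    uint64_t addr;            /* GVA for a 1st-level fault, GPA for a 2nd-level one */
    enum pgtable_level level; /* assumption: the IOMMU driver can determine this */
};

typedef int (*fault_handler_t)(const struct nested_fault *f, void *data);

struct level_handler { fault_handler_t fn; void *data; };

/* One handler slot per level, so each owner only sees its own faults. */
static struct level_handler handlers[NR_LEVELS];

static int register_level_handler(enum pgtable_level lvl, fault_handler_t fn, void *data)
{
    if ((unsigned)lvl >= NR_LEVELS || fn == NULL || handlers[lvl].fn != NULL)
        return -1; /* invalid level, no handler, or level already claimed */
    handlers[lvl].fn = fn;
    handlers[lvl].data = data;
    return 0;
}

static int report_nested_fault(const struct nested_fault *f)
{
    assert(f != NULL);
    if ((unsigned)f->level >= NR_LEVELS || handlers[f->level].fn == NULL)
        return -1; /* nobody owns this level */
    return handlers[f->level].fn(f, handlers[f->level].data);
}

/* Stand-in for the /dev/ioasid handler: counts faults queued for userspace. */
static int ioasid_user_handler(const struct nested_fault *f, void *data)
{
    (void)f;
    ++*(int *)data;
    return 1;
}

/* Stand-in for the VFIO/vDPA handler: a real one would look up the HVA for
 * the GPA, call handle_mm_fault(), and map GPA->HPA; this one only counts. */
static int vfio_gpa_handler(const struct nested_fault *f, void *data)
{
    (void)f;
    ++*(int *)data;
    return 2;
}
```

With one slot per level, VFIO never has to inspect and forward 1st-level faults on behalf of /dev/ioasid, which is the separation the question is probing.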

Thanks
Kevin