> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Friday, April 2, 2021 12:04 AM
>
> On Thu, Apr 01, 2021 at 02:08:17PM +0000, Liu, Yi L wrote:
>
> > DMA page faults are delivered to root-complex via page request message
> > and it is per-device according to PCIe spec. Page request handling flow
> > is:
> >
> > 1) iommu driver receives a page request from device
> > 2) iommu driver parses the page request message. Get the RID,PASID,
> >    faulted page and requested permissions etc.
> > 3) iommu driver triggers fault handler registered by device driver with
> >    iommu_report_device_fault()
>
> This seems confused.
>
> The PASID should define how to handle the page fault, not the driver.
>
> I don't remember any device specific actions in ATS, so what is the
> driver supposed to do?
>
> > 4) device driver's fault handler signals an event FD to notify userspace
> >    to fetch the information about the page fault. If it's VM case, inject
> >    the page fault to VM and let guest to solve it.
>
> If the PASID is set to 'report page fault to userspace' then some
> event should come out of /dev/ioasid, or be reported to a linked
> eventfd, or whatever.
>
> If the PASID is set to 'SVM' then the fault should be passed to
> handle_mm_fault
>
> And so on.
>
> Userspace chooses what happens based on how they configure the PASID
> through /dev/ioasid.
>
> Why would a device driver get involved here?
>
> > Eric has sent below series for the page fault reporting for VM with
> > passthru device.
> > https://lore.kernel.org/kvm/20210223210625.604517-5-eric.auger@xxxxxxxxxx/
>
> It certainly should not be in vfio pci. Everything using a PASID needs
> this infrastructure, VDPA, mdev, PCI, CXL, etc.
>

This touches an interesting fact:

The fault may be triggered in either 1st-level or 2nd-level page table,
when nested translation is enabled (in vSVA case). The 1st-level is bound
by the user space, which therefore needs to receive the fault event.
The 2nd-level is managed by VFIO (or vDPA), which needs to fix the
fault in the kernel (e.g. find the HVA for the faulting GPA, call
handle_mm_fault and map GPA->HPA in the IOMMU).

Yi's current proposal lets VFIO register the device fault handler,
which then forwards the event through /dev/ioasid to userspace only
if it is a 1st-level fault. Are you suggesting a pgtable-centric
fault reporting mechanism with separate handlers for each level,
i.e. letting VFIO register a handler only for 2nd-level faults and
/dev/ioasid register a handler for 1st-level faults?

Thanks
Kevin
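
For illustration only, the pgtable-centric split asked about above could
be modelled roughly as below. This is a userspace sketch, not kernel code:
every type and function name in it (struct page_fault, struct
fault_handlers, dispatch_fault, and both handlers) is hypothetical, and
the assumption that a fault record carries its translation level is mine,
not something the thread specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; none of them are kernel APIs. */
enum fault_level { FAULT_LEVEL_1, FAULT_LEVEL_2 };

enum fault_route {
    ROUTE_TO_USERSPACE, /* 1st-level: reported to the page table owner */
    ROUTE_TO_KERNEL,    /* 2nd-level: fixed in kernel (GPA->HPA) */
    ROUTE_UNHANDLED,    /* no handler registered for that level */
};

/* Fields parsed from a page request, plus the faulting level (assumed). */
struct page_fault {
    uint32_t rid;
    uint32_t pasid;
    uint64_t addr;
    enum fault_level level;
};

typedef enum fault_route (*fault_handler_t)(const struct page_fault *f);

/* One handler slot per page table level instead of one per device. */
struct fault_handlers {
    fault_handler_t level1; /* would be registered by /dev/ioasid */
    fault_handler_t level2; /* would be registered by VFIO or vDPA */
};

/* Stand-in for forwarding the event to userspace. */
static enum fault_route ioasid_level1_handler(const struct page_fault *f)
{
    (void)f;
    return ROUTE_TO_USERSPACE;
}

/* Stand-in for resolving the fault inside the kernel. */
static enum fault_route vfio_level2_handler(const struct page_fault *f)
{
    (void)f;
    return ROUTE_TO_KERNEL;
}

/* Pick the handler by faulting level; report if none is registered. */
static enum fault_route dispatch_fault(const struct fault_handlers *h,
                                       const struct page_fault *f)
{
    assert(h != NULL && f != NULL);
    fault_handler_t handler =
        (f->level == FAULT_LEVEL_1) ? h->level1 : h->level2;
    return handler ? handler(f) : ROUTE_UNHANDLED;
}
```

With both slots filled, a FAULT_LEVEL_1 fault reaches only the /dev/ioasid
stand-in and a FAULT_LEVEL_2 fault only the VFIO stand-in, so neither
handler has to filter out faults that belong to the other level.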