On Fri, Apr 02, 2021 at 07:30:23AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Friday, April 2, 2021 12:04 AM
> >
> > On Thu, Apr 01, 2021 at 02:08:17PM +0000, Liu, Yi L wrote:
> >
> > > DMA page faults are delivered to root-complex via page request message and
> > > it is per-device according to PCIe spec. Page request handling flow is:
> > >
> > > 1) iommu driver receives a page request from device
> > > 2) iommu driver parses the page request message. Get the RID,PASID, faulted
> > >    page and requested permissions etc.
> > > 3) iommu driver triggers fault handler registered by device driver with
> > >    iommu_report_device_fault()
> >
> > This seems confused.
> >
> > The PASID should define how to handle the page fault, not the driver.
> >
> > I don't remember any device specific actions in ATS, so what is the
> > driver supposed to do?
> >
> > > 4) device driver's fault handler signals an event FD to notify userspace to
> > >    fetch the information about the page fault. If it's VM case, inject the
> > >    page fault to VM and let guest to solve it.
> >
> > If the PASID is set to 'report page fault to userspace' then some
> > event should come out of /dev/ioasid, or be reported to a linked
> > eventfd, or whatever.
> >
> > If the PASID is set to 'SVM' then the fault should be passed to
> > handle_mm_fault
> >
> > And so on.
> >
> > Userspace chooses what happens based on how they configure the PASID
> > through /dev/ioasid.
> >
> > Why would a device driver get involved here?
> >
> > > Eric has sent below series for the page fault reporting for VM with passthru
> > > device.
> > > https://lore.kernel.org/kvm/20210223210625.604517-5-eric.auger@xxxxxxxxxx/
> >
> > It certainly should not be in vfio pci. Everything using a PASID needs
> > this infrastructure, VDPA, mdev, PCI, CXL, etc.
> >
>
> This touches an interesting fact:
>
> The fault may be triggered in either 1st-level or 2nd-level page table,
> when nested translation is enabled (in vSVA case). The 1st-level is bound
> by the user space, which therefore needs to receive the fault event. The
> 2nd-level is managed by VFIO (or vDPA), which needs to fix the fault in
> kernel (e.g. find HVA per faulting GPA, call handle_mm_fault and map
> GPA->HPA to IOMMU). Yi's current proposal lets VFIO to register the
> device fault handler, which then forward the event through /dev/ioasid
> to userspace only if it is a 1st-level fault. Are you suggesting a
> pgtable-centric fault reporting mechanism to separate handlers in each
> level, i.e. letting VFIO register handler only for 2nd-level fault and
> then /dev/ioasid register handler for 1st-level fault?

This I'm struggling to understand. /dev/ioasid should handle all the
fault cases, why would VFIO ever get involved in a fault? What would
it even do?

If the fault needs to be fixed in the hypervisor then it is a kernel
fault and it does handle_mm_fault. This absolutely should not be in
VFIO or VDPA.

If the fault needs to be fixed in the guest, then it needs to be
delivered over /dev/ioasid in some way and injected into the
vIOMMU. VFIO and VDPA have nothing to do with the vIOMMU driver in
qemu.

You need to have an interface under /dev/ioasid to create both page
table levels and part of that will be to tell the kernel what VA is
mapped and how to handle faults.

VFIO/VDPA do *nothing* more than authorize the physical device to use
the given PASID.

In the VDPA case you might need to have SW access to the PASID, but
that should be provided by a generic iommu layer interface like
'copy_to/from_pasid()' not by involving VDPA in the address mapping.

Jason
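
For illustration only, the PASID-centric routing argued for above could be
sketched roughly as below. This is a userspace model, not kernel code and
not a proposed uAPI: every name in it (pasid_configure, route_page_request,
the enums, the counters standing in for an eventfd and for handle_mm_fault)
is hypothetical and chosen only for this sketch. The point it tries to show
is that the fault path looks up the PASID's configuration and nothing else,
so no device driver, VFIO or VDPA code appears in it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: how userspace configured a PASID through /dev/ioasid. */
enum pasid_fault_mode {
	PASID_FAULT_REPORT_USER, /* 'report page fault to userspace' */
	PASID_FAULT_SVM,         /* 'SVM': resolve against the process mm */
};

/* Hypothetical result type, present only so the sketch can be checked. */
enum fault_route {
	ROUTE_NONE,       /* unknown PASID, request rejected */
	ROUTE_USER_EVENT, /* reported to userspace (e.g. for vIOMMU injection) */
	ROUTE_KERNEL_MM,  /* fixed in the kernel, handle_mm_fault style */
};

/* Fields the iommu driver parses out of a page request message. */
struct page_request {
	uint16_t rid;
	uint32_t pasid;
	uint64_t addr;
	bool write;
};

struct pasid_entry {
	uint32_t pasid;
	enum pasid_fault_mode mode;
	unsigned int user_events; /* stand-in for signalling an eventfd */
	unsigned int mm_faults;   /* stand-in for calling handle_mm_fault */
};

#define MAX_PASIDS 8
static struct pasid_entry pasid_table[MAX_PASIDS];
static size_t pasid_count;

static struct pasid_entry *pasid_lookup(uint32_t pasid)
{
	for (size_t i = 0; i < pasid_count; i++)
		if (pasid_table[i].pasid == pasid)
			return &pasid_table[i];
	return NULL;
}

/* Models userspace choosing the fault behaviour for one PASID. */
int pasid_configure(uint32_t pasid, enum pasid_fault_mode mode)
{
	struct pasid_entry *e = pasid_lookup(pasid);

	if (!e) {
		if (pasid_count == MAX_PASIDS)
			return -1; /* table full */
		e = &pasid_table[pasid_count++];
		e->pasid = pasid;
		e->user_events = 0;
		e->mm_faults = 0;
	}
	e->mode = mode;
	return 0;
}

/* Routes a page request purely by the PASID's configured mode. */
enum fault_route route_page_request(const struct page_request *req)
{
	struct pasid_entry *e = pasid_lookup(req->pasid);

	if (!e)
		return ROUTE_NONE;

	switch (e->mode) {
	case PASID_FAULT_REPORT_USER:
		e->user_events++;
		return ROUTE_USER_EVENT;
	case PASID_FAULT_SVM:
		e->mm_faults++;
		return ROUTE_KERNEL_MM;
	}
	return ROUTE_NONE;
}
```

In this model a nested (vSVA) setup would simply be two configurations
behind the same lookup, one per page table level, rather than two handlers
owned by different subsystems.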