On Tue, May 30, 2023 at 01:37:07PM +0800, Lu Baolu wrote:
> Hi folks,
>
> This series implements the functionality of delivering IO page faults to
> user space through the IOMMUFD framework. The use case is nested
> translation, where modern IOMMU hardware supports two-stage translation
> tables. The second-stage translation table is managed by the host VMM
> while the first-stage translation table is owned by the user space.
> Hence, any IO page fault that occurs on the first-stage page table
> should be delivered to the user space and handled there. The user space
> should respond the page fault handling result to the device top-down
> through the IOMMUFD response uAPI.
>
> User space indicates its capablity of handling IO page faults by setting
> a user HWPT allocation flag IOMMU_HWPT_ALLOC_FLAGS_IOPF_CAPABLE. IOMMUFD
> will then setup its infrastructure for page fault delivery. Together
> with the iopf-capable flag, user space should also provide an eventfd
> where it will listen on any down-top page fault messages.
>
> On a successful return of the allocation of iopf-capable HWPT, a fault
> fd will be returned. User space can open and read fault messages from it
> once the eventfd is signaled.

This is a performance path so we really need to think about this more,
polling on an eventfd and then reading a different fd is not a good
design.

What I would like is to have a design from the start that fits into
io_uring, so we can have pre-posted 'recvs' in io_uring that just get
completed at high speed when PRIs come in.

This suggests that the PRI should be delivered via read() on a single FD
and pollability on the single FD without any eventfd.

> Besides the overall design, I'd like to hear comments about below
> designs:
>
> - The IOMMUFD fault message format. It is very similar to that in
>   uapi/linux/iommu which has been discussed before and partially used by
>   the IOMMU SVA implementation. I'd like to get more comments on the
>   format when it comes to IOMMUFD.

We have to have the same discussion as always: does a generic fault
message format make any sense here?

It seems more likely that it would for PRI, but that needs a big,
careful cross-vendor check.

Jason
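
To make the single-FD idea above a bit more concrete, here is a rough
user space sketch of what "poll() and read() on one fd, no eventfd"
could look like. Everything named demo_* is hypothetical: struct
demo_fault_msg is a made-up fixed-size record, not the format proposed
in the series, and a pipe stands in for the fault fd since no such uAPI
exists yet. It only illustrates the consumption pattern.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical fixed-size fault record, NOT the format from the series. */
struct demo_fault_msg {
	uint32_t dev_id;
	uint32_t pasid;
	uint64_t addr;
};

static_assert(sizeof(struct demo_fault_msg) == 16,
	      "record is expected to have no padding");

/*
 * Wait for the fd to become readable, then read exactly one record from
 * that same fd. Returns 1 on success, 0 on timeout or EOF, -1 on error
 * or short read.
 */
int demo_read_fault(int fault_fd, struct demo_fault_msg *msg, int timeout_ms)
{
	struct pollfd pfd = { .fd = fault_fd, .events = POLLIN };
	ssize_t n;
	int ret;

	ret = poll(&pfd, 1, timeout_ms);
	if (ret <= 0)
		return ret;		/* 0: timeout, -1: poll() failed */
	if (!(pfd.revents & POLLIN))
		return 0;		/* hangup with nothing left to read */

	n = read(fault_fd, msg, sizeof(*msg));
	if (n == 0)
		return 0;		/* EOF */
	return n == (ssize_t)sizeof(*msg) ? 1 : -1;
}

/*
 * Stand-in for a real fault fd: queue two records on a pipe, then drain
 * them. Returns the number of records consumed, or -1 on setup failure.
 */
int demo_with_pipe(void)
{
	struct demo_fault_msg in[2] = {
		{ .dev_id = 1, .pasid = 10, .addr = 0x1000 },
		{ .dev_id = 1, .pasid = 11, .addr = 0x2000 },
	};
	struct demo_fault_msg out;
	int fds[2];
	int count = 0;

	if (pipe(fds) < 0)
		return -1;
	if (write(fds[1], in, sizeof(in)) != (ssize_t)sizeof(in)) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	close(fds[1]);	/* no more "faults" will arrive */

	while (demo_read_fault(fds[0], &out, 1000) == 1)
		count++;

	close(fds[0]);
	return count;
}
```

The point of the sketch is only that readiness and data come from the
same fd, which is the property that lets the read be pre-posted through
io_uring instead of being driven by a separate eventfd wakeup.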