> From: Tian, Kevin <kevin.tian@xxxxxxxxx>
> Sent: Thursday, May 27, 2021 1:28 PM
> 5.6. I/O page fault
> +++++++++++++++
>
> (uAPI is TBD. Here is just about the high-level flow from host IOMMU driver
> to guest IOMMU driver and backwards).
>
> - Host IOMMU driver receives a page request with raw fault_data {rid,
>   pasid, addr};
>
> - Host IOMMU driver identifies the faulting I/O page table according to
>   information registered by IOASID fault handler;
>
> - IOASID fault handler is called with raw fault_data (rid, pasid, addr), which
>   is saved in ioasid_data->fault_data (used for response);
>
> - IOASID fault handler generates an user fault_data (ioasid, addr), links it
>   to the shared ring buffer and triggers eventfd to userspace;
>
> - Upon received event, Qemu needs to find the virtual routing information
>   (v_rid + v_pasid) of the device attached to the faulting ioasid. If there are
>   multiple, pick a random one. This should be fine since the purpose is to
>   fix the I/O page table on the guest;
>
> - Qemu generates a virtual I/O page fault through vIOMMU into guest,
>   carrying the virtual fault data (v_rid, v_pasid, addr);
>
Why does it have to be through vIOMMU?
For a VFIO PCI device, have you considered reusing the same PRI interface
to inject the page fault into the guest? This eliminates any new v_rid.
It will also route the page fault request and response through the right
vfio device.

> - Guest IOMMU driver fixes up the fault, updates the I/O page table, and
>   then sends a page response with virtual completion data (v_rid, v_pasid,
>   response_code) to vIOMMU;
>
What about fixing up the fault for the MMU page table as well in the guest?
Or did you mean both when you said "updates the I/O page table" above?

It is unclear to me whether a single nested page table is maintained or
two (one referenced by cr3 and the other for the IOMMU).
Can you please clarify?

> - Qemu finds the pending fault event, converts virtual completion data
>   into (ioasid, response_code), and then calls a /dev/ioasid ioctl to
>   complete the pending fault;
>
For a VFIO PCI device, once a virtual PRI request/response interface is
done, it can be a generic interface across multiple vIOMMUs.

> - /dev/ioasid finds out the pending fault data {rid, pasid, addr} saved in
>   ioasid_data->fault_data, and then calls iommu api to complete it with
>   {rid, pasid, response_code};
>
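
To make the round trip in the quoted flow concrete, here is a rough sketch
in C of the data conversions at each hop. Since the uAPI is TBD, every
struct and function name below is hypothetical and only mirrors the tuples
named in the flow; keeping a single pending fault per ioasid is a
simplifying assumption of the sketch, not something the proposal states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical types mirroring the tuples in the flow; the real uAPI is TBD. */
struct raw_fault  { uint16_t rid;   uint32_t pasid;   uint64_t addr; };
struct user_fault { uint32_t ioasid; uint64_t addr; };
struct virt_fault { uint16_t v_rid; uint32_t v_pasid; uint64_t addr; };
struct virt_resp  { uint16_t v_rid; uint32_t v_pasid; int response_code; };
struct user_resp  { uint32_t ioasid; int response_code; };
struct raw_resp   { uint16_t rid;   uint32_t pasid;   int response_code; };

/* Virtual routing info Qemu keeps for one device attached to an ioasid. */
struct v_route { uint32_t ioasid; uint16_t v_rid; uint32_t v_pasid; };

#define MAX_IOASID 8

/* Stand-in for ioasid_data; assumes one pending fault per ioasid. */
struct ioasid_data {
    bool pending;
    struct raw_fault fault_data;
};
static struct ioasid_data ioasid_table[MAX_IOASID];

/* Host: save the raw fault for the later response, emit (ioasid, addr). */
struct user_fault ioasid_fault_handler(uint32_t ioasid, struct raw_fault raw)
{
    assert(ioasid < MAX_IOASID);
    ioasid_table[ioasid].fault_data = raw;
    ioasid_table[ioasid].pending = true;
    return (struct user_fault){ .ioasid = ioasid, .addr = raw.addr };
}

/* Qemu: attach the virtual routing info before injecting into the guest. */
struct virt_fault qemu_inject(struct user_fault uf, const struct v_route *r)
{
    assert(r->ioasid == uf.ioasid);
    return (struct virt_fault){ .v_rid = r->v_rid, .v_pasid = r->v_pasid,
                                .addr = uf.addr };
}

/* Qemu: convert the guest's completion data into (ioasid, response_code). */
struct user_resp qemu_complete(struct virt_resp vr, const struct v_route *r)
{
    assert(r->v_rid == vr.v_rid && r->v_pasid == vr.v_pasid);
    return (struct user_resp){ .ioasid = r->ioasid,
                               .response_code = vr.response_code };
}

/*
 * Host: look up the saved raw fault and build {rid, pasid, response_code}.
 * Returns false if no fault is pending on that ioasid.
 */
bool ioasid_complete_fault(struct user_resp ur, struct raw_resp *out)
{
    if (ur.ioasid >= MAX_IOASID || !ioasid_table[ur.ioasid].pending)
        return false;

    struct ioasid_data *d = &ioasid_table[ur.ioasid];
    *out = (struct raw_resp){ .rid = d->fault_data.rid,
                              .pasid = d->fault_data.pasid,
                              .response_code = ur.response_code };
    d->pending = false;
    return true;
}
```

Note that the physical rid/pasid never leave the host in this sketch, and
the v_rid/v_pasid exist only on the Qemu side. With a per-device virtual
PRI interface as suggested above, the v_route lookup would presumably be
implied by which vfio device the request came through.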