> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Sent: Saturday, January 30, 2021 6:58 AM
>
> On Mon, 25 Jan 2021 17:03:58 +0800
> Shenming Lu <lushenming@xxxxxxxxxx> wrote:
>
> > Hi,
> >
> > The static pinning and mapping problem in VFIO and possible solutions
> > have been discussed a lot [1, 2]. One of the solutions is to add I/O
> > page fault support for VFIO devices. Different from those relatively
> > complicated software approaches such as presenting a vIOMMU that provides
> > the DMA buffer information (might include para-virtualized optimizations),
> > IOPF mainly depends on the hardware faulting capability, such as the PCIe
> > PRI extension or Arm SMMU stall model. What's more, the IOPF support in
> > the IOMMU driver is being implemented in SVA [3]. So do we consider to
> > add IOPF support for VFIO passthrough based on the IOPF part of SVA at
> > present?
> >
> > We have implemented a basic demo only for one stage of translation (GPA
> > -> HPA in virtualization, note that it can be configured at either stage),
> > and tested on Hisilicon Kunpeng920 board. The nested mode is more complicated
> > since VFIO only handles the second stage page faults (same as the non-nested
> > case), while the first stage page faults need to be further delivered to
> > the guest, which is being implemented in [4] on ARM. My thought on this
> > is to report the page faults to VFIO regardless of the occurred stage (try
> > to carry the stage information), and handle respectively according to the
> > configured mode in VFIO. Or the IOMMU driver might evolve to support more...
> >
> > Might TODO:
> >  - Optimize the faulting path, and measure the performance (it might still
> >    be a big issue).
> >  - Add support for PRI.
> >  - Add a MMU notifier to avoid pinning.
> >  - Add support for the nested mode.
> > ...
> >
> > Any comments and suggestions are very welcome. :-)
>
> I expect performance to be pretty bad here, the lookup involved per
> fault is excessive.
> There are cases where a user is not going to be
> willing to have a slow ramp up of performance for their devices as they
> fault in pages, so we might need to consider making this
> configurable through the vfio interface. Our page mapping also only

There is another factor to be considered. The presence of
IOMMU_DEV_FEAT_IOPF only indicates that the device is capable of
triggering I/O page faults through the IOMMU; it does not mean that the
device can tolerate I/O page faults for arbitrary DMA requests. In
reality, many devices allow I/O faulting only in selective contexts.
However, there is no standard way (e.g. from PCISIG) for a device to
report whether arbitrary I/O faults are allowed. We may therefore have
to maintain device-specific knowledge in software, e.g. an opt-in table
listing the devices which allow arbitrary faults. For devices which
support only selective faulting, a mediator (either through vendor
extensions on vfio-pci-core or an mdev wrapper) might be necessary to
help lock down non-faultable mappings and then enable faulting on the
remaining mappings.

> grows here, should mappings expire or do we need a least recently
> mapped tracker to avoid exceeding the user's locked memory limit? How
> does a user know what to set for a locked memory limit? The behavior
> here would lead to cases where an idle system might be ok, but as soon
> as load increases with more inflight DMA, we start seeing
> "unpredictable" I/O faults from the user perspective. Seems like there
> are lots of outstanding considerations and I'd also like to hear from
> the SVA folks about how this meshes with their work.
> Thanks,
>

The main overlap between this feature and SVA is the IOPF reporting
framework, which currently still has a gap in supporting both in nested
mode, as discussed here:

https://lore.kernel.org/linux-acpi/YAaxjmJW+ZMvrhac@myrica/

Once that gap is resolved, the VFIO fault handler can simply take
different actions according to the fault level: 1st-level faults are
forwarded to userspace through the vSVA path, while 2nd-level faults are
fixed (or warned about if not intended) by VFIO itself through the IOMMU
mapping interface.

Thanks
Kevin
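
P.S. A minimal sketch of the fault-level dispatch described above, in
plain C. Every name in it (fault_level, fault_action, io_fault, vfio_ctx,
l2_faulting_enabled, dispatch_fault) is hypothetical and is not an
existing kernel or VFIO interface; it only illustrates the intended split
between the two levels, assuming the reporting framework can tell the
handler which level faulted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical types for illustration only; not kernel APIs. */
enum fault_level { FAULT_LEVEL_1 = 1, FAULT_LEVEL_2 = 2 };

enum fault_action {
    ACTION_FORWARD_TO_USERSPACE, /* 1st level: deliver via the vSVA path */
    ACTION_MAP_PAGE,             /* 2nd level: fix through the mapping interface */
    ACTION_WARN                  /* 2nd level, but faulting was not intended */
};

struct io_fault {
    enum fault_level level; /* which translation level faulted */
    uint64_t iova;          /* faulting I/O virtual address */
};

struct vfio_ctx {
    bool l2_faulting_enabled; /* assumed opt-in knob for 2nd-level faulting */
};

/* Decide what to do with a reported fault, based only on its level. */
enum fault_action dispatch_fault(const struct vfio_ctx *ctx,
                                 const struct io_fault *f)
{
    assert(ctx != NULL && f != NULL);

    if (f->level == FAULT_LEVEL_1)
        return ACTION_FORWARD_TO_USERSPACE;

    if (!ctx->l2_faulting_enabled) {
        /* A 2nd-level fault nobody asked for: report it, do not map. */
        fprintf(stderr, "unexpected 2nd-level fault at iova 0x%llx\n",
                (unsigned long long)f->iova);
        return ACTION_WARN;
    }

    return ACTION_MAP_PAGE;
}
```

A caller would fill in a struct io_fault from whatever the reporting
framework delivers and act on the returned value; the actual mapping or
forwarding is deliberately left out of the sketch.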