On 2021/1/30 6:57, Alex Williamson wrote:
> On Mon, 25 Jan 2021 17:03:58 +0800
> Shenming Lu <lushenming@xxxxxxxxxx> wrote:
>
>> Hi,
>>
>> The static pinning and mapping problem in VFIO and possible solutions
>> have been discussed a lot [1, 2]. One of the solutions is to add I/O
>> page fault support for VFIO devices. Different from those relatively
>> complicated software approaches such as presenting a vIOMMU that provides
>> the DMA buffer information (might include para-virtualized optimizations),
>> IOPF mainly depends on the hardware faulting capability, such as the PCIe
>> PRI extension or Arm SMMU stall model. What's more, the IOPF support in
>> the IOMMU driver is being implemented in SVA [3]. So should we consider
>> adding IOPF support for VFIO passthrough based on the IOPF part of SVA at
>> present?
>>
>> We have implemented a basic demo only for one stage of translation (GPA
>> -> HPA in virtualization, note that it can be configured at either stage),
>> and tested on Hisilicon Kunpeng920 board. The nested mode is more complicated
>> since VFIO only handles the second stage page faults (same as the non-nested
>> case), while the first stage page faults need to be further delivered to
>> the guest, which is being implemented in [4] on ARM. My thought on this
>> is to report the page faults to VFIO regardless of the stage at which they
>> occurred (try to carry the stage information), and handle them respectively
>> according to the configured mode in VFIO. Or the IOMMU driver might evolve
>> to support more...
>>
>> Might TODO:
>>  - Optimize the faulting path, and measure the performance (it might still
>>    be a big issue).
>>  - Add support for PRI.
>>  - Add an MMU notifier to avoid pinning.
>>  - Add support for the nested mode.
>> ...
>>
>> Any comments and suggestions are very welcome. :-)
>
> I expect performance to be pretty bad here, the lookup involved per
> fault is excessive.

We might consider prepinning more pages as a further optimization.
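
To make the prepinning idea a bit more concrete, here is a rough user-space
sketch (not the demo code). It assumes each fault pins a fixed, power-of-two
sized window around the faulting IOVA, clamped to the DMA mapping that
contains it. PREPIN_PAGES, struct iova_range and prepin_window() are
hypothetical names made up for illustration, and the 4K page size is an
assumption too.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT   12                      /* assumed 4K pages */
#define PAGE_SIZE    (1ULL << PAGE_SHIFT)
#define PREPIN_PAGES 16ULL                   /* hypothetical; must be a power of two */

struct iova_range {
	uint64_t start;                      /* inclusive */
	uint64_t end;                        /* exclusive */
};

/*
 * Compute the window to pin for a fault at fault_iova: the naturally
 * aligned PREPIN_PAGES-sized block containing the fault, clamped to the
 * DMA mapping [map_start, map_end) the fault falls into.
 */
static struct iova_range prepin_window(uint64_t fault_iova,
				       uint64_t map_start, uint64_t map_end)
{
	uint64_t win = PREPIN_PAGES * PAGE_SIZE;
	struct iova_range r;

	assert(map_start <= fault_iova && fault_iova < map_end);

	r.start = fault_iova & ~(win - 1);   /* round down to window size */
	r.end = r.start + win;
	if (r.start < map_start)
		r.start = map_start;
	if (r.end > map_end)
		r.end = map_end;
	return r;
}
```

A larger window means fewer faults but more pinned memory per fault, so the
window size would probably have to be tunable as well.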

> There are cases where a user is not going to be
> willing to have a slow ramp up of performance for their devices as they
> fault in pages, so we might need to consider making this
> configurable through the vfio interface.

Yeah, makes sense, I will try to implement this: maybe add an ioctl called
VFIO_IOMMU_ENABLE_IOPF for the Type1 VFIO IOMMU...

> Our page mapping also only
> grows here, should mappings expire or do we need a least recently
> mapped tracker to avoid exceeding the user's locked memory limit? How
> does a user know what to set for a locked memory limit?

Yeah, we can add an LRU (mapped) tracker to release pages when a memory
limit is exceeded, maybe with a thread that checks this periodically.
As for the memory limit, maybe we could give the user some levels
(10% (default)/30%/50%/70%/unlimited of the total user memory (mapping
size)) to choose from via the VFIO_IOMMU_ENABLE_IOPF ioctl...

> The behavior
> here would lead to cases where an idle system might be ok, but as soon
> as load increases with more inflight DMA, we start seeing
> "unpredictable" I/O faults from the user perspective.

"unpredictable" I/O faults? We might see more problems after more testing...

Thanks,
Shenming

> Seems like there
> are lots of outstanding considerations and I'd also like to hear from
> the SVA folks about how this meshes with their work. Thanks,
>
> Alex
>
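
For reference, the limit levels and the least-recently-mapped tracker
discussed above could look roughly like the user-space sketch below. This is
only an illustration under assumptions, not a proposed kernel interface:
enum iopf_limit_level, iopf_limit_pages(), struct lrm_tracker and lrm_track()
are all hypothetical names, and the tracker orders pages by mapping time only
(it evicts synchronously on each new mapping rather than from a periodic
thread).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical limit levels, mirroring the percentages suggested above. */
enum iopf_limit_level {
	IOPF_LIMIT_10,                       /* default */
	IOPF_LIMIT_30,
	IOPF_LIMIT_50,
	IOPF_LIMIT_70,
	IOPF_LIMIT_UNLIMITED,
};

/* Translate a level into a page budget for a mapping of total_pages. */
static uint64_t iopf_limit_pages(enum iopf_limit_level lvl,
				 uint64_t total_pages)
{
	static const unsigned int pct[] = { 10, 30, 50, 70, 100 };
	uint64_t n;

	assert(total_pages > 0);
	n = total_pages * pct[lvl] / 100;
	return n ? n : 1;                    /* always allow at least one page */
}

struct mapped_page {
	uint64_t iova;
	struct mapped_page *next;            /* towards more recently mapped */
};

struct lrm_tracker {
	struct mapped_page *oldest, *newest;
	uint64_t nr_mapped, limit;           /* both counted in pages */
};

/*
 * Record a newly faulted-in page. If the budget is now exceeded, drop the
 * least recently mapped page and hand its IOVA back through *evicted so
 * the caller can unmap and unpin it.
 * Returns 1 if a page was evicted, 0 if not, -1 on allocation failure.
 */
static int lrm_track(struct lrm_tracker *t, uint64_t iova, uint64_t *evicted)
{
	struct mapped_page *p = malloc(sizeof(*p));
	struct mapped_page *victim;

	if (!p)
		return -1;
	p->iova = iova;
	p->next = NULL;
	if (t->newest)
		t->newest->next = p;
	else
		t->oldest = p;
	t->newest = p;
	t->nr_mapped++;

	if (t->nr_mapped <= t->limit)
		return 0;

	victim = t->oldest;
	t->oldest = victim->next;
	if (!t->oldest)
		t->newest = NULL;
	*evicted = victim->iova;
	free(victim);
	t->nr_mapped--;
	return 1;
}
```

For example, with a 4-page mapping and IOPF_LIMIT_50 the budget is 2 pages,
so faulting in a third page would evict the first one that was mapped.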