> From: Tian, Kevin <kevin.tian@xxxxxxxxx>
> Sent: Thursday, February 4, 2021 2:52 PM
>
> > From: Shenming Lu <lushenming@xxxxxxxxxx>
> > Sent: Tuesday, February 2, 2021 2:42 PM
> >
> > On 2021/2/1 15:56, Tian, Kevin wrote:
> > >> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > >> Sent: Saturday, January 30, 2021 6:58 AM
> > >>
> > >> On Mon, 25 Jan 2021 17:03:58 +0800
> > >> Shenming Lu <lushenming@xxxxxxxxxx> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> The static pinning and mapping problem in VFIO and possible
> > >>> solutions have been discussed a lot [1, 2]. One of the solutions
> > >>> is to add I/O page fault support for VFIO devices. Different from
> > >>> those relatively complicated software approaches, such as
> > >>> presenting a vIOMMU that provides the DMA buffer information
> > >>> (which might include para-virtualized optimizations), IOPF mainly
> > >>> depends on the hardware faulting capability, such as the PCIe PRI
> > >>> extension or the Arm SMMU stall model. What's more, the IOPF
> > >>> support in the IOMMU driver is being implemented in SVA [3]. So
> > >>> should we consider adding IOPF support for VFIO passthrough based
> > >>> on the IOPF part of SVA at present?
> > >>>
> > >>> We have implemented a basic demo only for one stage of translation
> > >>> (GPA -> HPA in virtualization; note that it can be configured at
> > >>> either stage), and tested it on a Hisilicon Kunpeng920 board. The
> > >>> nested mode is more complicated since VFIO only handles the second
> > >>> stage page faults (same as the non-nested case), while the first
> > >>> stage page faults need to be further delivered to the guest, which
> > >>> is being implemented in [4] on ARM. My thought on this is to
> > >>> report the page faults to VFIO regardless of the stage at which
> > >>> they occurred (trying to carry the stage information), and handle
> > >>> them according to the mode configured in VFIO. Or the IOMMU driver
> > >>> might evolve to support more...
> > >>>
> > >>> Might TODO:
> > >>> - Optimize the faulting path, and measure the performance (it
> > >>>   might still be a big issue).
> > >>> - Add support for PRI.
> > >>> - Add an MMU notifier to avoid pinning.
> > >>> - Add support for the nested mode.
> > >>> ...
> > >>>
> > >>> Any comments and suggestions are very welcome. :-)
> > >>
> > >> I expect performance to be pretty bad here, the lookup involved per
> > >> fault is excessive. There are cases where a user is not going to be
> > >> willing to have a slow ramp up of performance for their devices as
> > >> they fault in pages, so we might need to consider making this
> > >> configurable through the vfio interface. Our page mapping also only
> > >
> > > There is another factor to be considered. The presence of
> > > IOMMU_DEV_FEAT_IOPF just indicates the device's capability of
> > > triggering I/O page faults through the IOMMU, but it does not
> > > necessarily mean that the device can tolerate I/O page faults for
> > > arbitrary DMA requests.
> >
> > Yes, so I add an iopf_enabled field in VFIO to indicate the
> > whole-path faulting capability, and set it to true after registering
> > a VFIO page fault handler.
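[A minimal sketch of what that registration path could look like,
assuming the iommu_dev_enable_feature() /
iommu_register_device_fault_handler() interfaces from the SVA work;
vfio_enable_iopf(), the vfio_device argument, and its dev/iopf_enabled
members are illustrative names, not the series' actual code:]

	#include <linux/iommu.h>
	#include <linux/vfio.h>

	/* Sketched at the end of this thread. */
	static int vfio_iommu_dev_fault_handler(struct iommu_fault *fault,
						void *data);

	/*
	 * Enable IOPF on the device, then register a per-device fault
	 * handler before flipping the (illustrative) iopf_enabled flag.
	 */
	static int vfio_enable_iopf(struct vfio_device *vdev)
	{
		struct device *dev = vdev->dev;
		int ret;

		ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_IOPF);
		if (ret)
			return ret;

		ret = iommu_register_device_fault_handler(dev,
					vfio_iommu_dev_fault_handler, vdev);
		if (ret) {
			iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_IOPF);
			return ret;
		}

		vdev->iopf_enabled = true;
		return 0;
	}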
> > > In reality, many devices allow I/O faulting only in selective
> > > contexts. However, there is no standard way (e.g. PCISIG) for the
> > > device to report whether arbitrary I/O faults are allowed. Then we
> > > may have to maintain device-specific knowledge in software, e.g. in
> > > an opt-in table listing devices which allow arbitrary faults. For
> > > devices which only support selective faulting, a mediator (either
> > > through vendor extensions on vfio-pci-core or an mdev wrapper)
> > > might be necessary to help lock down non-faultable mappings and
> > > then enable faulting on the rest of the mappings.
> >
> > For devices which only support selective faulting, could they tell
> > the IOMMU driver and let it filter out non-faultable faults? Do I
> > get it wrong?
>
> Not exactly to the IOMMU driver. There is already a vfio_pin_pages()
> for selective page-pinning. The matter is that 'they' implies some
> device-specific logic to decide which pages must be pinned, and such
> knowledge is outside of VFIO.
>
> From an enabling p.o.v. we could possibly do it in a phased approach:
> first handle devices which tolerate arbitrary DMA faults, and then
> extend to devices with selective faulting. The former is simpler, but
> with one main open question: whether we want to maintain such device
> IDs in a static table in VFIO or rely on some hints from other
> components (e.g. the PF driver in the VF assignment case). Let's see
> how Alex thinks about it.
>
> >
> > >
> > >> grows here, should mappings expire or do we need a least recently
> > >> mapped tracker to avoid exceeding the user's locked memory limit?
> > >> How does a user know what to set for a locked memory limit? The
> > >> behavior here would lead to cases where an idle system might be
> > >> ok, but as soon as load increases with more inflight DMA, we start
> > >> seeing "unpredictable" I/O faults from the user perspective. Seems
> > >> like there are lots of outstanding considerations and I'd also
> > >> like to hear from the SVA folks about how this meshes with their
> > >> work. Thanks,
> > >>
> > >
> > > The main overlap between this feature and SVA is the IOPF reporting
> > > framework, which currently still has a gap in supporting both in
> > > nested mode, as discussed here:
> > >
> > > https://lore.kernel.org/linux-acpi/YAaxjmJW+ZMvrhac@myrica/
> > >
> > > Once that gap is resolved in the future, the VFIO fault handler
> > > just adopts different actions according to the fault level:
> > > 1st-level faults are forwarded to userspace thru the vSVA path,
> > > while 2nd-level faults are fixed (or warned about if not intended)
> > > by VFIO itself thru the IOMMU mapping interface.
> >
> > I understand what you mean is: from the perspective of VFIO, first
> > we need to set FEAT_IOPF, and then register its own handler with a
> > flag to indicate FLAT or NESTED and which level is concerned, so
> > that the VFIO handler can handle the page faults directly according
> > to the carried level information.
> >
> > Is there any plan for evolving (implementing) the IOMMU driver to
> > support this? Or could we help with this? :-)
> >
>
> Yes, it's planned but just hasn't happened yet. We are still focusing
> on the guest SVA part, thus only the 1st-level page fault (+Yi/Jacob).
> You are always welcome to collaborate/help if you have time.

Yeah, I saw that Eric's page fault support patch is listed as a
reference.

BTW, one thing to clarify: currently only one IOMMU fault handler is
supported for a single device. So the fault handler added in this
series should be consolidated with the one added in Eric's series.

Regards,
Yi Liu

> Thanks
> Kevin
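[For reference, the single-stage handler shape discussed above could
look roughly like this, assuming the iommu_page_response() completion
path of the fault reporting framework. vfio_iommu_fault_fixup(), which
would pin the backing page and map it thru the IOMMU mapping interface,
is a hypothetical stand-in, and how the stage/level information gets
carried in struct iommu_fault is still the open question in this
thread:]

	#include <linux/iommu.h>
	#include <linux/vfio.h>

	/*
	 * A sketch only: handle a recoverable page request by fixing
	 * the mapping at the 2nd level and completing the fault. How
	 * 1st-level faults would be told apart and forwarded to
	 * userspace (the vSVA path) is the open discussed above.
	 */
	static int vfio_iommu_dev_fault_handler(struct iommu_fault *fault,
						void *data)
	{
		struct vfio_device *vdev = data;
		struct iommu_page_response resp = {
			.version = IOMMU_PAGE_RESP_VERSION_1,
			.pasid	 = fault->prm.pasid,
			.grpid	 = fault->prm.grpid,
			.code	 = IOMMU_PAGE_RESP_SUCCESS,
		};

		if (fault->type != IOMMU_FAULT_PAGE_REQ)
			return -EOPNOTSUPP;

		/* Hypothetical: pin the backing page, map it at prm.addr. */
		if (vfio_iommu_fault_fixup(vdev, fault->prm.addr,
					   fault->prm.perm))
			resp.code = IOMMU_PAGE_RESP_INVALID;

		return iommu_page_response(vdev->dev, &resp);
	}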