On Fri, Jan 10, 2025 at 09:26:02AM +0100, David Hildenbrand wrote:
> > > > > > > > > > One limitation (also discussed in the guest_memfd
> > > > > > > > > > meeting) is that VFIO expects the DMA mapping for
> > > > > > > > > > a specific IOVA to be mapped and unmapped with the
> > > > > > > > > > same granularity.

Not just same granularity, whatever you map you have to unmap in
whole. map/unmap must be perfectly paired by userspace.

> > > > > > > > > > such as converting a small region within a larger
> > > > > > > > > > region. To prevent such invalid cases, all
> > > > > > > > > > operations are performed with 4K granularity. The
> > > > > > > > > > possible solutions we can think of are either to
> > > > > > > > > > enable VFIO to support partial unmap

Yes, you can do that, but it is awful for performance everywhere.

> > > > > > iopt_cut_iova() happens in iommufd vfio_compat.c, which is to make
> > > > > > iommufd be compatible with old VFIO_TYPE1. IIUC, it happens with
> > > > > > disable_large_page=true. That means the large IOPTE is also disabled in
> > > > > > IOMMU. So it can do the split easily. See the comment in
> > > > > > iommufd_vfio_set_iommu().

Yes.

But I am working on a project to make this more general purpose and
not have the 4k limitation. There are now several use cases for this
kind of cut feature.

https://lore.kernel.org/linux-iommu/7-v1-01fa10580981+1d-iommu_pt_jgg@xxxxxxxxxx/

> > > > > This is all true but this also means that "The former requires complex
> > > > > changes in VFIO" is not entirely true - some code is already there.

Well, to do it without forcing 4k requires complex changes.

> > > > Hmm, my statement is a little confusing. The bottleneck is that the
> > > > IOMMU driver doesn't support the large page split. So if we want to
> > > > enable large page and want to do partial unmap, it requires complex
> > > > change.

Yes, this is what I'm working on.
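To make the pairing rule and the cut operation concrete, here is a toy model of the IOVA bookkeeping only. It is a sketch under stated assumptions: `IovaTable`, `Area`, `map`, `unmap` and `cut` are hypothetical names, not the iommufd or VFIO API, and IOMMU page tables are not modelled at all. `unmap` refuses any range that would bisect an existing mapping (the paired map/unmap rule), while `cut` plays roughly the role described for iopt_cut_iova()/iopt_area_split() in the VFIO compat path: it splits one area in two at a 4K-aligned point, so a later partial unmap becomes a whole-area unmap.

```python
from dataclasses import dataclass

PAGE_SIZE_4K = 4096


@dataclass
class Area:
    """One contiguous IOVA mapping created by a single map() call (toy model)."""
    iova: int
    length: int

    @property
    def end(self) -> int:
        return self.iova + self.length


class IovaTable:
    """Hypothetical IOVA bookkeeping model; not the iommufd/VFIO API."""

    def __init__(self) -> None:
        self.areas: list[Area] = []

    def map(self, iova: int, length: int) -> None:
        # Reject any overlap with an existing area.
        if any(iova < a.end and a.iova < iova + length for a in self.areas):
            raise ValueError("overlaps an existing mapping")
        self.areas.append(Area(iova, length))

    def unmap(self, iova: int, length: int) -> int:
        """Paired semantics: the range may only cover whole areas.

        Returns the number of bytes unmapped; raises if any area would be bisected.
        """
        end = iova + length
        hit = [a for a in self.areas if iova < a.end and a.iova < end]
        if any(a.iova < iova or a.end > end for a in hit):
            raise ValueError("range would bisect an existing mapping")
        self.areas = [a for a in self.areas if not (iova < a.end and a.iova < end)]
        return sum(a.length for a in hit)

    def cut(self, iova: int) -> None:
        """Split the area containing iova so that iova becomes an area boundary.

        Toy restriction: only 4K-aligned cut points, mirroring the 4K-only
        (disable_large_pages=true) mode discussed in this thread.
        """
        if iova % PAGE_SIZE_4K:
            raise ValueError("cut point must be 4K aligned")
        for i, a in enumerate(self.areas):
            if a.iova < iova < a.end:
                # Replace one area with two adjacent ones; nothing is unmapped.
                self.areas[i:i + 1] = [Area(a.iova, iova - a.iova), Area(iova, a.end - iova)]
                return


# Usage: one 2 MiB mapping, then try to poke a 4K hole in the middle.
table = IovaTable()
table.map(0x200000, 0x200000)
try:
    table.unmap(0x300000, PAGE_SIZE_4K)
except ValueError as err:
    print("before cut:", err)  # rejected: would bisect the 2 MiB area
table.cut(0x300000)
table.cut(0x300000 + PAGE_SIZE_4K)
print("after cut, unmapped bytes:", table.unmap(0x300000, PAGE_SIZE_4K))  # 4096
```

In this model the cut only changes bookkeeping. With disable_large_pages=true every IOPTE is 4K, so a real cut is likewise cheap and unmapping the hole just clears leaf entries; with large IOPTEs the cut would also have to split a PMD-level entry into a PTE table, which is the part that needs the complex IOMMU driver changes discussed here.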
> > > We won't need to split large pages (if we stick to 4K for now), we need
> > > to split large mappings (not large pages) to allow partial unmapping and
> > > iopt_area_split() seems to be doing this. Thanks,

Correct

> > You mean we can disable large page in iommufd and then VFIO will be able
> > to do partial unmap. Yes, I think it is doable and we can avoid many
> > ioctl context switches overhead.

Right

> So I understand this correctly: the disable_large_pages=true will imply that
> we never have PMD mappings such that we can atomically poke a hole in a
> mapping, without temporarily having to remove a PMD mapping in the iommu
> table to insert a PTE table?

Yes

> batch_iommu_map_small() seems to document that behavior.

Yes

> It's interesting that that comment points out that this is purely "VFIO
> compatibility", and that it otherwise violates the iommufd invariant:
> "pairing map/unmap". So, it is against the real iommufd design ...

IIRC you can only trigger split using the VFIO type 1 legacy API. We
would need to formalize split as an IOMMUFD native ioctl.

Nobody should use this stuff through the legacy type 1 API!!!!

> Back when working on virtio-mem support (RAMDiscardManager), I thought there
> was no way to reliably do atomic partial unmappings.

Correct

Jason