Hi David,

On Fri, 18 Feb 2022 19:17:12 +0900
David Stevens <stevensd@xxxxxxxxxxxx> wrote:

> Hi all,
>
> I'm working on a consumer virtualization project that uses VFIO for
> passthrough devices. However, the way it's using them is a little
> unusual, and that results in some pretty significant inefficiencies in
> the vfio_iommu_type1 implementation. Before going ahead and trying to
> address the problems myself, I'm hoping to get some guidance about
> what sort of changes might be able to be merged upstream.
>
> The usage pattern that is not well supported by VFIO is many small,
> dynamic mappings. We have this pattern because we are using
> virtio-iommu to isolate some untrusted passthrough devices within the
> guest, and also because for the rest of the passthrough devices we're
> using coIOMMU [1] to support overcommit of memory in the host by not
> pinning all of the guest's memory. Both of these rely on using
> VFIO_IOMMU_MAP_DMA at the page granularity. This results in a lot of
> metadata overhead from the struct vfio_dma. At 80 bytes of metadata
> per page (actually 128 due to rounding in kmalloc), 1-2% of total
> system memory can end up being used for VFIO metadata.
>
> First, is this sort of use case something that upstream wants to address?
>
> If it's something that is worth addressing, here are two possible
> approaches I've thought of. I haven't implemented either yet, so there
> might be details I'm missing, or the API changes or maintenance costs
> might not be acceptable. Both are a little bit different from
> VFIO_TYPE1_IOMMU, so they would probably require at least a
> VFIO_TYPE1v3_IOMMU type.
>
> 1) Add an alternative xarray implementation for vfio_iommu.dma_list
> that maintains the iova -> vaddr mapping at the page granularity. Most
> of the metadata in struct vfio_dma can be packed into the extra bits
> in the vaddr. The two exceptions are vfio_dma.task and
> vfio_dma.pfn_list. The lack of space for vfio_dma.task could be
> addressed by requiring that all mappings have the same task. Without
> vfio_dma.pfn_list, we would lose the refcount maintained by struct
> vfio_pfn, which means every call to vfio_iommu_type1_pin_pages would
> require re-pinning the page. This might be a bit more inefficient,
> although it seems like it should be okay from a correctness
> standpoint.
>
> One downside of this approach is that it is only more memory efficient
> than the rbtree if the mapping is quite dense, since a struct xa_node
> is quite a bit larger than a struct vfio_dma. This would help the most
> problematic coIOMMU cases, but it would still leave certain
> virtio-iommu cases unaddressed. Also, although most of the struct
> vfio_dma metadata could be packed into the xarray today, that might no
> longer be the case if more metadata was added in the future.
>
> 2) A second alternative would be to drop the VFIO metadata altogether
> and basically directly expose the iommu APIs (with the required
> locking/validation). This would be incompatible with mediated devices,
> and it wouldn't be able to support the various bells and whistles of
> the VFIO api. However, I think the core mapping/unmapping logic could
> still be shared between the normal struct vfio_dma tree and this
> approach. Personally, I'm a little more in favor of this one, since it
> completely avoids VFIO memory overhead in both of my use cases.
>
> Do either of those approaches sound like something that might work? If
> neither is okay, are there any suggestions for approaches to take?
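As an aside, here's a rough, untested sketch of what the xarray packing
in option (1) quoted above could look like. The names (vfio_pgtable,
VFIO_PG_*, the helpers) are invented for illustration, and the encoding
simply reuses the low bits freed by page alignment of the vaddr; task
tracking, locking and most error handling are omitted:

  /*
   * Sketch of option (1): index an xarray by IOVA pfn and store the
   * page-aligned userspace vaddr as a value entry, packing per-page
   * permission flags into the low bits that page alignment leaves free.
   */
  #include <linux/xarray.h>
  #include <linux/mm.h>

  #define VFIO_PG_READ	(1UL << 0)
  #define VFIO_PG_WRITE	(1UL << 1)
  #define VFIO_PG_FLAGS	(PAGE_SIZE - 1)

  struct vfio_pgtable {
  	struct xarray pages;		/* xa_init() at container setup */
  };

  static int vfio_pgtable_map(struct vfio_pgtable *pt, unsigned long iova,
  			      unsigned long vaddr, unsigned long flags)
  {
  	void *entry = xa_mk_value((vaddr & PAGE_MASK) |
  				  (flags & VFIO_PG_FLAGS));

  	return xa_err(xa_store(&pt->pages, iova >> PAGE_SHIFT, entry,
  			       GFP_KERNEL));
  }

  static unsigned long vfio_pgtable_vaddr(struct vfio_pgtable *pt,
  					  unsigned long iova,
  					  unsigned long *flags)
  {
  	void *entry = xa_load(&pt->pages, iova >> PAGE_SHIFT);

  	if (!entry)
  		return 0;

  	*flags = xa_to_value(entry) & VFIO_PG_FLAGS;
  	return xa_to_value(entry) & PAGE_MASK;
  }

Since a struct xa_node typically covers 64 slots, this only wins over
one struct vfio_dma per page when the IOVA space is densely populated,
which matches the downside noted in the quoted message.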
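For option (2), the per-page map path could in principle collapse to
little more than a pin plus an iommu_map() call. A rough, untested
sketch under that assumption follows; the function name is made up,
locking, accounting and iova validation are omitted, and note that
iommu_map()'s signature has changed across kernel versions:

  #include <linux/iommu.h>
  #include <linux/mm.h>

  static int vfio_map_one_page(struct iommu_domain *domain,
  			       unsigned long iova, unsigned long vaddr,
  			       bool writable)
  {
  	unsigned int gup_flags = FOLL_LONGTERM | (writable ? FOLL_WRITE : 0);
  	int prot = IOMMU_READ | (writable ? IOMMU_WRITE : 0);
  	struct page *page;
  	int ret;

  	/* Pin the backing page so it stays resident for device DMA. */
  	ret = pin_user_pages_fast(vaddr & PAGE_MASK, 1, gup_flags, &page);
  	if (ret != 1)
  		return ret < 0 ? ret : -EFAULT;

  	/* Program the iommu domain directly; no struct vfio_dma is kept. */
  	ret = iommu_map(domain, iova, page_to_phys(page), PAGE_SIZE, prot);
  	if (ret)
  		unpin_user_page(page);

  	return ret;
  }

Unmap would be the mirror image: look up the physical page with
iommu_iova_to_phys(), iommu_unmap() the range, then unpin_user_page(),
with none of the accounting or mdev support that type1 provides today.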
I'd advise looking at discussions [1][2] (and hopefully patch series
before too long) around development of the iommufd. This driver is
meant to provide a common iommu mapping interface for various userspace
use cases, including both vfio and vdpa. I think the best approach at
this point would be to work with that effort to champion a low-latency,
space-efficient page mapping interface that could be used with your
design. Perhaps a nested paging solution, which is clearly something
the iommufd work intends to enable, would fit your needs.

In the interim, I don't think we want to promote further extensions to
the existing vfio type1 backend. Thanks,

Alex

[1] https://lore.kernel.org/all/?q=iommufd
[2] https://lore.kernel.org/all/20210919063848.1476776-1-yi.l.liu@xxxxxxxxx/