Hi all,

I'm working on a consumer virtualization project that uses VFIO for device passthrough. However, the way it uses VFIO is a little unusual, and that exposes some significant inefficiencies in the vfio_iommu_type1 implementation. Before trying to address the problems myself, I'm hoping to get some guidance on what sort of changes might be mergeable upstream.

The usage pattern that VFIO does not support well is many small, dynamic mappings. We have this pattern because we use virtio-iommu to isolate some untrusted passthrough devices within the guest, and because for the rest of the passthrough devices we use coIOMMU [1] to support memory overcommit in the host by not pinning all of the guest's memory. Both of these rely on calling VFIO_IOMMU_MAP_DMA at page granularity, which results in a lot of metadata overhead from struct vfio_dma. At 80 bytes of metadata per page (actually 128 due to kmalloc rounding), 1-2% of total system memory can end up being consumed by VFIO metadata.

First, is this sort of use case something that upstream wants to address? If it is worth addressing, here are two possible approaches I've thought of. I haven't implemented either yet, so there may be details I'm missing, or the API changes or maintenance costs might not be acceptable. Both differ somewhat from VFIO_TYPE1_IOMMU, so they would probably require at least a new VFIO_TYPE1v3_IOMMU type.

1) Add an alternative xarray-based implementation of vfio_iommu.dma_list that maintains the iova -> vaddr mapping at page granularity. Most of the metadata in struct vfio_dma can be packed into the extra bits of the page-aligned vaddr. The two exceptions are vfio_dma.task and vfio_dma.pfn_list. The lack of space for vfio_dma.task could be addressed by requiring that all mappings come from the same task.
Without vfio_dma.pfn_list, we would lose the refcount maintained by struct vfio_pfn, which means every call to vfio_iommu_type1_pin_pages would have to re-pin the page. That might be somewhat less efficient, but it should be okay from a correctness standpoint.

One downside of this approach is that it is only more memory efficient than the rbtree when the mapping is quite dense, since a struct xa_node is quite a bit larger than a struct vfio_dma. This would help the most problematic coIOMMU cases, but it would still leave certain virtio-iommu cases unaddressed. Also, although most of the struct vfio_dma metadata can be packed into the xarray today, that might no longer hold if more metadata were added in the future.

2) A second alternative would be to drop the VFIO metadata altogether and directly expose the IOMMU APIs (with the required locking/validation). This would be incompatible with mediated devices, and it wouldn't be able to support the various bells and whistles of the VFIO API. However, I think the core mapping/unmapping logic could still be shared between the normal struct vfio_dma tree and this approach. Personally, I'm a little more in favor of this one, since it completely avoids VFIO memory overhead in both of my use cases.

Does either of these approaches sound like something that might work? If neither is acceptable, are there any suggestions for other approaches to take?

Thanks,
David

[1] https://www.usenix.org/conference/atc20/presentation/tian