Hi David,

On Fri, 18 Feb 2022 19:17:12 +0900
David Stevens <stevensd@xxxxxxxxxxxx> wrote:

> Hi all,
>
> I'm working on a consumer virtualization project that uses VFIO for
> passthrough devices. However, the way it's using them is a little
> unusual, and that results in some pretty significant inefficiencies in
> the vfio_iommu_type1 implementation. Before going ahead and trying to
> address the problems myself, I'm hoping to get some guidance about
> what sort of changes might be able to be merged upstream.
>
> The usage pattern that is not well supported by VFIO is many small,
> dynamic mappings. We have this pattern because we are using
> virtio-iommu to isolate some untrusted passthrough devices within the
> guest, and also because for the rest of the passthrough devices we're
> using coIOMMU [1] to support overcommit of memory in the host by not
> pinning all of the guest's memory. Both of these rely on using
> VFIO_IOMMU_MAP_DMA at the page granularity. This results in a lot of
> metadata overhead from the struct vfio_dma. At 80 bytes of metadata
> per page (actually 128 due to rounding in kmalloc), 1-2% of total
> system memory can end up being used for VFIO metadata.
>
> First, is this sort of use case something that upstream wants to address?
>
> If it's something that is worth addressing, here are two possible
> approaches I've thought of. I haven't implemented either yet, so there
> might be details I'm missing, or the API changes or maintenance costs
> might not be acceptable. Both are a little bit different from
> VFIO_TYPE1_IOMMU, so they would probably require at least a
> VFIO_TYPE1v3_IOMMU type.
>
> 1) Add an alternative xarray implementation for vfio_iommu.dma_list
> that maintains the iova -> vaddr mapping at the page granularity. Most
> of the metadata in struct vfio_dma can be packed into the extra bits
> in the vaddr. The two exceptions are vfio_dma.task and
> vfio_dma.pfn_list. The lack of space for vfio_dma.task could be
> addressed by requiring that all mappings have the same task. Without
> vfio_dma.pfn_list, we would lose the refcount maintained by struct
> vfio_pfn, which means every call to vfio_iommu_type1_pin_pages would
> require re-pinning the page. This might be a bit more inefficient,
> although it seems like it should be okay from a correctness
> standpoint.
>
> One downside of this approach is that it is only more memory efficient
> than the rbtree if the mapping is quite dense, since a struct xa_node
> is quite a bit larger than a struct vfio_dma. This would help the most
> problematic coIOMMU cases, but it would still leave certain
> virtio-iommu cases unaddressed. Also, although most of the struct
> vfio_dma metadata could be packed into the xarray today, that might no
> longer be the case if more metadata was added in the future.
>
> 2) A second alternative would be to drop the VFIO metadata altogether
> and basically directly expose the iommu APIs (with the required
> locking/validation). This would be incompatible with mediated devices,
> and it wouldn't be able to support the various bells and whistles of
> the VFIO api. However, I think the core mapping/unmapping logic could
> still be shared between the normal struct vfio_dma tree and this
> approach. Personally, I'm a little more in favor of this one, since it
> completely avoids VFIO memory overhead in both of my use cases.
>
> Do either of those approaches sound like something that might work? If
> neither is okay, are there any suggestions for approaches to take?
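As an aside, here's a rough, untested sketch of what the xarray packing
in option (1) quoted above could look like. The names (vfio_pgtable,
VFIO_PG_*, the helpers) are invented for illustration, and the encoding
simply reuses the low bits freed by page alignment of the vaddr; task
tracking, locking and most error handling are omitted:

  /*
   * Sketch of option (1): index an xarray by IOVA pfn and store the
   * page-aligned userspace vaddr as a value entry, packing per-page
   * permission flags into the low bits that page alignment leaves free.
   */
  #include <linux/xarray.h>
  #include <linux/mm.h>

  #define VFIO_PG_READ	(1UL << 0)
  #define VFIO_PG_WRITE	(1UL << 1)
  #define VFIO_PG_FLAGS	(PAGE_SIZE - 1)

  struct vfio_pgtable {
  	struct xarray pages;		/* xa_init() at container setup */
  };

  static int vfio_pgtable_map(struct vfio_pgtable *pt, unsigned long iova,
  			      unsigned long vaddr, unsigned long flags)
  {
  	void *entry = xa_mk_value((vaddr & PAGE_MASK) |
  				  (flags & VFIO_PG_FLAGS));

  	return xa_err(xa_store(&pt->pages, iova >> PAGE_SHIFT, entry,
  			       GFP_KERNEL));
  }

  static unsigned long vfio_pgtable_vaddr(struct vfio_pgtable *pt,
  					  unsigned long iova,
  					  unsigned long *flags)
  {
  	void *entry = xa_load(&pt->pages, iova >> PAGE_SHIFT);

  	if (!entry)
  		return 0;

  	*flags = xa_to_value(entry) & VFIO_PG_FLAGS;
  	return xa_to_value(entry) & PAGE_MASK;
  }

Since a struct xa_node typically covers 64 slots, this only wins over
one struct vfio_dma per page when the IOVA space is densely populated,
which matches the downside noted in the quoted message.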
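For option (2), the per-page map path could in principle collapse to
little more than a pin plus an iommu_map() call. A rough, untested
sketch under that assumption follows; the function name is made up,
locking, accounting and iova validation are omitted, and note that
iommu_map()'s signature has changed across kernel versions:

  #include <linux/iommu.h>
  #include <linux/mm.h>

  static int vfio_map_one_page(struct iommu_domain *domain,
  			       unsigned long iova, unsigned long vaddr,
  			       bool writable)
  {
  	unsigned int gup_flags = FOLL_LONGTERM | (writable ? FOLL_WRITE : 0);
  	int prot = IOMMU_READ | (writable ? IOMMU_WRITE : 0);
  	struct page *page;
  	int ret;

  	/* Pin the backing page so it stays resident for device DMA. */
  	ret = pin_user_pages_fast(vaddr & PAGE_MASK, 1, gup_flags, &page);
  	if (ret != 1)
  		return ret < 0 ? ret : -EFAULT;

  	/* Program the iommu domain directly; no struct vfio_dma is kept. */
  	ret = iommu_map(domain, iova, page_to_phys(page), PAGE_SIZE, prot);
  	if (ret)
  		unpin_user_page(page);

  	return ret;
  }

Unmap would be the mirror image: look up the physical page with
iommu_iova_to_phys(), iommu_unmap() the range, then unpin_user_page(),
with none of the accounting or mdev support that type1 provides today.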
I'd advise looking at discussions [1][2] (and hopefully patch series
before too long) around development of the iommufd. This driver is
meant to provide a common iommu mapping interface for various userspace
use cases, including both vfio and vdpa. I think the best approach at
this point would be to work with that effort to champion a low-latency,
space-efficient page mapping interface that could be used with your
design. Perhaps a nested paging solution, which is clearly something
the iommufd work intends to enable, would fit your needs.

In the interim, I don't think we want to promote further extensions to
the existing vfio type1 backend. Thanks,

Alex

[1] https://lore.kernel.org/all/?q=iommufd
[2] https://lore.kernel.org/all/20210919063848.1476776-1-yi.l.liu@xxxxxxxxx/