Re: [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility

Yi Liu <yi.l.liu@xxxxxxxxx> · Tue, 29 Mar 2022 17:17:50 +0800

On 2022/3/24 06:51, Alex Williamson wrote:
On Fri, 18 Mar 2022 14:27:36 -0300
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

iommufd can directly implement the /dev/vfio/vfio container IOCTLs by
mapping them into io_pagetable operations. Doing so allows the use of
iommufd by symlinking /dev/vfio/vfio to /dev/iommufd. Allowing VFIO to
SET_CONTAINER using an iommufd instead of a container fd is a followup
series.

Internally the compatibility API uses a normal IOAS object that, like
vfio, is automatically allocated when the first device is
attached.

Userspace can also query or set this IOAS object directly using the
IOMMU_VFIO_IOAS ioctl. This allows mixing and matching new iommufd only
features while still using the VFIO style map/unmap ioctls.

While this is enough to operate qemu, it is still a bit of a WIP with a
few gaps to be resolved:

  - Only the TYPE1v2 mode is supported where unmap cannot punch holes or
    split areas. The old mode can be implemented with a new operation to
    split an iopt_area into two without disturbing the iopt_pages or the
    domains, then unmapping a whole area as normal.

  - Resource limits rely on memory cgroups to bound what userspace can do
    instead of the module parameter dma_entry_limit.

  - VFIO P2P is not implemented. Avoiding the follow_pfn() mis-design will
    require some additional work to properly expose PFN lifecycle between
    VFIO and iommufd

  - Various components of the mdev API are not completed yet

  - Indefinite suspend of SW access (VFIO_DMA_MAP_FLAG_VADDR) is not
    implemented.

  - The 'dirty tracking' is not implemented

  - A full audit for pedantic compatibility details (e.g. errnos) has
    not yet been done

  - powerpc SPAPR is left out, as it is not connected to the iommu_domain
    framework. My hope is that SPAPR will be moved into the iommu_domain
    framework as a special HW specific type and would expect power to
    support the generic interface through a normal iommu_domain.
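
To illustrate the first gap above, here is a minimal sketch of the
TYPE1v2 rule that an unmap may remove whole areas but may not split one.
It is a toy model under stated assumptions, not the iommufd or vfio
implementation: struct area, splits_area() and unmap_allowed_v2() are
hypothetical names, ranges are half-open, and address overflow is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a mapped area: [start, start + length). */
struct area {
    uint64_t start;
    uint64_t length;
};

/*
 * True when the unmap range [iova, iova + size) overlaps the area
 * without covering all of it, i.e. the unmap would split the area.
 * Overflow of start + length is not handled in this sketch.
 */
static bool splits_area(const struct area *a, uint64_t iova, uint64_t size)
{
    uint64_t a_end = a->start + a->length;
    uint64_t u_end = iova + size;
    bool overlaps = iova < a_end && a->start < u_end;
    bool covers = iova <= a->start && a_end <= u_end;

    return overlaps && !covers;
}

/* An unmap is acceptable in this model only if it splits no area. */
static bool unmap_allowed_v2(const struct area *areas, size_t n,
                             uint64_t iova, uint64_t size)
{
    assert(size > 0);

    for (size_t i = 0; i < n; i++) {
        if (splits_area(&areas[i], iova, size))
            return false;
    }
    return true;
}
```

With areas at [0x1000, 0x3000) and [0x4000, 0x5000), unmapping
[0x1000, 0x3000) or [0x0, 0x5000) is accepted in this model, while
unmapping [0x1000, 0x2000) is rejected because it would leave half of the
first area behind.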

My overall question here would be whether we can actually achieve a
compatibility interface that has sufficient feature transparency that we
can dump vfio code in favor of this interface, or will there be enough
niche use cases that we need to keep type1 and vfio containers around
through a deprecation process?

The locked memory differences, for one, seem like something that libvirt
wouldn't want hidden, and we have questions regarding support for vaddr
hijacking and different ideas on how to implement dirty page tracking, not
to mention the missing features that are currently well used, like p2p
mappings, coherency tracking, mdev, etc.

It seems like quite an endeavor to fill all these gaps, while at the
same time QEMU will be working to move to use iommufd directly in order
to gain all the new features.

Hi Alex,

Jason hasn't included the vfio changes for adapting to iommufd, but they
are in this branch
(https://github.com/luxis1999/iommufd/commits/iommufd-v5.17-rc6). Eric and
I are working on adding iommufd support to QEMU as well. To run the new
QEMU on an old kernel, QEMU needs to support both the legacy
group/container interface and the new device/iommufd interface. We've
got some draft code in this direction
(https://github.com/luxis1999/qemu/commits/qemu-for-5.17-rc4-vm). It works
for both the legacy group/container path and the device/iommufd path. It's
just for reference so far; Eric and I will sync further on it.
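
As a rough illustration of supporting both paths from userspace, the
sketch below tries the iommufd node first and falls back to the legacy
container node. It is not taken from the QEMU draft code: enum
backend_kind, struct backend, backend_open() and backend_close() are
hypothetical names, and the caller supplies both device node paths.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical labels for the two userspace paths discussed above. */
enum backend_kind {
    BACKEND_NONE = 0,   /* neither node could be opened */
    BACKEND_IOMMUFD,    /* new device/iommufd path */
    BACKEND_LEGACY,     /* legacy group/container path */
};

struct backend {
    enum backend_kind kind;
    int fd;             /* open fd, or -1 when kind == BACKEND_NONE */
};

/*
 * Try the iommufd node first, then fall back to the legacy container
 * node. Only open() is used; no vfio or iommufd ioctls are issued here.
 */
static struct backend backend_open(const char *iommufd_path,
                                   const char *container_path)
{
    struct backend b = { BACKEND_NONE, -1 };

    assert(iommufd_path != NULL && container_path != NULL);

    b.fd = open(iommufd_path, O_RDWR | O_CLOEXEC);
    if (b.fd >= 0) {
        b.kind = BACKEND_IOMMUFD;
        return b;
    }

    b.fd = open(container_path, O_RDWR | O_CLOEXEC);
    if (b.fd >= 0)
        b.kind = BACKEND_LEGACY;

    return b;
}

/* Release the fd and reset the struct to the "no backend" state. */
static void backend_close(struct backend *b)
{
    if (b->fd >= 0)
        close(b->fd);
    b->fd = -1;
    b->kind = BACKEND_NONE;
}
```

A caller could use backend_open("/dev/iommufd", "/dev/vfio/vfio"), taking
the node names as they appear in this thread; the iommufd node name may
well differ in a later revision of the series. On an old kernel the first
open() fails and the legacy path is selected.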

Where do we focus attention?  Is symlinking device files our proposal
to userspace and is that something achievable, or do we want to use
this compatibility interface as a means to test the interface and
allow userspace to make use of it for transition, if their use cases
allow it, perhaps eventually performing the symlink after deprecation
and eventual removal of the vfio container and type1 code?  Thanks,

I'm sure it is possible that one day the group/container interface will be
removed from the kernel. Perhaps this will happen once SPAPR is supported
by iommufd. But what about QEMU? Should QEMU keep backward compatibility
forever, or might QEMU one day also remove the group/container path and
hence be unable to work on old kernels?

--
Regards,
Yi Liu