Re: [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility

Jason Gunthorpe <jgg@xxxxxxxxxx> · Wed, 23 Mar 2022 21:33:42 -0300

On Wed, Mar 23, 2022 at 04:51:25PM -0600, Alex Williamson wrote:

> My overall question here would be whether we can actually achieve a
> compatibility interface that has sufficient feature transparency that we
> can dump vfio code in favor of this interface, or will there be enough
> niche use cases that we need to keep type1 and vfio containers around
> through a deprecation process?

Other than SPAPR, I think we can.

> The locked memory differences for one seem like something that
> libvirt wouldn't want hidden

I'm first interested in understanding how this change becomes
a real problem in practice that requires libvirt to do something
different for vfio or iommufd. We can discuss in the other thread.

If this is the make or break point then I think we can deal with it
either by going back to what vfio does now or perhaps some other
friendly compat approach.

> and we have questions regarding support for vaddr hijacking

I'm not sure what vaddr hijacking is. Do you mean
VFIO_DMA_MAP_FLAG_VADDR? There is a comment that outlines my plan to
implement it in a functionally compatible way without the deadlock
problem. I estimate this as a small project.

> and different ideas how to implement dirty page tracking, 

I don't think this is compatibility. No kernel today triggers qemu to
use this feature as no kernel supports live migration. No existing
qemu will trigger this feature with new kernels that support live
migration v2. Therefore we can adjust qemu's dirty tracking at the
same time we enable migration v2 in qemu.

With Joao's work we are close to having a solid RFC proposing
something that can be fully implemented.

Hopefully we can agree to this soon enough that qemu can come with a
full package of migration v2 support including the dirty tracking
solution.

> not to mention the missing features that are currently well used,
> like p2p mappings, coherency tracking, mdev, etc.

I consider these all mandatory things, they won't be left out.

The reason they are not in the RFC is mostly because supporting them
requires work outside just this iommufd area, and I'd like this series
to remain self-contained.

I've already got a draft to add DMABUF support to VFIO PCI which
nicely solves the follow_pfn security problem; we want to do this for
another reason already. I'm waiting for some testing feedback before
posting it. I need some help from Daniel to make the DMABUF revoke
semantic he and I have been talking about. In the worst case we can
copy the follow_pfn approach.

Intel no-snoop is simple enough, just needs some Intel cleanup parts.

mdev will come along with the final VFIO integration, all the really
hard parts are done already. The VFIO integration is a medium sized
task overall.

So, I'm not ready to give up yet :)

> Where do we focus attention?  Is symlinking device files our proposal
> to userspace and is that something achievable, or do we want to use
> this compatibility interface as a means to test the interface and
> allow userspace to make use of it for transition, if their use cases
> allow it, perhaps eventually performing the symlink after deprecation
> and eventual removal of the vfio container and type1 code?  Thanks,

symlinking device files is definitely just a suggested way to expedite
testing.

Things like qemu that are learning to use iommufd-only features should
learn to directly open iommufd instead of vfio container to activate
those features.
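
A minimal sketch of that open order, not qemu code: try the iommufd
node first, and only fall back to the vfio container if it cannot be
opened. The helper name, the enum, and passing the paths in as
arguments are all assumptions made for illustration.

```c
#include <fcntl.h>

/* Hypothetical tag telling the caller which interface was opened. */
enum backend { BACKEND_NONE, BACKEND_IOMMUFD, BACKEND_VFIO_CONTAINER };

/*
 * Hypothetical helper: prefer iommufd, fall back to the vfio container.
 * Returns an open fd and sets *which, or returns -1 with BACKEND_NONE.
 */
int open_iommu_backend(const char *iommufd_path, const char *container_path,
		       enum backend *which)
{
	int fd;

	/* iommufd-only features need the iommufd node opened directly. */
	fd = open(iommufd_path, O_RDWR);
	if (fd >= 0) {
		*which = BACKEND_IOMMUFD;
		return fd;
	}

	/* Otherwise keep using the legacy container interface. */
	fd = open(container_path, O_RDWR);
	if (fd >= 0) {
		*which = BACKEND_VFIO_CONTAINER;
		return fd;
	}

	*which = BACKEND_NONE;
	return -1;
}
```

A caller would pass something like "/dev/iommu" and "/dev/vfio/vfio"
(the iommufd path is an assumption here), then enable iommufd-only
features only when *which comes back as BACKEND_IOMMUFD.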

Looking far down the road I don't think we want to have type1 and
iommufd code forever. So, I would like to make an option to compile
out vfio container support entirely and have that option arrange for
iommufd to provide the container device node itself.

I think we can get there pretty quickly, or at least I haven't got
anything that is scaring me a lot (beyond SPAPR of course).

For the dpdk/etcs of the world I think we are already there.

Jason