Re: [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility

Eric Auger <eric.auger@xxxxxxxxxx> · Thu, 24 Mar 2022 09:13:17 +0100



Hi,

On 3/24/22 1:33 AM, Jason Gunthorpe wrote:
> On Wed, Mar 23, 2022 at 04:51:25PM -0600, Alex Williamson wrote:
>
>> My overall question here would be whether we can actually achieve a
>> compatibility interface that has sufficient feature transparency that we
>> can dump vfio code in favor of this interface, or will there be enough
>> niche use cases that we need to keep type1 and vfio containers around
>> through a deprecation process?
> Other than SPAPR, I think we can.
>
>> The locked memory differences for one seem like something that
>> libvirt wouldn't want hidden
> I'm first interested to have an understanding how this change becomes
> a real problem in practice that requires libvirt to do something
> different for vfio or iommufd. We can discuss in the other thread
>
> If this is the make or break point then I think we can deal with it
> either by going back to what vfio does now or perhaps some other
> friendly compat approach..
>
>> and we have questions regarding support for vaddr hijacking
> I'm not sure what vaddr hijacking is? Do you mean
> VFIO_DMA_MAP_FLAG_VADDR ? There is a comment that outlines my plan to
> implement it in a functionally compatible way without the deadlock
> problem. I estimate this as a small project.
>
>> and different ideas how to implement dirty page tracking, 
> I don't think this is compatibility. No kernel today triggers qemu to
> use this feature as no kernel supports live migration. No existing
> qemu will trigger this feature with new kernels that support live
> migration v2. Therefore we can adjust qemu's dirty tracking at the
> same time we enable migration v2 in qemu.
>
> With Joao's work we are close to having a solid RFC to come with
> something that can be fully implemented.
>
> Hopefully we can agree to this soon enough that qemu can come with a
> full package of migration v2 support including the dirty tracking
> solution.
>
>> not to mention the missing features that are currently well used,
>> like p2p mappings, coherency tracking, mdev, etc.
> I consider these all mandatory things, they won't be left out.
>
> The reason they are not in the RFC is mostly because supporting them
> requires work outside just this iommufd area, and I'd like this series
> to remain self-contained.
>
> I've already got a draft to add DMABUF support to VFIO PCI which
> nicely solves the follow_pfn security problem, we want to do this for
> another reason already. I'm waiting for some testing feedback before
> posting it. Need some help from Daniel make the DMABUF revoke semantic
> him and I have been talking about. In the worst case can copy the
> follow_pfn approach.
>
> Intel no-snoop is simple enough, just needs some Intel cleanup parts.
>
> mdev will come along with the final VFIO integration, all the really
> hard parts are done already. The VFIO integration is a medium sized
> task overall.
>
> So, I'm not ready to give up yet :)
>
>> Where do we focus attention?  Is symlinking device files our proposal
>> to userspace and is that something achievable, or do we want to use
>> this compatibility interface as a means to test the interface and
>> allow userspace to make use of it for transition, if their use cases
>> allow it, perhaps eventually performing the symlink after deprecation
>> and eventual removal of the vfio container and type1 code?  Thanks,
> symlinking device files is definitely just a suggested way to expedite
> testing.
>
> Things like qemu that are learning to use iommufd-only features should
> learn to directly open iommufd instead of vfio container to activate
> those features.
>
> Looking long down the road I don't think we want to have type 1 and
> iommufd code forever. So, I would like to make an option to compile
> out vfio container support entirely and have that option arrange for
> iommufd to provide the container device node itself.
I am currently working on migrating the QEMU VFIO device onto the new
API because since after our discussions the compat mode cannot be used
anyway to implemented nesting. I hope I will be able to present
something next week.

Thanks

Eric
>
> I think we can get there pretty quickly, or at least I haven't got
> anything that is scaring me alot (beyond SPAPR of course)
>
> For the dpdk/etcs of the world I think we are already there.
>
> Jason
>