Re: [RFC PATCH 05/13] iommufd: Serialise persisted iommufds and ioas

Jacob Pan <jacob.pan@xxxxxxxxxxxxxxxxxxx> · Wed, 6 Nov 2024 11:18:50 -0800

Hi Jason,

On Mon, 4 Nov 2024 09:00:11 -0400
Jason Gunthorpe <jgg@xxxxxxxx> wrote:

> On Sat, Nov 02, 2024 at 10:22:54AM +0000, Gowans, James wrote:
> 
> > Yes, I think the guidance was to bind a device to iommufd in noiommu
> > mode. It does seem a bit weird to use iommufd with noiommu, but we
> > agreed it's the best/simplest way to get the functionality.   
> 
> noiommu should still have an ioas and still have kernel managed page
> pinning.
> 
> My remark to bring it to iommufd was to also make it a fully
> architected feature and stop relying on mprotect and /proc/ tricks.
> 
Just to clarify my tentative understanding in more detail (please
correct me):

1. Create an iommufd access object for the noiommu device when it is
bound to an iommufd ctx.

2. All user memory used by the device in noiommu mode should be
pinned by iommufd, i.e. via iommufd_access_pin_pages().
I guess by "mprotect" you meant mlock, i.e. stop relying on mlock? I
think openHCL is using the /dev/mem trick.

3. An IOAS can be attached to the noiommu iommufd_access object, similar
to an emulated device (mdev).
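To check that I have the ordering right, here is a rough userspace model
of steps 1-3. The toy_* names and structs are hypothetical stand-ins. The
in-kernel counterparts would be iommufd_access_create(),
iommufd_access_attach() and iommufd_access_pin_pages(), which take
different arguments and actually pin pages. This model only tracks
per-page pin counts. It refuses to pin anything until an IOAS is
attached, and it rejects any range that falls outside that IOAS.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096UL
#define TOY_MAX_PAGES 64

/* Hypothetical stand-in for an IOAS: one linear IOVA window [base, base + length). */
struct toy_ioas {
	unsigned long iova_base;
	unsigned long length;
};

/* Hypothetical stand-in for the iommufd access object of a noiommu device. */
struct toy_access {
	int ctx_id;                       /* iommufd ctx the device was bound to */
	struct toy_ioas *ioas;            /* NULL until an IOAS is attached */
	unsigned int pins[TOY_MAX_PAGES]; /* pin count per page of the IOAS window */
};

/* Step 1: create the access object when the device is bound to a ctx. */
static void toy_access_create(struct toy_access *acc, int ctx_id)
{
	*acc = (struct toy_access){ .ctx_id = ctx_id };
}

/* Step 3: attach an IOAS, as is done for emulated (mdev) devices. */
static void toy_access_attach(struct toy_access *acc, struct toy_ioas *ioas)
{
	/* The toy pin table can only describe TOY_MAX_PAGES pages. */
	assert(ioas->length <= TOY_MAX_PAGES * TOY_PAGE_SIZE);
	acc->ioas = ioas;
}

/*
 * Step 2: pin every page the device may DMA to through the access object,
 * instead of relying on mlock(). Returns false if no IOAS is attached or
 * the range is not fully inside the IOAS window.
 */
static bool toy_access_pin_pages(struct toy_access *acc, unsigned long iova,
				 unsigned long length)
{
	struct toy_ioas *ioas = acc->ioas;

	if (!ioas || length == 0 || iova < ioas->iova_base)
		return false;

	unsigned long off = iova - ioas->iova_base;

	/* Overflow-safe check that [off, off + length) fits in the window. */
	if (length > ioas->length || off > ioas->length - length)
		return false;

	for (unsigned long i = off / TOY_PAGE_SIZE;
	     i <= (off + length - 1) / TOY_PAGE_SIZE; i++)
		acc->pins[i]++;
	return true;
}
```

Pinning through the access object means iommufd knows exactly which
pages the device can touch. That is what would need to be handed across
kexec later.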

What kinds/sources of memory should be supported here, e.g. device
memory regions exposed by PCI BARs?

> > Then as you suggest below the IOMMUFD_OBJ_DEVICE would be serialised
> > too in some way, probably by iommufd telling the PCI layer that this
> > device must be persistent and hence not to re-probe it on kexec.  
> 
> Presumably VFIO would be doing some/most of this part since it is the
> driver that will be binding?
> 
Yes, it is the user-mode driver that initiates the binding. I was
thinking that, since the granularity of persistence is per iommufd ctx,
the VFIO device's keep_alive flag could be derived from the iommufd ctx.
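Here is a minimal sketch of what I mean, again with hypothetical toy_*
names. There is no keep_alive API like this today. The VFIO device
stores no flag of its own and asks the ctx it is bound to. Every device
in a persisted ctx is then kept alive, and the device and ctx can never
disagree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: persistence is decided once, per iommufd ctx. */
struct toy_ictx {
	bool keep_alive;
};

/* Hypothetical VFIO-side device state; note there is no per-device flag. */
struct toy_vfio_device {
	struct toy_ictx *ictx; /* ctx the device is bound to, NULL if unbound */
};

/* Binding is initiated by the user-mode driver. */
static void toy_vfio_bind(struct toy_vfio_device *vdev, struct toy_ictx *ictx)
{
	assert(ictx != NULL);
	vdev->ictx = ictx;
}

/* Queried e.g. at kexec time: an unbound device is never kept alive. */
static bool toy_vfio_device_keep_alive(const struct toy_vfio_device *vdev)
{
	return vdev->ictx != NULL && vdev->ictx->keep_alive;
}
```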

> > It's all a bit hand wavy at the moment, but something along those
> > lines probably makes sense. I need to work on rev2 of this RFC as
> > per Jason's feedback in the other thread. Rev2 will make the
> > restore path more userspace driven, with fresh iommufd and pgtables
> > objects being created and then atomically swapped over too. I'll
> > also get the PCI layer involved with rev2. Once that's out (it'll
> > be a few weeks as I'm on leave) then let's take a look at how the
> > noiommu device persistence case would fit in.  
> 
> In a certain sense it would be nice to see the noiommu flow as it
> breaks apart the problem into the first dependency:
> 
>  How to get the device handed across the kexec and safely land back in
>  VFIO, and only VFIO's hands.
> 
> Preserving the iommu HW configuration is an incremental step built on
> that base line.

Makes sense. I need to catch up on the KHO series and hook up noiommu
as the first step.

> Also, FWIW, this needs to follow good open source practices - we need
> an open userspace for the feature and the kernel stuff should be
> merged in a logical order.
> 
Yes, we will have matching userspace in openHCL
https://github.com/microsoft/openvmm

Thanks,

Jacob