Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

Jason Gunthorpe <jgg@xxxxxxxxxx> · Wed, 21 Apr 2021 14:52:03 -0300

On Wed, Apr 21, 2021 at 10:54:51AM -0600, Alex Williamson wrote:

> That's essentially replacing vfio-core, where I think we're more

I am only talking about /dev/vfio here which is basically the IOMMU
interface part.

I still expect that VFIO_GROUP_SET_CONTAINER will be used to connect
/dev/{ioasid,vfio} to the VFIO group and all the group and device
logic stays inside VFIO.

The appeal of unifying /dev/{ioasid,vfio} to a single fops is that it
cleans up vfio a lot - we don't have to have two different code paths
where one handles a vfio_container and the other an ioasid_container
and all the related different iommu ops and so on.

Everything can be switched to ioasid_container all down the line. If
it wasn't for PPC this looks fairly simple.

Since getting rid of PPC looks a bit hard, we'd be stuck with
accepting a /dev/ioasid and then immediately wrapping it in a
vfio_container and shimming it through a vfio_iommu_ops. It is not
ideal at all, but in my look around I don't see a major problem if
the type1 implementation is moved to live under /dev/ioasid.

For concreteness if we look at the set container flow with ioasid I'd
say something like:

vfio_group_fops_unl_ioctl()
 VFIO_GROUP_SET_CONTAINER
  vfio_group_set_container()
     if (f.file->f_op == &vfio_fops) {
          // Use a real vfio_container and vfio_iommu_driver
          driver->ops->attach_group()
             tce_iommu_attach_group()
     }

     if (ioasid_container = ioasid_get_from_fd(container_fd)) {
         // create a dummy vfio_container and use the ioasid driver
         container = kzalloc()
         container->iommu_driver = ioasid_shim
         driver->ops->attach_group()
             ioasid_shim_attach_group(ioasid_container, ...)
                 ioasid_attach_group()
                     // What used to be vfio_iommu_attach_group()
     }

Broadly, all the ops vfio needs go through the ioasid_shim, which
relays them to the generic ioasid API.
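
To make the shim idea concrete, here is a rough userspace mock of that
dispatch. It is only a sketch: every mock_* type, the FD_* tags and the
PATH_* markers are hypothetical stand-ins invented for illustration, and
the ioasid_* functions are the proposed names from this mail, not an
existing kernel API. It only shows the shape: a legacy fd uses its own
container and ops, while an ioasid fd gets a dummy container whose ops
table is the shim.

```c
/*
 * Userspace mock of the set-container dispatch. Nothing here is kernel
 * code: every mock_* type and the FD_* / PATH_* tags are hypothetical.
 */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum mock_path { PATH_NONE, PATH_LEGACY, PATH_IOASID };
enum mock_fd_kind { FD_VFIO, FD_IOASID };

struct mock_group { int id; enum mock_path attached_via; };
struct mock_ioasid_container { int ngroups; };

struct mock_iommu_ops {
	int (*attach_group)(void *iommu_data, struct mock_group *group);
};

struct mock_container {
	const struct mock_iommu_ops *ops;
	void *iommu_data;
};

struct mock_fd { enum mock_fd_kind kind; void *private_data; };

/* Stand-in for the proposed generic ioasid entry point. */
int ioasid_attach_group(struct mock_ioasid_container *ic,
			struct mock_group *group)
{
	ic->ngroups++;
	group->attached_via = PATH_IOASID;
	return 0;
}

/* The shim only adapts the ops signature and relays the call. */
static int ioasid_shim_attach_group(void *iommu_data,
				    struct mock_group *group)
{
	return ioasid_attach_group(iommu_data, group);
}

const struct mock_iommu_ops ioasid_shim = {
	.attach_group = ioasid_shim_attach_group,
};

/* Stand-in for a legacy iommu driver reached through /dev/vfio. */
static int legacy_attach_group(void *iommu_data, struct mock_group *group)
{
	(void)iommu_data;
	group->attached_via = PATH_LEGACY;
	return 0;
}

const struct mock_iommu_ops legacy_ops = {
	.attach_group = legacy_attach_group,
};

/* Returns the container the group ended up attached to, or NULL. */
struct mock_container *mock_group_set_container(struct mock_group *group,
						struct mock_fd *fd)
{
	struct mock_container *c;

	assert(group && fd);
	if (fd->kind == FD_VFIO) {
		/* A real container that already has its own driver ops. */
		c = fd->private_data;
	} else {
		/* Dummy container wrapped around the ioasid container. */
		c = calloc(1, sizeof(*c));
		if (!c)
			return NULL;
		c->ops = &ioasid_shim;
		c->iommu_data = fd->private_data;
	}

	if (c->ops->attach_group(c->iommu_data, group) != 0) {
		if (fd->kind == FD_IOASID)
			free(c);
		return NULL;
	}
	return c;
}
```

In the ioasid case the caller owns the returned dummy container and has
to free it; a real implementation would tie that to the group lifetime.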

We end up with an ioasid.h that basically has the vfio_iommu_type1 code
lightly recast into some 'struct iommu_container' and a set of
ioasid_* function entry points that follow vfio_iommu_driver_ops_type1:
  ioasid_attach_group
  ioasid_detach_group
  ioasid_<something about user pages>
  ioasid_read/ioasid_write

If we have this, and /dev/ioasid implements the legacy IOCTLs, then
/dev/vfio == /dev/ioasid and we can compile out vfio_fops and related
from vfio.c and tell ioasid.c to create /dev/vfio instead using the
ops it owns.

This is a very long-winded way of saying ideally we'd do
approximately:
  git mv drivers/vfio/vfio_iommu_type1.c drivers/ioasid/ioasid.c

As the first step. Essentially we declare that what is type1 is really
the user interface to the internal kernel IOMMU kAPI, which has been
steadily evolving since type1 was created 10 years ago.

> The interface of making that selection might change to accept an
> external /dev/ioasid file descriptor, of course.  Maybe you can
> elaborate on how the vfio device and group uAPI live (or not) in
> this new scheme where /dev/ioasid is the primary interface.  Thanks,

They stay in vfio. You'd still open a group and you'd still pass in
either /dev/vfio or /dev/ioasid to define the container.

Though, completely as an unrelated aside, I admit to not entirely
understanding why the group is the central element of the uAPI.

It is weird that the vfio "drivers" all work on the struct vfio_device
(at least after my series), and it has a file_operations presence via
vfio_device_fops, but instead of struct vfio_device directly having a
'struct device' and cdev to access the FD we get it through a group FD
and a group chardev via VFIO_GROUP_GET_DEVICE_FD.

If we were to revise this, and I don't see a huge reason to do so, I
would put a struct device and cdev in struct vfio_device, attach the
vfio_device directly to the ioasid and then forget about the group, at
least as uapi, completely.

Or at least I don't see where that gets into trouble, but I'm not too
familiar with the multi-vfio in a process scenario.
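
For what the struct device plus cdev idea might look like, a minimal
userspace mock follows. All of the mock_* names are hypothetical
stand-ins, not the real struct vfio_device, struct device or struct
cdev. The point is only the usual embedding idiom: the cdev lives inside
the device struct, so an open on the chardev can recover the device with
container_of and attach it to an ioasid without going through a group
FD.

```c
/*
 * Userspace mock: the mock_* structs stand in for struct vfio_device,
 * struct device, struct cdev and an ioasid. None of this is kernel code.
 */
#include <assert.h>
#include <stddef.h>

/* Same idea as the kernel's container_of(), without the type checking. */
#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct mock_device { const char *name; };
struct mock_cdev { int minor; };
struct mock_ioasid { int id; };

struct mock_vfio_device {
	struct mock_device dev;      /* embedded, as proposed */
	struct mock_cdev cdev;       /* embedded, as proposed */
	struct mock_ioasid *ioasid;  /* attached directly, no group */
};

/* What a chardev open() would do to find the owning device. */
struct mock_vfio_device *mock_vfio_device_from_cdev(struct mock_cdev *cdev)
{
	assert(cdev != NULL);
	return mock_container_of(cdev, struct mock_vfio_device, cdev);
}

/* Attach the device straight to an ioasid; -1 if already attached. */
int mock_vfio_device_attach_ioasid(struct mock_vfio_device *vdev,
				   struct mock_ioasid *ioasid)
{
	if (vdev->ioasid)
		return -1;
	vdev->ioasid = ioasid;
	return 0;
}
```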

Jason