Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

Jason Gunthorpe <jgg@xxxxxxxxxx> · Wed, 5 Apr 2023 16:21:09 -0300

On Wed, Apr 05, 2023 at 12:56:21PM -0600, Alex Williamson wrote:
> Usability needs to be a consideration as well.  An interface where the
> result is effectively arbitrary from a user perspective because the
> kernel is solely focused on whether the operation is allowed,
> evaluating constraints that the user is unaware of and cannot control,
> is unusable.

Considering this API is only invoked by qemu we might be overdoing
this usability and 'no shoot in foot' view.

> > This is a good point that qemu needs to make a policy decision if it
> > is happy about the VFIO configuration - but that is a policy decision
> > that should not become entangled with the kernel's security checks.
> > 
> > Today qemu can make this policy choice the same way it does right now
> > - call _INFO and check the group_ids. It gets the exact same outcome
> > as today. We already discussed that we need to expose the group ID
> > through an ioctl someplace.
> 
> QEMU can make a policy decision today because the kernel provides a
> sufficiently reliable interface, ie. based on the set of owned groups, a
> hot-reset is all but guaranteed to work.  

And we don't change that with cdev. If qemu wants to make the policy
decision it keeps using the exact same _INFO interface to make that
decision same it has always made.

We weaken the actual reset action to only consider the security side.

Applications that want this exclusive reset group policy simply must
check it on their own. It is a reasonable API design.

> > If this is too awkward we could add a query to the kernel if the cdev
> > is "reset exclusive" - eg the iommufd covers all the groups that span
> > the reset set.
> 
> That's essentially what we have if there are valid dev-ids for each
> affected device in the info ioctl.

If you have dev-ids for everything, yes. If you don't, then you can't
make the same policy choice using a dev-id interface.

> I don't think it helps the user experience to create loopholes where
> the hot-reset ioctl can still work in spite of those missing
> devices.

I disagree. The easy straightforward design is that the reset ioctl
works if the process has security permissions. Mixing a policy check
into the kernel on this path is creating complexity we don't really
need.

I don't view it as a loophole, it is flexability to use the API in a
way that is different from what qemu wants - eg an app like dpdk may
be willing to tolerate a reset group that becomes unavailable after
startup. Who knows, why should we force this in the kernel?

> For example, we have a VFIO_DEVICE_GET_INFO ioctl that supports
> capability chains, we could add a capability that reports the group ID
> for the device.  

I was going to put that in an iommufd ioctl so it works with VDPA too,
but sure, lets assume we can get the group ID from a cdev fd.

> The hot-reset info ioctl remains as it is today, reporting group-ids
> and bdfs.

Sure, but userspace still needs to know how to map the reset sets into
dev-ids. Remember the reason we started doing this is because we don't
have easy access to the BDF anymore.

I like leaving this ioctl alone, lets go back to a dedicated ioctl to
return the dev_ids.

> The hot-reset ioctl itself is modified to transparently
> support either group fds or device fds.  The user can now map cdevs
> to group-ids and therefore follow the same rules as groups,
> providing at least one representative device fd for each group.

This looks like a very complex uapi compared to the empty list option,
but it seems like it would work.

Jason