Re: [PATCH] util: basic support for vendor-specific vfio drivers

Jason Gunthorpe <jgg@xxxxxxxxxx> · Mon, 8 Aug 2022 08:31:28 -0300

On Fri, Aug 05, 2022 at 12:56:17PM -0600, Alex Williamson wrote:

> > This, and the above, could be handled by having libvirt also open the
> > group FD and get the device. It would prove both correct binding and
> > viability.
> 
> libvirt cannot do this in the group model because the group must be
> isolated in a container before the device can be accessed and libvirt
> cannot presume the QEMU container configuration.  

Hmm? it is a bunch of busy work, but libvirt can open an empty
container, do the VFIO_GROUP_GET_STATUS and then close everything
before invoking qemu.

But, looking at this, it seems we've messed something up in the kernel
implementation:

	case VFIO_GROUP_GET_STATUS:
		down_read(&group->group_rwsem);
		if (group->container)
			status.flags |= VFIO_GROUP_FLAGS_CONTAINER_SET |
					VFIO_GROUP_FLAGS_VIABLE;
		else if (!iommu_group_dma_owner_claimed(group->iommu_group))
			status.flags |= VFIO_GROUP_FLAGS_VIABLE;
		up_read(&group->group_rwsem);

If group->container is set then iommu_group_dma_owner_claimed() will
always be true, because group->container can only ever be set if
iommu_group_claim_dma_owner() succeeded.

So I wonder what was supposed to happen here..

> For direct device
> access, this certainly becomes a possibility and I've been trying to
> steer things in that direction, libvirt has the option to pass an fd for
> the iommufd and can then pass fds for each of the devices in the new
> uAPI paradigm.

So, what would be nice here? I assume libvirt would like to use these
tests even if the VM is currently running, so we want some kind of
sysfs file that returns "true if vfio can succeed
iommu_group_claim_dma_owner()" - which would be true if a vfio_device
FD is already open, otherwise would check
iommu_group_dma_owner_claimed() to look for conflicting drivers?

> The initial step is to then enlighten libvirt that other drivers can be
> compatible for the 'no' case and later we can make smarter choices
> about which driver to use or allow the user to specify (ie. a user
> should be able to use vfio-pci rather than a variant driver if they
> choose) in the 'yes' case.

I really don't like all this snooping around in kernel internals. It
is creating ABI where ABI should not exist. 

If libvirt is going to do this stuff I would be happier if it was
conditional on kernel-support to avoid it, and we don't have a libvirt
in the wild that is creating these ABIs. Yu had split out the
vfio_device sysfs part last week so we might see a series this week or
next.

> If libvirt is currently testing that only the target device is bound to
> vfio-pci, then maybe we do have gaps for the ancillary devices in the
> group, but that gap changes if instead we only test that a vfio group
> exists relative to the iommu group of the target device. 

Yes, but if we check the group then libvirt uses a kernel led solution
and doesn't create ABI - though it is differently imperfect than the
current solution..

Jason