On Fri, Aug 05, 2022 at 12:56:17PM -0600, Alex Williamson wrote: > > This, and the above, could be handled by having libvirt also open the > > group FD and get the device. It would prove both correct binding and > > viability. > > libvirt cannot do this in the group model because the group must be > isolated in a container before the device can be accessed and libvirt > cannot presume the QEMU container configuration. Hmm? it is a bunch of busy work, but libvirt can open an empty container, do the VFIO_GROUP_GET_STATUS and then close everything before invoking qemu. But, looking at this, it seems we've messed something up in the kernel implementation: case VFIO_GROUP_GET_STATUS: down_read(&group->group_rwsem); if (group->container) status.flags |= VFIO_GROUP_FLAGS_CONTAINER_SET | VFIO_GROUP_FLAGS_VIABLE; else if (!iommu_group_dma_owner_claimed(group->iommu_group)) status.flags |= VFIO_GROUP_FLAGS_VIABLE; up_read(&group->group_rwsem); If group->container is set then iommu_group_dma_owner_claimed() will always be true, because group->container can only ever be set if iommu_group_claim_dma_owner() succeeded. So I wonder what was supposed to happen here.. > For direct device > access, this certainly becomes a possibility and I've been trying to > steer things in that direction, libvirt has the option to pass an fd for > the iommufd and can then pass fds for each of the devices in the new > uAPI paradigm. So, what would be nice here? I assume libvirt would like to use these tests even if the VM is currently running, so we want some kind of sysfs file that returns "true if vfio can succeed iommu_group_claim_dma_owner()" - which would be true if a vfio_device FD is already open, otherwise would check iommu_group_dma_owner_claimed() to look for conflicting drivers? > The initial step is to then enlighten libvirt that other drivers can be > compatible for the 'no' case and later we can make smarter choices > about which driver to use or allow the user to specify (ie. a user > should be able to use vfio-pci rather than a variant driver if they > choose) in the 'yes' case. I really don't like all this snooping around in kernel internals. It is creating ABI where ABI should not exist. If libvirt is going to do this stuff I would be happier if it was conditional on kernel-support to avoid it, and we don't have a libvirt in the wild that is creating these ABIs. Yu had split out the vfio_device sysfs part last week so we might see a series this week or next. > If libvirt is currently testing that only the target device is bound to > vfio-pci, then maybe we do have gaps for the ancillary devices in the > group, but that gap changes if instead we only test that a vfio group > exists relative to the iommu group of the target device. Yes, but if we check the group then libvirt uses a kernel led solution and doesn't create ABI - though it is differently imperfect than the current solution.. Jason