On Tue, Apr 11, 2023 at 11:11:17AM -0600, Alex Williamson wrote: > [Appears the list got dropped, replying to my previous message to re-add] Wowo this got mesed up alot, mutt drops the cc when replying for some reason. I think it is fixed up now > > Our cdev model says that opening a cdev locks out other cdevs from > > independent use, eg because of the group sharing. Extending this to > > include the reset group as well seems consistent. > > The DMA ownership model based on the IOMMU group is consistent with > legacy vfio, but now you're proposing a new ownership model that > optionally allows a user to extend their ownership, opportunistically > lock out other users, and wreaking havoc for management utilities that > also have no insight into dev_sets or userspace driver behavior. I suggested below that the owership require enough open devices - so it doesn't "extend ownership opportunistically", and there is no havoc. Management tools already need to understand dev_set if they want to offer reliable reset support to the VMs. Same as today. > > There is some security concern here, but that goes both ways, a 3rd > > party should not be able to break an application that needs to use > > this RESET and had sufficient privileges to assert an ownership. > > There are clearly scenarios we have now that could break. For example, > today if QEMU doesn't own all the IOMMU groups for a mult-function > device, it can't do a reset, the remaining functions are available for > other users. Sure, and we can keep that with this approach. > As I understand the proposal, QEMU now gets to attempt to > claim ownership of the dev_set, so it opportunistically extends its > ownership and may block other users from the affected devices. We can decide the policy for the kernel to accept a claim. I suggested below "same as today" - it must hold all the groups within the iommufd_ctx. The main point is to make this claiming operation qemu needs to do clearer and more explicit. I view this as better than trying to guess if it successfully made the claim by inspecting the _INFO output. > > I'd say anyone should be able to assert RESET ownership if, like > > today, the iommufd_ctx has all the groups of the dev_set inside > > it. Once asserted it becomes safe against all forms of hotplug, and > > continues to be safe even if some of the devices are closed. eg hot > > unplugging from the VM doesn't change the availability of RESET. > > > > This comes from your ask that qemu know clearly if RESET works, and it > > doesn't change while qemu is running. This seems stronger and clearer > > than the current implicit scheme. It also doesn't require usespace to > > do any calculations with groups or BDFs to figure out of RESET is > > available, kernel confirms it directly. > > As above, clarity and predictability seem lacking in this proposal. > With the current scheme, the ownership of the affected devices is > implied if they exist within an owned group, but the strength of that > ownership is clear. Same logic holds here Ownership is claimed same as today by having all groups representated in the iommufd_ctx. This seems just as clear as today. > > > seems this proposal essentially extends the ownership model to the > > > greater of the dev_set or iommu group, apparently neither of which > > > are explicitly exposed to the user in the cdev API. > > > > IIRC the group id can be learned from sysfs before opening the cdev > > file. Something like /sys/class/vfio/XX/../../iommu_group > > And in the passed cdev fd model... ? IMHO we should try to avoid needing to expose group_id specifically to userspace. We are missing a way to learn the "same ioas" restriction in iommufd, and it should provide that directly based on dev_ids. Otherwise if we really really need group_id then iommufd should provide an ioctl to get it. Let's find a good reason first > > We should also have an iommufd ioctl to report the "same ioas" > > groupings of dev_ids to make it easy on userspace. I haven't checked > > to see what the current qemu patches are doing with this.. > > Seems we're ignoring that no-iommu doesn't have a valid iommufd. no-iommu doesn't and shouldn't have iommu_groups either. It also doesn't have an IOAS so querying for same-IOAS is not necessary. The simplest option for no-iommu is to require it to pass in every device fd to the reset ioctl. > > > How does a user determine when devices cannot be used independently > > > in the cdev API? > > > > We have this problem right now. The only way to learn the reset group > > is to call the _INFO ioctl. We could add a sysfs "pci_reset_group" > > under /sys/class/vfio/XX/ if something needs it earlier. > > For all the complaints about complexity, now we're asking management > tools to not only take into account IOMMU groups, but also reset > groups, and some inferred knowledge about the application and devices > to speculate whether reset group ownership is taken by a given > userspace?? No, we are trying to keep things pretty much the same as today without resorting to exposing a lot of group related concepts. The reset group is a clear concept that already exists and isn't exposed. If we really need to know about it then it should be exposed on its own, as a seperate discussion from this cdev stuff. I want to re-focus on the basics of what cdev is supposed to be doing, because several of the idea you suggested seem against this direction: - cdev does not have, and cannot rely on vfio_groups. We enforce this by compiling all the vfio_group infrastructure out. iommu_groups continue to exist. So converting a cdev to a vfio_group is not an allowed operation. - no-iommu should not have iommu_groups. We enforce this by compiling out all the no-iommu vfio_group infrastructure. - cdev APIs should ideally not require the user to know the group_id, we should try hard to design APIs to avoid this. We have solved every other problem but reset like this, I would like to get past reset without compromising the above. Jason