On Wed, 29 Mar 2023 09:41:26 +0000
"Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:

> > From: Liu, Yi L <yi.l.liu@xxxxxxxxx>
> > Sent: Wednesday, March 29, 2023 11:14 AM
> >
> > > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > Sent: Wednesday, March 29, 2023 12:00 AM
> > >
> > >
> > > Personally I don't like the suggestion to fail with -EPERM if the user
> > > doesn't own all the affected devices.  This isn't a "probe if I can do
> > > a reset" ioctl, it's a "provide information about the devices affected
> > > by a reset to know how to call the hot-reset ioctl".  We're returning
> > > the bdf to the cdev version of this ioctl for exactly this debugging
> > > purpose when the devices are not owned, that becomes useless if we give
> > > up an return -EPERM if ownership doesn't align.
> >
> > Jason's suggestion makes sense for returning the case of returning dev_id
> > as dev_id is local to iommufd. If there are devices in the same dev_set are
> > opened by multiple users, multiple iommufd would be used. Then the
> > dev_id would have overlap. e.g. a dev_set has three devices. Device A and
> > B are opened by the current user as cdev, dev_id #1 and #2 are generated.
> > While device C opened by another user as cdev, dev_id #n is generated for
> > it. If dev_id #n happens to be #1, then user gets two info entries that have
> > the same dev_id.
> >
>
> In Alex's proposal you'll set a invalid dev_id for device C so the user can
> still get the info for diagnostic purpose instead of seeing an -EPERM error.

Yes, we shouldn't be reporting dev_ids outside of the user's iommufd
context.

> btw I found an open about fd pass scheme which may affect the choice here.
>
> In concept even with cdev we still expect the userspace to maintain the
> group knowledge so it won't inadvertently attempt to assign devices in
> the same group to different IOAS's. It also needs such knowledge when
> constructing guest topology.
>
> with fd passed in Qemu has no way to associate the fd to a group.

Hmm, QEMU tries to get the group for the device address space in the
guest, so finding an existing group with a different address space
indeed allows QEMU to know of this conflict, since the group is the
fundamental unit of IOMMU context in the legacy vfio model.

> We could extend bind_iommufd to return the group id or introduce a
> new ioctl to query it per dev_id.

It would be ironic to go to all this trouble to remove groups from the
API only to have them show up here.

But with a cdev interface, don't we break that model of conflating
isolation and addressability?  For example, devices within a group
cannot be bound to separate iommufds due to lack of isolation, which is
handled via DMA ownership, but barring DMA aliasing issues, due to
conventional PCI buses or quirks, cdev could allow devices within the
same group to be managed by separate IOAS's.  So the group information
really isn't enough for userspace to infer address space restrictions
with cdev anyway.

Therefore, aren't we expecting this to be denied at attach_ioas(), such
that QEMU shouldn't be making these sorts of assumptions for cdev
anyway?  Thanks,

Alex
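
For illustration only, the dev_id overlap described in the quoted text can
be reproduced with a toy model. This is a sketch under stated assumptions,
not kernel code: the Iommufd class, naive_info(), hot_reset_info(), the
INVALID_DEV_ID value, and the bdf strings are all hypothetical names
invented for this example. It only assumes what the thread states, namely
that each iommufd hands out dev_ids from its own local space.

```python
from itertools import count

# Hypothetical sentinel; the thread does not say what an "invalid dev_id" is.
INVALID_DEV_ID = -1


class Iommufd:
    """Toy stand-in for one iommufd context with its own local dev_id space."""

    def __init__(self):
        self._next_id = count(1)
        self.dev_ids = {}  # bdf -> dev_id, valid only inside this context

    def bind(self, bdf):
        dev_id = next(self._next_id)
        self.dev_ids[bdf] = dev_id
        return dev_id


def naive_info(dev_set):
    """Report every device's dev_id regardless of which iommufd issued it."""
    return [(bdf, owner.dev_ids[bdf]) for bdf, owner in dev_set.items()]


def hot_reset_info(dev_set, caller):
    """Report dev_ids only for devices bound to the caller's iommufd.

    Devices owned elsewhere still get an entry (so the bdf stays available
    for diagnostics) but carry the invalid dev_id instead of a foreign one.
    """
    return [
        (bdf, owner.dev_ids[bdf] if owner is caller else INVALID_DEV_ID)
        for bdf, owner in dev_set.items()
    ]


# Devices A and B belong to the current user, device C to another user.
user, other = Iommufd(), Iommufd()
dev_set = {}
for bdf in ("0000:01:00.0", "0000:01:00.1"):
    user.bind(bdf)
    dev_set[bdf] = user
other.bind("0000:01:00.2")
dev_set["0000:01:00.2"] = other

print(naive_info(dev_set))            # devices A and C both report dev_id 1
print(hot_reset_info(dev_set, user))  # device C reports INVALID_DEV_ID
```

In this model the naive report is ambiguous because two entries share
dev_id 1, while the filtered report keeps all three bdfs visible without
leaking an identifier from another iommufd context.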