> From: Alex Williamson <alex.williamson@xxxxxxxxxx> > Sent: Tuesday, June 29, 2021 6:32 AM > > On Mon, 28 Jun 2021 01:09:18 +0000 > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote: > > > > From: Jason Gunthorpe <jgg@xxxxxxxxxx> > > > Sent: Friday, June 25, 2021 10:36 PM > > > > > > On Fri, Jun 25, 2021 at 10:27:18AM +0000, Tian, Kevin wrote: > > > > > > > - When receiving the binding call for the 1st device in a group, > iommu_fd > > > > calls iommu_group_set_block_dma(group, dev->driver) which does > > > > several things: > > > > > > The whole problem here is trying to match this new world where we want > > > devices to be in charge of their own IOMMU configuration and the old > > > world where groups are in charge. > > > > > > Inserting the group fd and then calling a device-centric > > > VFIO_GROUP_GET_DEVICE_FD_NEW doesn't solve this conflict, and isn't > > > necessary. > > > > No, this was not what I meant. There is no group fd required when > > calling this device-centric interface. I was actually talking about: > > > > iommu_group_set_block_dma(dev->group, dev->driver) > > > > just because current iommu layer API is group-centric. Whether this > > should be improved could be next-level thing. Sorry for not making > > it clear in the first place. > > > > > We can always get the group back from the device at any > > > point in the sequence do to a group wide operation. > > > > yes. > > > > > > > > What I saw as the appeal of the sort of idea was to just completely > > > leave all the difficult multi-device-group scenarios behind on the old > > > group centric API and then we don't have to deal with them at all, or > > > least not right away. > > > > yes, this is the staged approach that we discussed earlier. and > > the reason why I refined this proposal about multi-devices group > > here is because you want to see some confidence along this > > direction. Thus I expanded your idea and hope to achieve consensus > > with Alex/Joerg who obviously have not been convinced yet. > > > > > > > > I'd see some progression where iommu_fd only works with 1:1 groups at > > > the start. Other scenarios continue with the old API. > > > > One uAPI open after completing this new sketch. v1 proposed to > > conduct binding (VFIO_BIND_IOMMU_FD) after device_fd is acquired. > > With this sketch we need a new VFIO_GROUP_GET_DEVICE_FD_NEW > > to complete both in one step. I want to get Alex's confirmation whether > > it sounds good to him, since it's better to unify the uAPI between 1:1 > > group and 1:N group even if we don't support 1:N in the start. > > I don't like it. It doesn't make sense to me. You have the > group-centric world, which must continue to exist and cannot change > because we cannot break the vfio uapi. We can make extensions, we can > define a new parallel uapi, we can deprecate the uapi, but in the short > term, it can't change. > > AIUI, the new device-centric model starts with vfio device files that > can be opened directly. So what then is the purpose of a *GROUP* get > device fd? Why is a vfio uapi involved in setting a device cookie for > another subsystem? > > I'd expect that /dev/iommu will be used by multiple subsystems. All > will want to bind devices to address spaces, so shouldn't binding a > device to an iommufd be an ioctl on the iommufd, ie. > IOMMU_BIND_VFIO_DEVICE_FD. Maybe we don't even need "VFIO" in there > and > the iommufd code can figure it out internally. > > You're essentially trying to reduce vfio to the device interface. That > necessarily implies that ioctls on the container, group, or passed > through the container to the iommu no longer exist. From my > perspective, there should ideally be no new vfio ioctls. The user gets > a limited access vfio device fd and enables full access to the device > by registering it to the iommufd subsystem (100% this needs to be > enforced until close() to avoid revoke issues). The user interacts > exclusively with vfio via the device fd and performs all DMA address > space related operations through the iommufd. > > > > Then maybe groups where all devices use the same IOASID. > > > > > > Then 1:N groups if the source device is reliably identifiable, this > > > requires iommu subystem work to attach domains to sub-group objects - > > > not sure it is worthwhile. > > > > > > But at least we can talk about each step with well thought out patches > > > > > > The only thing that needs to be done to get the 1:1 step is to broadly > > > define how the other two cases will work so we don't get into trouble > > > and set some way to exclude the problematic cases from even getting to > > > iommu_fd in the first place. > > > > > > For instance if we go ahead and create /dev/vfio/device nodes we could > > > do this only if the group was 1:1, otherwise the group cdev has to be > > > used, along with its API. > > > > I feel for VFIO possibly we don't need significant change to its uAPI > > sequence, since it anyway needs to support existing semantics for > > backward compatibility. With this sketch we can keep vfio container/ > > group by introducing an external iommu type which implies a different > > GET_DEVICE_FD semantics. /dev/iommu can report a fd-wide capability > > for whether 1:N group is supported to vfio user. > > Ideally vfio would also at least be able to register a type1 IOMMU > backend through the existing uapi, backed by this iommu code, ie. we'd > create a new "iommufd" (but without the user visible fd), bind all the > group devices to it, generating our own device cookies, create a single > ioasid and attach all the devices to it (all internal). When using the > compatibility mode, userspace doesn't get device cookies, doesn't get > an iommufd, they do mappings through the container, where vfio owns the > cookies and ioasid. > > > For new subsystems they can directly create device nodes and rely on > > iommu fd to manage group isolation, without introducing any group > > semantics in its uAPI. > > Create device nodes, bind them to iommufd, associate cookies, attach > ioasids, etc. That should be the same for all subsystems, including > vfio, it's just the magic internal handshake between the device > subsystem and the iommufd subsystem that changes. > > > > > a) Check group viability. A group is viable only when all devices in > > > > the group are in one of below states: > > > > > > > > * driver-less > > > > * bound to a driver which is same as dev->driver (vfio in this > case) > > > > * bound to an otherwise allowed driver (same list as in vfio) > > > > > > This really shouldn't use hardwired driver checks. Attached drivers > > > should generically indicate to the iommu layer that they are safe for > > > iommu_fd usage by calling some function around probe() > > > > good idea. > > > > > > > > Thus a group must contain only iommu_fd safe drivers, or drivers-less > > > devices before any of it can be used. It is the more general > > > refactoring of what VFIO is doing. > > > > > > > c) The iommu layer also verifies group viability on BUS_NOTIFY_ > > > > BOUND_DRIVER event. BUG_ON if viability is broken while > > > block_dma > > > > is set. > > > > > > And with this concept of iommu_fd safety being first-class maybe we > > > can somehow eliminate this gross BUG_ON (and the 100's of lines of > > > code that are used to create it) by denying probe to non-iommu-safe > > > drivers, somehow. > > > > yes. > > > > > > > > > - Binding other devices in the group to iommu_fd just succeeds since > > > > the group is already in block_dma. > > > > > > I think the rest of this more or less describes the device centric > > > logic for multi-device groups we've already talked about. I don't > > > think it benifits from having the group fd > > > > > > > sure. All of this new sketch doesn't have group fd in any iommu fd > > API. Just try to elaborate a full sketch to sync the base. > > > > Alex/Joerg, look forward to your thoughts now. 😊 > > Some provided. Thanks, > Thanks a lot Alex! We'll try to focus on the new device-centric flow w/o touching existing container/group uAPI. As you said, we need a brand-new mechanism for all subsystems anyway. With that I will resume v2 progress based on device-centric concept. It will be still based on a few new VFIO uAPIs to handle device binding/ attaching, though you prefer to not adding any new VFIO uAPI. This is relatively a smaller open compared to device-centric vs. group-centric issue. We can have it continuously discussed in parallel with v2 review. and I hope v2 can be helpful for closing this open with a clearer explanation about PASID virtualization. 😊 Thanks Kevin