> From: Tian, Kevin > Sent: Wednesday, September 22, 2021 9:07 AM > > > From: Jason Gunthorpe <jgg@xxxxxxxxxx> > > Sent: Wednesday, September 22, 2021 8:55 AM > > > > On Tue, Sep 21, 2021 at 11:56:06PM +0000, Tian, Kevin wrote: > > > > The opened atomic is aweful. A newly created fd should start in a > > > > state where it has a disabled fops > > > > > > > > The only thing the disabled fops can do is register the device to the > > > > iommu fd. When successfully registered the device gets the normal fops. > > > > > > > > The registration steps should be done under a normal lock inside the > > > > vfio_device. If a vfio_device is already registered then further > > > > registration should fail. > > > > > > > > Getting the device fd via the group fd triggers the same sequence as > > > > above. > > > > > > > > > > Above works if the group interface is also connected to iommufd, i.e. > > > making vfio type1 as a shim. In this case we can use the registration > > > status as the exclusive switch. But if we keep vfio type1 separate as > > > today, then a new atomic is still necessary. This all depends on how > > > we want to deal with vfio type1 and iommufd, and possibly what's > > > discussed here just adds another pound to the shim option... > > > > No, it works the same either way, the group FD path is identical to > > the normal FD path, it just triggers some of the state transitions > > automatically internally instead of requiring external ioctls. > > > > The device FDs starts disabled, an internal API binds it to the iommu > > via open coding with the group API, and then the rest of the APIs can > > be enabled. Same as today. > > After reading your comments on patch08, I may have a clearer picture on your suggestion. The key is to handle exclusive access at the binding time (based on vdev->iommu_dev). Please see whether below makes sense: Shared sequence: 1) initialize the device with a parked fops; 2) need binding (explicit or implicit) to move away from parked fops; 3) switch to normal fops after successful binding; 1) happens at device probe. for nongroup 2) and 3) are done together in VFIO_DEVICE_GET_IOMMUFD: - 2) is done by calling .bind_iommufd() callback; - 3) could be done within .bind_iommufd(), or via a new callback e.g. .finalize_device(). The latter may be preferred for the group interface; - Two threads may open the same device simultaneously, with exclusive access guaranteed by iommufd_bind_device(); - Open() after successful binding is rejected, since normal fops has been activated. This is checked upon vdev->iommu_dev; for group 2/3) are done together in VFIO_GROUP_GET_DEVICE_FD: - 2) is done by open coding bind_iommufd + attach_ioas. Create an iommufd_device object and record it to vdev->iommu_dev - 3) is done by calling .finalize_device(); - open() after a valid vdev->iommu_dev is rejected. this also ensures exclusive ownership with the nongroup path. If Alex also agrees with it, this might be another mini-series to be merged (just for group path) before this one. Doing so sort of nullifies the existing group/container attaching process, where attach_ioas will be skipped and now the security context is established when the device is opened. Thanks Kevin