On 1/18/23 4:03 AM, Tian, Kevin wrote:
>> From: Alex Williamson
>> Sent: Wednesday, January 18, 2023 5:23 AM
>>
>> On Fri, 13 Jan 2023 19:03:51 -0500
>> Matthew Rosato <mjrosato@xxxxxxxxxxxxx> wrote:
>>
>>>  void vfio_device_group_close(struct vfio_device *device)
>>>  {
>>> +	void (*put_kvm)(struct kvm *kvm);
>>> +	struct kvm *kvm;
>>> +
>>>  	mutex_lock(&device->group->group_lock);
>>> +	kvm = device->kvm;
>>> +	put_kvm = device->put_kvm;
>>>  	vfio_device_close(device, device->group->iommufd);
>>> +	if (kvm == device->kvm)
>>> +		kvm = NULL;
>>
>> Hmm, so we're using whether the device->kvm pointer gets cleared in
>> last_close to detect whether we should put the kvm reference.  That's a
>> bit obscure.  Our get and put is also asymmetric.
>>
>> Did we decide that we couldn't do this via a schedule_work() from the
>> last_close function, ie. implementing our own version of an async put?
>> It seems like that potentially has a cleaner implementation, symmetric
>> call points, handling all the storing and clearing of kvm related
>> pointers within the get/put wrappers, passing only a vfio_device to the
>> put wrapper, using the "vfio_device_" prefix for both.  Potentially
>> we'd just want an unconditional flush outside of lock here for
>> deterministic release.
>>
>> What's the downside?  Thanks,
>>
>
> btw I guess this can be also fixed by Yi's work here:
>
> https://lore.kernel.org/kvm/20230117134942.101112-6-yi.l.liu@xxxxxxxxx/
>
> with set_kvm(NULL) moved to the release callback of kvm_vfio device,
> such circular lock dependency can be avoided too.

Oh, interesting...  It seems to me that this would eliminate the
reported call chain altogether:

kvm_put_kvm
 -> kvm_destroy_vm
  -> kvm_destroy_devices
   -> kvm_vfio_destroy (starting here -- this would no longer be executed)
    -> kvm_vfio_file_set_kvm
     -> vfio_file_set_kvm
      -> group->group_lock/group_rwsem

Because kvm_destroy_devices now can't end up calling kvm_vfio_destroy
and friends, it won't try to acquire the group lock a 2nd time, making
a kvm_put_kvm while the group lock is held OK to do.

The vfio_file_set_kvm call will now always come from a separate thread
of execution: kvm_vfio_group_add, kvm_vfio_group_del, or the release
thread:

kvm_device_release (where the group->group_lock would not be held since
                    vfio does not trigger closing of the kvm fd)
 -> kvm_vfio_destroy (or, kvm_vfio_release)
  -> kvm_vfio_file_set_kvm
   -> vfio_file_set_kvm
    -> group->group_lock/group_rwsem
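
Separately, on the schedule_work() suggestion above: just to make sure I'm
reading it right, a rough sketch of what I picture is below.  The
put_kvm_work member and the wrapper names (vfio_device_put_kvm and friends)
are invented here purely for illustration, not actual code; locking around
device->kvm is also elided.

static void vfio_device_put_kvm_fn(struct work_struct *work)
{
	/* Assumes a hypothetical put_kvm_work member on struct vfio_device,
	 * initialized with INIT_WORK() when the device is set up.
	 */
	struct vfio_device *device =
		container_of(work, struct vfio_device, put_kvm_work);

	/* Runs with no vfio locks held, so dropping the kvm reference is
	 * fine even if it lands in kvm_destroy_vm -> ... -> vfio_file_set_kvm.
	 * device->put_kvm is the callback saved at group open, as in the
	 * patch quoted above.
	 */
	device->put_kvm(device->kvm);
	device->kvm = NULL;
	device->put_kvm = NULL;
}

static void vfio_device_put_kvm(struct vfio_device *device)
{
	if (!device->kvm)
		return;
	schedule_work(&device->put_kvm_work);
}

vfio_device_group_close() would then call vfio_device_put_kvm() from
last_close and do an unconditional flush_work(&device->put_kvm_work) after
dropping group_lock, along the lines of the "unconditional flush outside of
lock" for deterministic release.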