On Tue, Jun 21, 2022 at 03:37:31PM +0800, Zhangfei Gao wrote:
>
> On 2022/6/20 9:36 PM, Greg Kroah-Hartman wrote:
> > On Mon, Jun 20, 2022 at 02:24:31PM +0100, Jean-Philippe Brucker wrote:
> > > On Fri, Jun 17, 2022 at 02:05:21PM +0800, Zhangfei Gao wrote:
> > > > > The refcount only ensures that the uacce_device object is not freed as
> > > > > long as there are open fds. But uacce_remove() can run while there are
> > > > > open fds, or fds in the process of being opened. And after uacce_remove()
> > > > > runs, the uacce_device object still exists but is mostly unusable. For
> > > > > example once the module is freed, uacce->ops is not valid anymore. But
> > > > > currently uacce_fops_open() may dereference the ops in this case:
> > > > >
> > > > >   uacce_fops_open()
> > > > >    if (!uacce->parent->driver)
> > > > >    /* Still valid, keep going */
> > > > >    ...                                  rmmod
> > > > >                                          uacce_remove()
> > > > >    ...                                   free_module()
> > > > >    uacce->ops->get_queue() /* BUG */
> > > > uacce_remove should wait for uacce->queues_lock, until fops_open release the
> > > > lock.
> > > > If open happen just after the uacce_remove: unlock, uacce_bind_queue in open
> > > > should fail.
> > > Ah yes sorry, I lost sight of what this patch was adding. But we could
> > > have the same issue with the patch, just in a different order, no?
> > >
> > >   uacce_fops_open()
> > >    uacce = xa_load()
> > >    ...                                  rmmod
> > Um, how is rmmod called if the file descriptor is open?
> >
> > That should not be possible if the owner of the file descriptor is
> > properly set. Please fix that up.
> Thanks Greg
>
> Set cdev owner or use module_get/put can block rmmod once fops_open.
> -       uacce->cdev->owner = THIS_MODULE;
> +       uacce->cdev->owner = uacce->parent->driver->owner;
>
> However, still not find good method to block removing parent pci device.
>
> $ echo 1 > /sys/bus/pci/devices/0000:00:02.0/remove &
>
> [   32.563350] uacce_remove+0x6c/0x148
> [   32.563353] hisi_qm_uninit+0x12c/0x178
> [   32.563356] hisi_zip_remove+0xa0/0xd0 [hisi_zip]
> [   32.563361] pci_device_remove+0x44/0xd8
> [   32.563364] device_remove+0x54/0x88
> [   32.563367] device_release_driver_internal+0xec/0x1a0
> [   32.563370] device_release_driver+0x20/0x30
> [   32.563372] pci_stop_bus_device+0x8c/0xe0
> [   32.563375] pci_stop_and_remove_bus_device_locked+0x28/0x60
> [   32.563378] remove_store+0x9c/0xb0
> [   32.563379] dev_attr_store+0x20/0x38

Removing the parent pci device does not remove the module code, it
removes the device itself. Don't confuse code vs. data here.

thanks,

greg k-h