On Fri, Sep 17, 2021 at 10:05 PM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote: > > When sysfs attributes use a lock also used on module removal we can > race to deadlock. This happens when for instance a sysfs file on > a driver is used, then at the same time we have module removal call > trigger. The module removal call code holds a lock, and then the sysfs > file entry waits for the same lock. While holding the lock the module > removal tries to remove the sysfs entries, but these cannot be removed > yet as one is waiting for a lock. This won't complete as the lock is > already held. Likewise module removal cannot complete, and so we deadlock. > > This can now be easily reproducible with our sysfs selftest as follows: > > ./tools/testing/selftests/sysfs/sysfs.sh -t 0027 > > To fix this we extend the struct kernfs_node with a module reference and > use the try_module_get() after kernfs_get_active() is called which > protects integrity and the existence of the kernfs node during the > operation. > > So long as the kernfs node is protected with kernfs_get_active() we know > we can rely on its contents. And, as now just documented in the previous > patch, we also now know that once kernfs_get_active() is called the module > is also guarded to exist and cannot be removed. > > If try_module_get() fails we fail the operation on the kernfs node. > > We use a try method as a full lock means we'd then make our sysfs > attributes busy us out from possible module removal, and so userspace > could force denying module removal, a silly form of "DOS" against module > removal. A try lock on the module removal ensures we give priority to > module removal and interacting with sysfs attributes only comes second. > Using a full lock could mean for instance that if you don't stop poking > at sysfs files you cannot remove a module. > > Races between removal of sysfs files and the module are not possible > given sysfs files are created by the same module, and when a sysfs file > is being used kernfs prevents removal of the sysfs file. So if module > removal is actually happening the removal would have to wait until > the sysfs file operation is complete. > > This deadlock was first reported with the zram driver, however the live > patching folks have acknowledged they have observed this as well with > live patching, when a live patch is removed. I was then able to > reproduce easily by creating a dedicated selftests. > > A sketch of how this can happen follows: > > CPU A CPU B > whatever_store() > module_unload > mutex_lock(foo) > mutex_lock(foo) > del_gendisk(zram->disk); > device_del() > device_remove_groups() This flow seems possible to trigger with: echo $dev > /sys/bus/$bus/drivers/$driver/unbind I am missing why module pinning is part of the solution when it's the device_del() path that is racing? Module removal is just a more coarse grained way to trigger unbind => device_del(). Isn't the above a bug in the driver, not missing synchronization in kernfs? Forgive me if the unbind question was asked and answered elsewhere, this is my first time taking a look at this series.