On Tue, Jun 22, 2021 at 09:45:39AM +0200, Greg KH wrote: > On Mon, Jun 21, 2021 at 04:36:34PM -0700, Luis Chamberlain wrote: > > When sysfs attributes use a lock also used on driver removal we can > > potentially deadlock. This happens when for instance a sysfs file on > > a driver is used, then at the same time we have driver removal trigger. > > The driver removal code holds a lock, and then the sysfs file entry waits > > for the same lock. While holding the lock the driver removal tries to > > remove the sysfs entries, but these cannot be removed yet as one is > > waiting for a lock. This won't complete as the lock is already held. > > Likewise module removal cannot complete, and so we deadlock. > > This is all about removing modules, not about "driver removal" from a > device. Please make the subject line here more explicit. Sure. > > To fix this we just *try* to get a refcount to the module when a shared > > lock is used, prior to mucking with a sysfs attribute. If this fails we > > just give up right away. > > > > We use a try method as a full lock means we'd then make our sysfs attributes > > busy us out from possible module removal, and so userspace could force denying > > module removal, a silly form of "DOS" against module removal. A try lock on > > the module removal ensures we give priority to module removal and interacting > > with sysfs attributes only comes second. Using a full lock could mean for > > instance that if you don't stop poking at sysfs files you cannot remove a > > module. > > > > This deadlock was first reported with the zram driver, a sketch of how > > this can happen follows: > > > > CPU A CPU B > > whatever_store() > > module_unload > > mutex_lock(foo) > > mutex_lock(foo) > > del_gendisk(zram->disk); > > device_del() > > device_remove_groups() > > Can you duplicate this in a real-world situation? > > What tools remove the zram module from the system on the fly? A customer did run into it through a series of automated tests. I was able to finally reproduce with the instructions given below. I simplified it given that the series of test the customer was running was much more complex. > > In this situation whatever_store() is waiting for the mutex foo to > > become unlocked, but that won't happen until module removal is complete. > > But module removal won't complete until the syfs file being poked completes > > which is waiting for a lock already held. > > > > This is a generic kernel issue with sysfs files which use any lock also > > used on module removal. Different generic solutions have been proposed. > > One approach proposed is by directly by augmenting attributes with module > > information [0]. This patch implements a solution by adding macros with > > the prefix MODULE_DEVICE_ATTR_*() which accomplish the same. Until we > > don't have a generic agreed upon solution for this shared between drivers, > > we must implement a fix for this on each driver. > > > > We make zram use the new MODULE_DEVICE_ATTR_*() helpers, and completely > > open code the solution for class attributes as there are only a few of > > those. > > > > This issue can be reproduced easily on the zram driver as follows: > > > > Loop 1 on one terminal: > > > > while true; > > do modprobe zram; > > modprobe -r zram; > > done > > > > Loop 2 on a second terminal: > > while true; do > > echo 1024 > /sys/block/zram0/disksize; > > echo 1 > /sys/block/zram0/reset; > > done > > As fun as this is, it's not a real workload, please do not pretend that > it is. Whoever said that it was? This is just a way to reproduce an issue which was reported. > And your code is still racy, see below. You just made the window even > smaller, which you still should be objecting to as you somehow feel this > is a valid usecase :) > > > @@ -2048,13 +2048,19 @@ static ssize_t hot_add_show(struct class *class, > > { > > int ret; > > > > + if (!try_module_get(THIS_MODULE)) > > + return -ENODEV; > > + > > You can not increment/decrement your own module's reference count and > expect it to work properly, as it is still a race. The goal here is to prevent an rmmod call if this succeeds. If it succeeds then any subsequent rmmod will fail. Can you explain how this is still racy? Luis