On Tue, Jun 22, 2021 at 08:05:12PM +0200, Greg KH wrote: > On Tue, Jun 22, 2021 at 10:27:12AM -0700, Luis Chamberlain wrote: > > On Tue, Jun 22, 2021 at 07:16:34PM +0200, Greg KH wrote: > > > On Tue, Jun 22, 2021 at 09:32:08AM -0700, Luis Chamberlain wrote: > > > > On Tue, Jun 22, 2021 at 09:45:39AM +0200, Greg KH wrote: > > > > > On Mon, Jun 21, 2021 at 04:36:34PM -0700, Luis Chamberlain wrote: > > > > > > @@ -2048,13 +2048,19 @@ static ssize_t hot_add_show(struct class *class, > > > > > > { > > > > > > int ret; > > > > > > > > > > > > + if (!try_module_get(THIS_MODULE)) > > > > > > + return -ENODEV; > > > > > > + > > > > > > > > > > You can not increment/decrement your own module's reference count and > > > > > expect it to work properly, as it is still a race. > > > > > > > > The goal here is to prevent an rmmod call if this succeeds. If it > > > > succeeds then any subsequent rmmod will fail. Can you explain how > > > > this is still racy? > > > > > > {sigh} > > > > > > What happens if the driver core is just about to call hot_add_show() and > > > the module is removed from the system. It then calls to the memory > > > location that hot_add_show() was previously at, but now that is not a > > > valid pointer to code, and boom. > > > > The new kobject_get() on patch 3/3 ensures that the device will be up > > throughout the entire life of the store call, and thus prevent the > > code being executed being removed, no? > > I do not know, I no longer remember what is in that patch at the moment > as it is long-gone from my queue. It was the changes *you* recommended, a generic way to ensure the lifetime of the derefernce is valid. I had used bdgrab()/bdget() and you suggested we generalize it with the kobject_get() for the device and a bus get. With that change, I confirm that the device will still be present during the lifetime of the sysfs knobs call. > Also, if the device will be "up" for the whole lifetime, why do you need > to increment the module reference count? The goal is to prevent a deadlock. The lifetime of the device is not an issue in this deadlock case, the issue is a race with module removal and that code path using a lock which is also used on a sysfs knob. Luis