Re: [PATCH v3 2/3] zram: fix deadlock with sysfs attribute usage and driver removal

Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> · Tue, 22 Jun 2021 18:51:13 +0200

On Tue, Jun 22, 2021 at 09:40:27AM -0700, Luis Chamberlain wrote:
> On Tue, Jun 22, 2021 at 06:27:52PM +0200, Greg KH wrote:
> > On Tue, Jun 22, 2021 at 08:27:13AM -0700, Luis Chamberlain wrote:
> > > On Tue, Jun 22, 2021 at 09:41:23AM +0200, Greg KH wrote:
> > > > On Mon, Jun 21, 2021 at 04:36:34PM -0700, Luis Chamberlain wrote:
> > > > > +	ssize_t __ret; \
> > > > > +	if (!try_module_get(THIS_MODULE)) \
> > > > 
> > > > try_module_get(THIS_MODULE) is always racy and probably does not do what
> > > > you want it to do.  You always want to get/put module references from
> > > > code that is NOT the code calling these functions.
> > > 
> > > In this case, we want it to trump module removal if it succeeds. That's all.
> > 
> > True, but either you stop the race, or you do not, right?  If you are so
> > invested in your load/unload test, this should show up with this code
> > eventually as well.
> 
> I still do not see how the race is possible given the goal to prevent
> module removal if a sysfs file is being used. If rmmod is taking
> place, this simply will bail out.
> 
> > > > > +		return -ENODEV; \
> > > > > +	__ret = _name ## _store(dev, attr, buf, len); \
> > > > > +	module_put(THIS_MODULE); \
> > > > 
> > > > This too is going to be racy.
> > > > 
> > > > While fun to poke at, I still think this is pointless.
> > > 
> > > If you have a better idea, which does not "DOS" module removal, please
> > > let me know!
> > 
> > I have yet to understand why you think that the load/unload in a loop is
> > a valid use case.
> 
> That is dependent upon the infrastructure tests built for a driver.
> 
> In the case of fstests and blktests we have drivers which *always* get
> removed and loaded on each test. Take for instance scsi_debug, which
> creates / destroys virtual devices on a per-test basis. Likewise, to build
> confidence that failure rate is as close as possible to 0, one must run
> a test as many times as possible in a loop. And, to build confidence in
> a test, in some situations one ends up running modprobe / rmmod in a
> loop.
> 
> In this case a customer does have a complex system of tests, and by looking
> at the crash logs I managed to simplify the way to reproduce it using
> simple shell scripts.

And is _this_ change needed even with the changes in patch 1/3?

I think that commit fixes your issues given that you will not unload the
module until after the sysfs devices are removed from the system.  Have
you tried that alone with your test?
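
For reference, a minimal userspace sketch of the get / call / put pattern
in the quoted macro. It is not kernel code: `struct model_module` and all
`model_*` names are hypothetical stand-ins, and the model is
single-threaded. It only illustrates the intended semantics (the store
bails out with -ENODEV once removal has begun, and removal cannot begin
while a reference is held); it does not settle whether the real call is
racy.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical stand-in for a module's removal state and refcount. */
struct model_module {
	bool going;	/* set once removal has started */
	int refcnt;	/* references currently held */
};

/* Models try_module_get(): fails once removal is under way. */
bool model_try_get(struct model_module *m)
{
	if (m->going)
		return false;
	m->refcnt++;
	return true;
}

/* Models module_put(): drops one reference. */
void model_put(struct model_module *m)
{
	assert(m->refcnt > 0);
	m->refcnt--;
}

/* Models the start of rmmod: refused while any reference is held. */
bool model_begin_remove(struct model_module *m)
{
	if (m->refcnt > 0)
		return false;
	m->going = true;
	return true;
}

/* Models the wrapped store: get a reference, do the work, put it back. */
ssize_t model_store(struct model_module *m, const char *buf, size_t len)
{
	ssize_t ret;

	(void)buf;	/* the model ignores the written data */
	if (!model_try_get(m))
		return -ENODEV;
	ret = (ssize_t)len;	/* pretend the inner store consumed it all */
	model_put(m);
	return ret;
}
```

In this model, a store issued after model_begin_remove() has succeeded
returns -ENODEV, and model_begin_remove() fails for as long as a
reference taken by model_try_get() is outstanding.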

thanks,

greg k-h