On Mon, Apr 05, 2021 at 12:58:05PM -0700, Minchan Kim wrote:
> On Mon, Apr 05, 2021 at 07:00:23PM +0000, Luis Chamberlain wrote:
> > On Mon, Apr 05, 2021 at 10:07:24AM -0700, Minchan Kim wrote:
> > > On Thu, Apr 01, 2021 at 11:59:25PM +0000, Luis Chamberlain wrote:
> > > > And come to think of it the last patch I had sent with a new
> > > > DECLARE_RWSEM(zram_unload) also has this same issue, making most
> > > > sysfs attributes rather fragile.
> > >
> > > Thanks for looking into it. I agree the single zram_index_rwlock is
> > > not the right approach to fix it. However, I still hope we find a
> > > more generic solution to fix them at once, since I see it as a zram
> > > instance racing problem.
> >
> > They are 3 separate different problems. Related, but different.
>
> What are the 3 different problems? I am asking since I remember only
> two: one for the CPU multistate and the other for sysfs during rmmod.

The third one is the race where sysfs attributes are used and those
routines then dereference the gendisk private_data.

> > If the idea then is to busy out rmmod if a sysfs attribute is being
> > read, that could then mean rmmod can sometimes never complete. Hogging
> > up / busying out sysfs attributes means the module cannot be removed.
>
> It's true, but is it a big problem? There are many cases where the
> system just returns an error if it's busy and relies on the admin.
> IMHO, rmmod should be one of them.

It depends on existing userspace scripts which are used to test, and the
expectations set. Consider existing tests; you would know better, and
since you are the maintainer, you decide. I at least know that for many
other types of device drivers an rmmod is a sledgehammer. You decide. I
just thought it would be good to highlight the effect now rather than us
considering it later.

> > Which is why I think the *try_module_get()* is much more suitable, as
> > it will always fail if we're already going down.
>
> How does try_module_get() solve the problem?
The try stuff only resolves the deadlock. The bdget() / bdput() resolves
the race to access the gendisk private_data.

> > > I see one of the problems is how I could make a new zram object's
> > > attribute group for zram knobs under /sys/block/zram0, since the
> > > block layer already made the zram0 kobject via device_add_disk().
> >
> > Right.. well the sysfs attribute races uncovered here actually do
> > apply to any block driver as well. And which is why I was aiming
> > for something generic if possible.
>
> It would be great, but that's not the one we have atm, so I want to
> proceed to fix it anyway.

What is not the one we have atm? I *do* have a proposed generic solution
for 2 of the 3 issues we have been discussing:

a) deadlock on sysfs access
b) gendisk private_data race

But so far Greg does not see enough justification for a), so we can
either show how widespread this issue is (which I can do using
coccinelle), or we just open code the try_module_get() / module_put() on
each driver that needs it for now. Either way it would resolve the
issue.

As for b), given that I think even you had missed my attempt to
generalize the bdget()/bdput() solution for any attribute type (did you
see my dev_type_get() and dev_type_put() proposed changes?), I don't
think this problem is yet well defined in a generic way for us to rule
it out. It is however easier to simply open code this per driver that
needs it for now, given that I don't think Greg is yet convinced the
deadlock is a widespread issue. I however am pretty sure both races *do*
exist outside of zram in many places.

> > I am not sure if you missed the last hunks of the generic solution,
> > but that would resolve the issue you noted. Here is the same approach
> > but in a non-generic solution, specific to just one attribute so far
> > and to zram:
>
> So the idea is refcounting the block_device's inode

Yes, that itself prevents races against the gendisk private_data from
becoming invalid.
Why? Because a bdget() would not be successful after del_gendisk():

del_gendisk() -->
    invalidate_partition() -->
        __invalidate_device() -->
            invalidate_inodes()

> and the module_exit path
> also checks the inode refcount to make rmmod fail?

The try_module_get() approach resolves the deadlock race, but it does so
in a lazy way. I mean lazy in that rmmod then wins over the sysfs knobs.
So touching sysfs knobs won't make an rmmod fail. I think that's the
more typically expected behaviour. Why? Because I find it odd that
looping forever touching sysfs attributes should prevent a module
removal. But that's a personal preference.

  Luis