On Mon, Apr 05, 2021 at 12:58:05PM -0700, Minchan Kim wrote:
> On Mon, Apr 05, 2021 at 07:00:23PM +0000, Luis Chamberlain wrote:
> > On Mon, Apr 05, 2021 at 10:07:24AM -0700, Minchan Kim wrote:
> > > On Thu, Apr 01, 2021 at 11:59:25PM +0000, Luis Chamberlain wrote:
> > > > And come to think of it the last patch I had sent with a new
> > > > DECLARE_RWSEM(zram_unload) also has this same issue, making most
> > > > sysfs attributes rather fragile.
> > >
> > > Thanks for looking into it. I agree the single zram_index_rwlock is
> > > not the right approach to fix it. However, I still hope we find a
> > > more generic solution to fix them at once, since I see it as a zram
> > > instance racing problem.
> >
> > They are 3 separate different problems. Related, but different.
>
> What are the 3 different problems? I am asking since I remember only
> two: one for the CPU multistate and the other for sysfs during rmmod.

The third one is the race where sysfs attributes are used and those
routines then dereference the gendisk private_data.

> > If the idea then is to busy out rmmod if a sysfs attribute is being
> > read, that could then mean rmmod can sometimes never complete. Hogging
> > up / busying out sysfs attributes means the module cannot be removed.
>
> It's true, but is it a big problem? There are many cases where the
> system just returns an error if it's busy and relies on the admin.
> IMHO, rmmod should be one of them.

It depends on existing userspace scripts which are used to test, and the
expectations set. Consider existing tests; you would know better, and
since you are the maintainer, you decide. I at least know that for many
other types of device drivers an rmmod is a sledgehammer. You decide. I
just thought it would be good to highlight the effect now rather than us
considering it later.

> > Which is why I think the *try_module_get()* is much more suitable, as
> > it will always fail if we're already going down.
>
> How does try_module_get() solve the problem?
The try stuff only resolves the deadlock. The bdget() / bdput() resolves
the race to access the gendisk private_data.

> > > I see one of the problems is how I could make a new zram object's
> > > attribute group for zram knobs under /sys/block/zram0, since the
> > > block layer already made the zram0 kobject via device_add_disk().
> >
> > Right.. well the sysfs attribute races uncovered here actually do
> > apply to any block driver as well. And which is why I was aiming
> > for something generic if possible.
>
> It would be great, but that's not the one we have atm, so I want to
> proceed to fix it anyway.

What is not the one we have atm? I *do* have a proposed generic solution
for 2 of the 3 issues we have been discussing:

a) deadlock on sysfs access
b) gendisk private_data race

But so far Greg does not see enough justification for a), so we can
either show how widespread this issue is (which I can do using
coccinelle), or we just open code the try_module_get() / module_put() on
each driver that needs it for now. Either way it would resolve the
issue.

As for b), given that I think even you had missed my attempt to
generalize the bdget()/bdput() solution for any attribute type (did you
see my dev_type_get() and dev_type_put() proposed changes?), I don't
think this problem is yet well defined in a generic way for us to rule
it out. It is however easier to simply open code this per driver that
needs it for now, given that I don't think Greg is yet convinced the
deadlock is a widespread issue. I however am pretty sure both races *do*
exist outside of zram in many places.

> > I am not sure if you missed the last hunks of the generic solution,
> > but that would resolve the issue you noted. Here is the same approach
> > but in a non-generic solution, specific to just one attribute so far
> > and to zram:
>
> So the idea is refcounting the block_device's inode

Yes, that itself prevents races against the gendisk private_data from
becoming invalid.
Why? Because a bdget() would not be successful after del_gendisk():

del_gendisk() -->
    invalidate_partition() -->
        __invalidate_device() -->
            invalidate_inodes()

> and the module_exit path
> also checks the inode refcount to make rmmod fail?

The try_module_get() approach resolves the deadlock race, but it does so
in a lazy way. I mean lazy in that rmmod then wins over the sysfs knobs.
So touching sysfs knobs won't make an rmmod fail. I think that's the
more typically expected behaviour. Why? Because I find it odd that
looping forever touching sysfs attributes should prevent a module
removal. But that's a personal preference.

  Luis