On Tue, Apr 06, 2021 at 12:29:09AM +0000, Luis Chamberlain wrote:
> On Mon, Apr 05, 2021 at 12:58:05PM -0700, Minchan Kim wrote:
> > On Mon, Apr 05, 2021 at 07:00:23PM +0000, Luis Chamberlain wrote:
> > > On Mon, Apr 05, 2021 at 10:07:24AM -0700, Minchan Kim wrote:
> > > > On Thu, Apr 01, 2021 at 11:59:25PM +0000, Luis Chamberlain wrote:
> > > > > And come to think of it the last patch I had sent with a new
> > > > > DECLARE_RWSEM(zram_unload) also has this same issue making most
> > > > > sysfs attributes rather fragile.
> > > >
> > > > Thanks for looking into this. I agree the single zram_index_rwlock is
> > > > not the right approach to fix it. However, I still hope we find a more
> > > > generic solution to fix them at once since I see it as a zram instance
> > > > racing problem.
> > >
> > > They are 3 separate problems. Related, but different.
> >
> > What are the 3 different problems? I am asking since I remember only two:
> > one for CPU multistate and the other one for sysfs during rmmod.
>
> The third one is the race to use sysfs attributes while those routines
> then dereference the gendisk private_data.

First of all, thanks for keeping the discussion going, Luis.

That third one is what I had thought of as the race between sysfs and rmmod.

> > > If the idea then is to busy out rmmod if a sysfs attribute is being
> > > read, that could then mean rmmod can sometimes never complete. Hogging
> > > up / busying out sysfs attributes means the module cannot be removed.
> >
> > It's true, but is it a big problem? There are many cases where the system
> > just returns an error if it's busy and relies on the admin. IMHO, rmmod
> > should be one of them.
>
> It depends on existing userspace scripts which are used to test and the
> expectations set. Consider existing tests; you would know better, and
> since you are the maintainer you decide.
>
> I at least know that for many other types of device drivers an rmmod is
> a sledge hammer.
>
> You decide. I just thought it would be good to highlight the effect now
> rather than us considering it later.

To me, an rmmod failure is not a big problem for zram, since returning
-EBUSY in such cases is common in the system. (Having said that, I agree
it would be best if we could avoid the fail-and-retry; IOW, -EBUSY should
be a last resort unless we have a nicer way.)

> > > Which is why the *try_module_get()* I think is much more suitable, as
> > > it will always fail if we're already going down.
> >
> > How does the try_module_get solve the problem?
>
> The try stuff only resolves the deadlock. The bdget() / bdput() resolves
> the race to access the gendisk private_data.

That's the part I had missed in this discussion. Now I am reading your
[2/2] from the original patch. I had thought the problem was just that the
zram instance gets destroyed by the sysfs race, so I did not see where the
deadlock comes in. I might be missing the point here, too.

Hmm, we are discussing several problems all at once. I feel it's time to
jump to a v2 done your way at this point. You said there are three
different problems. As I asked, please write them down in more detail,
with the code sequences, as we did in the other thread. If you mean a
deadlock, please spell out which specific locks are involved in it. That
would make the discussion much easier. Let's discuss the issues one by
one, each in its own thread.
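Just to confirm that I am reading [2/2] the right way: I picture the
bdget()-based guard in a zram attribute roughly like the sketch below.
This is only my own reconstruction to check the idea (foo_show is a
placeholder name), not your actual hunk.

static ssize_t foo_show(struct device *dev,
		struct device_attribute *attr, char *buf)
{
	struct zram *zram = dev_to_zram(dev);
	struct block_device *bdev;
	ssize_t ret;

	/* Fails once del_gendisk() has invalidated the inodes. */
	bdev = bdget_disk(zram->disk, 0);
	if (!bdev)
		return -ENODEV;

	/* The gendisk private_data stays valid while we hold the ref. */
	ret = scnprintf(buf, PAGE_SIZE, "%llu\n", (u64)zram->disksize);

	bdput(bdev);
	return ret;
}

If that matches what [2/2] does, I think I finally see how the two
problems are separate.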
> > > > I see one of the problems is how I could make a new zram object's
> > > > attribute group for zram knobs under /sys/block/zram0 since the block
> > > > layer already made the zram0 kobject via device_add_disk.
> > >
> > > Right.. well the sysfs attribute races uncovered here actually do
> > > apply to any block driver as well. And which is why I was aiming
> > > for something generic if possible.
> >
> > It would be great, but that's not the one we have atm, so I want to
> > proceed with a fix anyway.
>
> What is not the one we have atm? I *do* have a proposed generic solution
> for 2/3 of the issues we have been discussing:
>
> a) deadlock on sysfs access

This is the one I didn't understand.

> b) gendisk private_data race

Yub.

>
> But so far Greg does not see enough justification for a), so we can either
> show how widespread this issue is (which I can do using coccinelle), or we
> just open code the try_module_get() / put on each driver that needs it
> for now. Either way it would resolve the issue.

If it's a general problem for drivers, I agree it's worth addressing in
the core, unless the driver itself introduced the mess. I can't say much
here since I haven't understood the problem yet.

>
> As for b), given that I think even you had missed my attempt to
> generalize the bdget/bdput solution for any attribute type (did you see
> my dev_type_get() and dev_type_put() proposed changes?), I don't think
> this problem is yet well defined in a generic way for us to rule it out.
> It is however easier to simply open code this per driver that needs it
> for now given that I don't think Greg is yet convinced the deadlock is
> a widespread issue. I however am pretty sure both races *do*
> exist outside of zram in many places.

If they do, that would be a good argument for proposing the general
solution.

>
> > > I am not sure if you missed the last hunks of the generic solution,
> > > but that would resolve the issue you noted. Here is the same approach
> > > but in a non-generic solution, specific to just one attribute so far
> > > and to zram:
> >
> > So the idea is to refcount the block_device's inode
>
> Yes, that itself prevents races against the gendisk private_data
> becoming invalid. Why? Because a bdget() would not be successful after
> del_gendisk():
>
> del_gendisk() --> invalidate_partition() --> __invalidate_device() --> invalidate_inodes()
>
> > and the module_exit path
> > also checks the inode refcount to make rmmod fail?
>
> The try_module_get() approach resolves the deadlock race, but it does
> so in a lazy way. I mean lazy in that rmmod then wins over sysfs knobs.
> So touching sysfs knobs won't make an rmmod fail. I think that's the more
> typically expected behaviour. Why? Because I find it odd that looping
> forever touching sysfs attributes should prevent a module removal. But
> that's a personal preference.

I agree with you that this would be better, but let's see how cleanly it
can be done. FYI, please also look at hot_remove_store, which can remove
a zram instance on demand as well.

I am looking forward to seeing your v2.

Thanks for your patience, Luis.
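P.S. So that we don't talk past each other in v2: the "lazy"
try_module_get() pattern I understood from your description is roughly
the below. The wrapper and __foo_store() are placeholders of mine, not
your code.

static ssize_t foo_store(struct device *dev,
		struct device_attribute *attr, const char *buf, size_t len)
{
	ssize_t ret;

	/*
	 * If the module is already going away, bail out instead of
	 * blocking rmmod: removal wins, the sysfs access loses.
	 */
	if (!try_module_get(THIS_MODULE))
		return -ENODEV;

	ret = __foo_store(dev, attr, buf, len);

	module_put(THIS_MODULE);
	return ret;
}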