On Thu, Apr 01, 2021 at 11:59:25PM +0000, Luis Chamberlain wrote:
> On Mon, Mar 22, 2021 at 03:12:01PM -0700, Minchan Kim wrote:
> > On Mon, Mar 22, 2021 at 08:41:56PM +0000, Luis Chamberlain wrote:
> > >
> > > I would not call it *every* sysfs knob as not all deal with things which
> > > are related to CPU hotplug multistate, right? Note that using just
> > > try_module_get() alone (that is, the second patch only) does not fix the
> > > race I am describing above.
> >
> > It wouldn't be a CPU hotplug multistate issue; I'd rather call it a
> > "zram instance race" bug. What happens in this case?
> >
> > CPU 1                               CPU 2
> >
> > destroy_devices
> > ..
> >                                     compact_store()
> >                                     zram = dev_to_zram(dev);
> > idr_for_each(zram_remove_cb)
> >   zram_remove
> >   ..
> >   kfree(zram)
> >                                     down_read(&zram->init_lock);
> >
> >
> > CPU 1                               CPU 2
> >
> > hot_remove_store
> >                                     compact_store()
> >                                     zram = dev_to_zram(dev);
> >   zram_remove
> >     kfree(zram)
> >                                     down_read(&zram->init_lock);
> >
> > So, for me, we need to close the zram instance create/removal race
> > with sysfs rather than focusing on the CPU hotplug issue.
>
> Sure, that's a good point.
>
> The issue which I noted for the race which ends up in a deadlock is only
> possible if a shared lock is used on removal but also on sysfs knobs.
>
> At first glance, the issue you describe above *seems* to be just proper
> care driver developers must take with the structures used. It is
> certainly yet another issue we need to address, and if we can generalize
> a solution, even better. I now recall I *think* I spotted that race a
> while ago and mentioned it to Kees and David Howells, but I didn't have
> a solution for it yet. More on this below.
>
> The issue you point out is real, however you cannot disregard the
> possible CPU hotplug race either; it is a design consideration which the
> CPU hotplug multistate support warns about -- consider driver removal.
> I agree that perhaps solving this "zram instance race" can fix the
> hotplug race as well. If we can solve all 3 issues in one shot, even
> better. But let's evaluate that prospect...
>
> > Maybe we could reuse zram_index_mutex, converting it to a
> > rw_semaphore. What do you think?
>
> Although ideal, given it would knock 3 birds with 1 stone, it ends up
> actually making the sysfs attributes rather useless in light of the
> requirements for each of the races. Namely, the sysfs deadlock race
> *must* use a trylock approach, just as in the try_module_get() case.
> It must use this approach so as to immediately bail out if our module
> is being removed, i.e. on our __exit path. By trying to repurpose
> zram_index_mutex we end up just doing too much with it and making the
> sysfs attributes rather fragile for most uses.
>
> Consider disksize_show(), which would have to become:
>
> static ssize_t disksize_show(struct device *dev, struct device_attribute *attr, char *buf)
> {
> 	struct zram *zram = dev_to_zram(dev);
> +	u64 disksize;
>
> +	down_read(&zram_index_rwlock);
> +	disksize = zram->disksize;
> +	up_read(&zram_index_rwlock);
> -	return scnprintf(buf, PAGE_SIZE, "%llu\n", zram->disksize);
> +	return scnprintf(buf, PAGE_SIZE, "%llu\n", disksize);
> }
>
> What's wrong with this?
>
> It can block during a write, yes, but there is a type of write which
> will make this crash after the read lock is acquired: when the instance
> is removed. What if we try down_read_trylock()?
>
> static ssize_t disksize_show(struct device *dev, struct device_attribute *attr, char *buf)
> {
> 	struct zram *zram = dev_to_zram(dev);
> +	u64 disksize;
>
> +	if (!down_read_trylock(&zram_index_rwlock))
> +		return -ENODEV;
> +
> +	disksize = zram->disksize;
> +	up_read(&zram_index_rwlock);
> -	return scnprintf(buf, PAGE_SIZE, "%llu\n", zram->disksize);
> +	return scnprintf(buf, PAGE_SIZE, "%llu\n", disksize);
> }
>
> What's wrong with this?
>
> If it got the lock, it should be OK, as it is preventing writes from
> taking the lock for a bit. But then this just becomes pretty fragile:
> it will fail whenever another read or write is happening, perhaps
> triggering quite a few regressions in tests.
>
> And if we use write_trylock() we end up with the same fragile nature of
> failing the read with ENODEV for any silly thing going on with the
> driver.
>
> And come to think of it, the last patch I had sent with a new
> DECLARE_RWSEM(zram_unload) also has this same issue, making most sysfs
> attributes rather fragile.

Thanks for taking a look.

I agree the single zram_index_rwlock is not the right approach to fix
it. However, I still hope we can find a more generic solution to fix
them at once, since I see this as a zram instance racing problem.

An approach I am considering is to make struct zram embed a kobject and
then have the zram sysfs attributes auto-populated under that kobject,
so the zram sysfs lifetime is tied to the kobject. With it, the sysfs
race problem I mentioned above should be gone. Furthermore, zram_remove
should fail if any zram object is still alive (i.e.,
zram->kobject->refcount > 1), so module_exit will fail, too.

One problem I see is how I could create the new zram object's attribute
group for the zram knobs under /sys/block/zram0, since the block layer
already created the zram0 kobject via device_add_disk.
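A rough sketch of that kobject idea, for illustration only -- the symbol
names (zram_kobj_release, zram_ktype, the *_sketch helpers) are made up,
and it deliberately ignores the /sys/block/zram0 attachment problem from
the last paragraph:

/* Sketch, not a real patch: embed a kobject in struct zram so the
 * sysfs lifetime follows the kobject refcount. */
struct zram {
	/* ... existing fields, including u64 disksize ... */
	struct kobject kobj;	/* owns the lifetime of the zram attrs */
};

static void zram_kobj_release(struct kobject *kobj)
{
	struct zram *zram = container_of(kobj, struct zram, kobj);

	kfree(zram);		/* last reference dropped, safe to free */
}

static struct kobj_type zram_ktype = {
	.release	= zram_kobj_release,
	.sysfs_ops	= &kobj_sysfs_ops,
};

/* Each attribute handler pins the instance across the access: */
static ssize_t disksize_show_sketch(struct kobject *kobj,
				    struct kobj_attribute *attr, char *buf)
{
	struct zram *zram = container_of(kobj, struct zram, kobj);
	ssize_t ret;

	/* fails once removal has dropped the last reference */
	if (!kobject_get_unless_zero(kobj))
		return -ENODEV;
	ret = scnprintf(buf, PAGE_SIZE, "%llu\n", zram->disksize);
	kobject_put(kobj);
	return ret;
}

/* And removal refuses to tear down an instance still in use: */
static int zram_remove_sketch(struct zram *zram)
{
	if (kref_read(&zram->kobj.kref) > 1)
		return -EBUSY;	/* a reader/writer still holds a ref */

	kobject_put(&zram->kobj);	/* drops the initial reference */
	return 0;
}

With something along these lines, a concurrent attribute access either
pins the instance (so removal sees the elevated refcount and bails with
-EBUSY) or fails to get a reference and returns -ENODEV, instead of
racing with kfree(zram) as in the diagrams above.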