On Fri, Oct 15, 2021 at 07:52:04AM +0800, Ming Lei wrote: > On Thu, Oct 14, 2021 at 01:24:32PM -0700, Luis Chamberlain wrote: > > On Thu, Oct 14, 2021 at 10:11:46AM +0800, Ming Lei wrote: > > > On Thu, Oct 14, 2021 at 09:55:48AM +0800, Ming Lei wrote: > > > > On Mon, Sep 27, 2021 at 09:38:04AM -0700, Luis Chamberlain wrote: > > > > > > ... > > > > > > > > > > > Hello Luis, > > > > > > > > Can you test the following patch and see if the issue can be addressed? > > > > > > > > Please see the idea from the inline comment. > > > > > > > > Also zram_index_mutex isn't needed in zram disk's store() compared with > > > > your patch, then the deadlock issue you are addressing in this series can > > > > be avoided. > > > > > > > > > > > > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c > > > > index fcaf2750f68f..3c17927d23a7 100644 > > > > --- a/drivers/block/zram/zram_drv.c > > > > +++ b/drivers/block/zram/zram_drv.c > > > > @@ -1985,11 +1985,17 @@ static int zram_remove(struct zram *zram) > > > > > > > > /* Make sure all the pending I/O are finished */ > > > > fsync_bdev(bdev); > > > > - zram_reset_device(zram); > > > > > > > > pr_info("Removed device: %s\n", zram->disk->disk_name); > > > > > > > > del_gendisk(zram->disk); > > > > + > > > > + /* > > > > + * reset device after gendisk is removed, so any change from sysfs > > > > + * store won't come in, then we can really reset device here > > > > + */ > > > > + zram_reset_device(zram); > > > > + > > > > blk_cleanup_disk(zram->disk); > > > > kfree(zram); > > > > return 0; > > > > @@ -2073,7 +2079,12 @@ static int zram_remove_cb(int id, void *ptr, void *data) > > > > static void destroy_devices(void) > > > > { > > > > class_unregister(&zram_control_class); > > > > + > > > > + /* hold the global lock so new device can't be added */ > > > > + mutex_lock(&zram_index_mutex); > > > > idr_for_each(&zram_index_idr, &zram_remove_cb, NULL); > > > > + mutex_unlock(&zram_index_mutex); > > > > + > > > > > > Actually zram_index_mutex isn't needed when calling zram_remove_cb() > > > since the zram-control sysfs interface has been removed, so userspace > > > can't add new device any more, then the issue is supposed to be fixed > > > by the following one line change, please test it: > > > > > > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c > > > index fcaf2750f68f..96dd641de233 100644 > > > --- a/drivers/block/zram/zram_drv.c > > > +++ b/drivers/block/zram/zram_drv.c > > > @@ -1985,11 +1985,17 @@ static int zram_remove(struct zram *zram) > > > > > > /* Make sure all the pending I/O are finished */ > > > fsync_bdev(bdev); > > > - zram_reset_device(zram); > > > > > > pr_info("Removed device: %s\n", zram->disk->disk_name); > > > > > > del_gendisk(zram->disk); > > > + > > > + /* > > > + * reset device after gendisk is removed, so any change from sysfs > > > + * store won't come in, then we can really reset device here > > > + */ > > > + zram_reset_device(zram); > > > + > > > blk_cleanup_disk(zram->disk); > > > kfree(zram); > > > return 0; > > > > Sorry but nope, the cpu multistate issue is still present and we end up > > eventually with page faults. I tried with both patches. > > In theory disksize_store() can't come in after del_gendisk() returns, > then zram_reset_device() should cleanup everything, that is the issue > you described in commit log. > > We need to understand the exact reason why there is still cpuhp node > left, can you share us the exact steps for reproducing the issue? > Otherwise we may have to trace and narrow down the reason. See my commit log for my own fix for this issue. Luis