On Wed, Mar 10, 2021 at 01:11:15PM +0000, Luis Chamberlain wrote:
> On Mon, Mar 08, 2021 at 06:55:30PM -0800, Minchan Kim wrote:
> > On Sat, Mar 06, 2021 at 02:20:34AM +0000, Luis Chamberlain wrote:
> > > The zram driver makes use of cpu hotplug multistate support,
> > > whereby it associates a zram compression stream per CPU. To
> > > support CPU hotplug multistate a callback is enabled to allow
> > > the driver to do what it needs when a CPU hotplugs.
> > >
> > > It is however currently possible to end up removing the
> > > zram driver callback prior to removing the zram compression
> > > streams per CPU. This would leave these compression streams
> > > hanging.
> > >
> > > We need to fix ordering for driver load / removal, zram
> > > device additions, in light of the driver's use of cpu
> > > hotplug multistate. Since the zram driver exposes many
> > > sysfs attributes which can also muck with the compression
> > > streams this also means we can hit page faults today easily.
> >
> > Hi Luis,
> >
> > First of all, thanks for reporting the ancient bugs.
> >
> > Looks like you found several bugs and I am trying to digest them
> > from your description to understand more clearly. I need to ask
> > stupid questions, first.
> >
> > If I understand correctly, the bugs you found were related to a module
> > unloading race while zram is still working.
> > If so, couldn't we fix the bug like this (it's not a perfect
> > patch but I just wanted to show how to close the module unloading race)?
> > (I might miss other pieces here. Let me know. Thanks!)
> >
> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> > index a711a2e2a794..646ae9e0b710 100644
> > --- a/drivers/block/zram/zram_drv.c
> > +++ b/drivers/block/zram/zram_drv.c
> > @@ -1696,6 +1696,8 @@ static void zram_reset_device(struct zram *zram)
> >  		return;
> >  	}
> > 
> > +	module_put(THIS_MODULE);
> > +
> >  	comp = zram->comp;
> >  	disksize = zram->disksize;
> >  	zram->disksize = 0;
> > @@ -1744,13 +1746,19 @@ static ssize_t disksize_store(struct device *dev,
> >  		goto out_free_meta;
> >  	}
> > 
> > +	if (!try_module_get(THIS_MODULE))
> > +		goto out_free_zcomp;
> > +
> >  	zram->comp = comp;
> >  	zram->disksize = disksize;
> > +
> >  	set_capacity_and_notify(zram->disk, zram->disksize >> SECTOR_SHIFT);
> >  	up_write(&zram->init_lock);
> > 
> >  	return len;
> > 
> > +out_free_zcomp:
> > +	zcomp_destroy(comp);
> >  out_free_meta:
> >  	zram_meta_free(zram, disksize);
> >  out_unlock:
>
> This still allows for a crash on the driver by running the zram02.sh script twice.
>
> Mar 09 14:52:19 kdevops-blktests-block-dev kernel: zram0: detected capacity change from 209715200 to 0
> Mar 09 14:52:19 kdevops-blktests-block-dev kernel: BUG: unable to handle page fault for address: ffffba7db7904008
> Mar 09 14:52:19 kdevops-blktests-block-dev kernel: #PF: supervisor read access in kernel mode
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: #PF: error_code(0x0000) - not-present page
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: PGD 100000067 P4D 100000067 PUD 100311067 PMD 14cd2f067 PTE 0
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: Oops: 0000 [#1] SMP NOPTI
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: CPU: 0 PID: 2137 Comm: zram02.sh Tainted: G E 5.12.0-rc1-next-20210304+ #4
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: RIP: 0010:zram_free_page+0x1b/0xf0 [zram]
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: Code: 1f 44 00 00 48 89 c8 c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 41 54 49 89 f4 55 89 f5 53 48 8b 17 48 c1 e5 04 48 89 f>
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: RSP: 0018:ffffba7d808f3d88 EFLAGS: 00010286
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: RAX: 0000000000000000 RBX: ffff9eee5317d200 RCX: 0000000000000000
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: RDX: ffffba7db7904000 RSI: 0000000000000000 RDI: ffff9eee5317d200
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: RBP: 0000000000000000 R08: 00000008f78bb1d3 R09: 0000000000000000
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: R10: 0000000000000008 R11: 0000000000000000 R12: 0000000000000000
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: R13: ffff9eee53d4cb00 R14: ffff9eee5317d220 R15: ffff9eee70508b80
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: FS: 00007f4bb1482580(0000) GS:ffff9eef77c00000(0000) knlGS:0000000000000000
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: CR2: ffffba7db7904008 CR3: 0000000107e9c000 CR4: 0000000000350ef0
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: Call Trace:
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: zram_reset_device+0xe9/0x150 [zram]
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: reset_store+0x9a/0x100 [zram]
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: kernfs_fop_write_iter+0x124/0x1b0
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: new_sync_write+0x11c/0x1b0
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: vfs_write+0x1c2/0x260
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: ksys_write+0x5f/0xe0
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: do_syscall_64+0x33/0x80
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: RIP: 0033:0x7f4bb13aaf33
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: Code: 8b 15 61 ef 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 0>
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: RSP: 002b:00007ffce0090d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: RAX: ffffffffffffffda RBX: 000055a17c4a14b0 RCX: 00007f4bb13aaf33
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: RDX: 0000000000000002 RSI: 000055a17c4a14b0 RDI: 0000000000000001
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: RBP: 0000000000000002 R08: 000055a17c48a1d0 R09: 0000000000000000
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: R10: 000055a17c48a1d1 R11: 0000000000000246 R12: 0000000000000001
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: R13: 0000000000000002 R14: 7fffffffffffffff R15: 0000000000000000
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: Modules linked in: zram(E) zsmalloc(E) xfs(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) joydev(E) evdev(>
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: zram0: detected capacity change from 0 to 209715200
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: CR2: ffffba7db7904008
> Mar 09 14:52:20 kdevops-blktests-block-dev kernel: ---[ end trace 534ee1d0b880dd90 ]---
>
> I can try to modify it to include the second patch first, as that is
> required. There are two separate bugs here. Or was your goal to
> try to address both with only one patch?

I am trying to understand the problem first.
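
To make the intent of the draft diff quoted above concrete, here is a minimal userspace sketch of the reference pairing it proposes: disksize_store takes a module reference, zram_reset_device drops it, and unload is refused in between. This is only a toy model under stated assumptions, not kernel code: every toy_* name is hypothetical, the "module" is just a counter plus a flag, and there is no locking. It models the unload race only and says nothing about the zram_free_page fault in the trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a module: a refcount plus an "unloading" flag. */
struct toy_module {
	int refcnt;
	bool going;	/* set once unload has started */
};

/* Models the idea of try_module_get(): fails once unload has begun. */
static bool toy_try_module_get(struct toy_module *m)
{
	if (m->going)
		return false;
	m->refcnt++;
	return true;
}

/* Models the idea of module_put(): drops one reference. */
static void toy_module_put(struct toy_module *m)
{
	assert(m->refcnt > 0);
	m->refcnt--;
}

/* Models rmmod: refused while any reference is held. */
static bool toy_module_unload(struct toy_module *m)
{
	if (m->refcnt > 0)
		return false;
	m->going = true;
	return true;
}

/* Hypothetical stand-in for a zram device; disksize == 0 means uninitialized. */
struct toy_zram {
	struct toy_module *owner;
	size_t disksize;
};

/* Models disksize_store: initializing the device pins the module. */
static int toy_disksize_store(struct toy_zram *z, size_t disksize)
{
	if (z->disksize != 0 || disksize == 0)
		return -1;	/* already initialized, or invalid size */
	if (!toy_try_module_get(z->owner))
		return -1;	/* module is going away, refuse to initialize */
	z->disksize = disksize;
	return 0;
}

/* Models zram_reset_device: resetting an initialized device unpins the module. */
static void toy_reset_device(struct toy_zram *z)
{
	if (z->disksize == 0)
		return;		/* nothing was initialized, no reference to drop */
	toy_module_put(z->owner);
	z->disksize = 0;
}
```

In this model, an unload attempt between toy_disksize_store() and toy_reset_device() is refused, and a second reset on an already-reset device does not drop an extra reference.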