This refactoring aims to address the ro_after_init failure path issue discussed in the thread [1] by Jinjie Ruan and Thomas Gleixner. [1] https://lore.kernel.org/all/20230915082126.4187913-1-ruanjinjie@xxxxxxxxxx/ The final patch incorporates Luis's suggestion to handle module unload by stopping the module and calling its exit function before releasing all allocated memory in the event of an ro_after_init failure. However, it ignores the errors. Please, see more below for discussion. Jinjie, can you test this patch and see if solves the issue? I've tested this with a system without any module other than the tested ones: loop and brd. For the block loop device driver I can successfully unload the module when I force a ro_after_init failure. However, I cannot do the same if I test this with the block ram disk (brd) driver. Please, see below the logs before/after the changes. Jens, could this be something in the brd driver? Additionaly, could we prevent any module from loading if it depends on another module that currently uses ro_after_init and is not fully loaded? This would help guarantee the success of list_empty() and try_stop_module() if ro_after_init fails. What do you think? The last patch currently ignores error from these 2 calls. * loop driver ro_after_init failure before this patch series: Nov 06 11:36:25 debian kernel: loop: module loaded Nov 06 11:36:25 debian kernel: BUG: unable to handle page fault for address: ffffffffa0006320 Nov 06 11:36:25 debian kernel: #PF: supervisor read access in kernel mode Nov 06 11:36:25 debian kernel: #PF: error_code(0x0000) - not-present page Nov 06 11:36:25 debian kernel: PGD 1e3f067 P4D 1e3f067 PUD 1e40063 PMD 10e7d4067 PTE 0 Nov 06 11:36:25 debian kernel: Oops: Oops: 0000 [#1] Nov 06 11:36:25 debian kernel: CPU: 0 UID: 0 PID: 428 Comm: (udev-worker) Not tainted 6.12.0-rc6-g4ade030a2d1b #155 Nov 06 11:36:25 debian kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Nov 06 11:36:25 debian kernel: RIP: 0010:bdev_open+0x83/0x290 Nov 06 11:36:25 debian kernel: Code: bb 48 01 00 00 48 89 3c 24 e8 79 24 38 00 48 8b 43 40 41 bd fa ff ff ff 48 83 b8 40 03 00 00 00 0f 84 b3 01 00 00 48 8b 43 48 <48> 8b 78 78 e8 d4 c9 c8 ff 84 c0 0f 84 9e 01 00 00 80 3d 45 ad ad Nov 06 11:36:25 debian kernel: RSP: 0018:ffff8881054dbc58 EFLAGS: 00010286 Nov 06 11:36:25 debian kernel: RAX: ffffffffa00062a8 RBX: ffff8881054a6800 RCX: ffff8881075becc0 Nov 06 11:36:25 debian kernel: RDX: 0000000000000000 RSI: 0000000000000009 RDI: ffff8881054a6948 Nov 06 11:36:25 debian kernel: RBP: 0000000000000009 R08: ffff88810e7aa9c0 R09: 0000000000000000 Nov 06 11:36:25 debian kernel: R10: ffff88810e5ab0c0 R11: ffff88810e796190 R12: ffff88810094e980 Nov 06 11:36:25 debian kernel: R13: 00000000fffffffa R14: 0000000000000000 R15: 0000000000000000 Nov 06 11:36:25 debian kernel: FS: 00007fd2ff110900(0000) GS:ffffffff81e47000(0000) knlGS:0000000000000000 Nov 06 11:36:25 debian kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 06 11:36:25 debian kernel: CR2: ffffffffa0006320 CR3: 000000010e7ed004 CR4: 00000000003706b0 Nov 06 11:36:25 debian kernel: Call Trace: Nov 06 11:36:25 debian kernel: <TASK> Nov 06 11:36:25 debian kernel: ? __die_body+0x16/0x60 Nov 06 11:36:25 debian kernel: ? page_fault_oops+0x22a/0x310 Nov 06 11:36:25 debian kernel: ? exc_page_fault+0x99/0xa0 Nov 06 11:36:25 debian kernel: ? asm_exc_page_fault+0x22/0x30 Nov 06 11:36:25 debian kernel: ? bdev_open+0x83/0x290 Nov 06 11:36:25 debian kernel: ? bdev_open+0x67/0x290 Nov 06 11:36:25 debian kernel: ? iput+0x37/0x150 Nov 06 11:36:25 debian kernel: ? blkdev_open+0xab/0xd0 Nov 06 11:36:25 debian kernel: ? blkdev_mmap+0x60/0x60 Nov 06 11:36:25 debian kernel: ? do_dentry_open+0x25d/0x3b0 Nov 06 11:36:25 debian kernel: ? vfs_open+0x1e/0xc0 Nov 06 11:36:25 debian kernel: ? path_openat+0x9cf/0xbb0 Nov 06 11:36:25 debian kernel: ? do_filp_open+0x7f/0xd0 Nov 06 11:36:25 debian kernel: ? do_sys_openat2+0x67/0xb0 Nov 06 11:36:25 debian kernel: ? do_sys_open+0x4b/0x50 Nov 06 11:36:25 debian kernel: ? do_syscall_64+0x3d/0xb0 Nov 06 11:36:25 debian kernel: ? entry_SYSCALL_64_after_hwframe+0x4b/0x53 Nov 06 11:36:25 debian kernel: </TASK> Nov 06 11:36:25 debian kernel: Modules linked in: Nov 06 11:36:25 debian kernel: CR2: ffffffffa0006320 Nov 06 11:36:25 debian kernel: ---[ end trace 0000000000000000 ]--- ./scripts/faddr2line --list vmlinux bdev_open+0x83/0x290 bdev_open+0x83/0x290: bdev_open at block/bdev.c:908 903 904 mutex_lock(&disk->open_mutex); 905 ret = -ENXIO; 906 if (!disk_live(disk)) 907 goto abort_claiming; >908< if (!try_module_get(disk->fops->owner)) 909 goto abort_claiming; 910 ret = -EBUSY; 911 if (!bdev_may_open(bdev, mode)) 912 goto put_module; 913 if (bdev_is_partition(bdev)) * loop module drive load logs after the changes, including some debug info: diff --git a/kernel/module/main.c b/kernel/module/main.c index 9e549559af9e..7deffd73f04b 100644 --- a/kernel/module/main.c +++ b/kernel/module/main.c @@ -2994,8 +2994,10 @@ static noinline int do_init_module(struct module *mod) return 0; fail_ro_after_init: - list_empty(&mod->source_list); - try_stop_module(mod, 0, &forced); + ret = list_empty(&mod->source_list); + pr_info("%s: list_empty: ret: %d\n", mod->name, ret); + ret = try_stop_module(mod, 0, &forced); + pr_info("%s: try_stop_module: ret: %d\n", mod->name, ret); # lsmod Module Size Used by # modprobe loop modprobe: ERROR: could not insert 'loop': Resource temporarily unavailable # lsmod Module Size Used by Kernel logs: [ 1376.901220] loop: module loaded [ 1376.901398] loop: list_empty: ret: 1 [ 1376.901618] loop: try_stop_module: ret: -11 Note: This is -EAGAIN error. * brd driver ro_after_init failure before/after this patch series: Nov 06 11:40:57 debian kernel: brd: module loaded Nov 06 11:40:57 debian kernel: BUG: unable to handle page fault for address: ffffffffa0004030 Nov 06 11:40:57 debian kernel: #PF: supervisor read access in kernel mode Nov 06 11:40:57 debian kernel: #PF: error_code(0x0000) - not-present page Nov 06 11:40:57 debian kernel: PGD 1e3f067 P4D 1e3f067 PUD 1e40063 PMD 10ee72067 PTE 0 Nov 06 11:40:57 debian kernel: Oops: Oops: 0000 [#1] Nov 06 11:40:57 debian kernel: CPU: 0 UID: 0 PID: 418 Comm: modprobe Not tainted 6.12.0-rc6-g4ade030a2d1b #155 Nov 06 11:40:57 debian kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Nov 06 11:40:57 debian kernel: RIP: 0010:blkdev_put_whole+0x19/0x30 Nov 06 11:40:57 debian kernel: Code: 70 ff ff ff 0f 0b e9 73 ff ff ff 66 0f 1f 44 00 00 53 48 89 fb ff 4f 40 75 08 48 89 df e8 2f 04 00 00 48 8b 7b 10 48 8b 47 48 <48> 8b 40 18 48 85 c0 74 03 5b ff e0 5b c3 66 0f 1f 84 00 00 00 00 Nov 06 11:40:57 debian kernel: RSP: 0018:ffff8881055afe80 EFLAGS: 00010246 Nov 06 11:40:57 debian kernel: RAX: ffffffffa0004018 RBX: ffff88810ee2d300 RCX: 0000000000000000 Nov 06 11:40:57 debian kernel: RDX: 0000000000000000 RSI: 00000003fffffffc RDI: ffff88810edc6c00 Nov 06 11:40:57 debian kernel: RBP: 00000000284a0005 R08: 0000000000000402 R09: 0000000000000d53 Nov 06 11:40:57 debian kernel: R10: 000000000000000f R11: 000000000000003f R12: ffff888103482780 Nov 06 11:40:57 debian kernel: R13: 0000000000000000 R14: ffff88810edc6c00 R15: ffff88810edc6d48 Nov 06 11:40:57 debian kernel: FS: 00007f62341f7640(0000) GS:ffffffff81e47000(0000) knlGS:0000000000000000 Nov 06 11:40:57 debian kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 06 11:40:57 debian kernel: CR2: ffffffffa0004030 CR3: 00000001066e2004 CR4: 00000000003706b0 Nov 06 11:40:57 debian kernel: Call Trace: Nov 06 11:40:57 debian kernel: <TASK> Nov 06 11:40:57 debian kernel: ? __die_body+0x16/0x60 Nov 06 11:40:57 debian kernel: ? page_fault_oops+0x22a/0x310 Nov 06 11:40:57 debian kernel: ? exc_page_fault+0x99/0xa0 Nov 06 11:40:57 debian kernel: ? asm_exc_page_fault+0x22/0x30 Nov 06 11:40:57 debian kernel: ? blkdev_put_whole+0x19/0x30 Nov 06 11:40:57 debian kernel: ? bdev_release+0xb9/0xf0 Nov 06 11:40:57 debian kernel: ? blkdev_release+0x8/0x10 Nov 06 11:40:57 debian kernel: ? __fput+0x77/0x280 Nov 06 11:40:57 debian kernel: ? task_work_run+0x58/0x70 Nov 06 11:40:57 debian kernel: ? syscall_exit_to_user_mode+0x66/0xa0 Nov 06 11:40:57 debian kernel: ? do_syscall_64+0x49/0xb0 Nov 06 11:40:57 debian kernel: ? entry_SYSCALL_64_after_hwframe+0x4b/0x53 Nov 06 11:40:57 debian kernel: </TASK> Nov 06 11:40:57 debian kernel: Modules linked in: Nov 06 11:40:57 debian kernel: CR2: ffffffffa0004030 Nov 06 11:40:57 debian kernel: ---[ end trace 0000000000000000 ]--- ./scripts/faddr2line --list vmlinux blkdev_put_whole+0x19/0x30 blkdev_put_whole+0x19/0x30: blkdev_put_whole at block/bdev.c:679 674 675 static void blkdev_put_whole(struct block_device *bdev) 676 { 677 if (atomic_dec_and_test(&bdev->bd_openers)) 678 blkdev_flush_mapping(bdev); >679< if (bdev->bd_disk->fops->release) 680 bdev->bd_disk->fops->release(bdev->bd_disk); 681 } 682 683 static int blkdev_get_whole(struct block_device *bdev, blk_mode_t mode) 684 { To: Luis Chamberlain <mcgrof@xxxxxxxxxx> To: Petr Pavlu <petr.pavlu@xxxxxxxx> To: Sami Tolvanen <samitolvanen@xxxxxxxxxx> To: Christophe Leroy <christophe.leroy@xxxxxxxxxx> To: Thomas Gleixner <tglx@xxxxxxxxxxxxx> To: Jinjie Ruan <ruanjinjie@xxxxxxxxxx> To: Jens Axboe <axboe@xxxxxxxxx> Cc: Daniel Gomez (Samsung) <d+samsung@xxxxxxxxxx> Cc: linux-modules@xxxxxxxxxxxxxxx Cc: linux-kernel@xxxxxxxxxxxxxxx Signed-off-by: Daniel Gomez <da.gomez@xxxxxxxxxxx> --- Daniel Gomez (4): module: remove module_deallocate() unused param module: avoid double ref drop on ro_after_init failure module: update state and release reference under lock module: refactor ro_after_init failure path kernel/module/main.c | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) --- base-commit: c39901c21cda246f967a4d1dc79ae349488fc4de change-id: 20241108-modules-ro_after_init-0f16f0b4daa2 Best regards, -- Daniel Gomez <da.gomez@xxxxxxxxxxx>