On 8/30/24 20:18, Christoph Biedl wrote:
matoro wrote...
Hi all, just bumped to the newest mainline starting with 6.10.2 and
immediately ran into a crash on boot. Fully reproducible, reverting back to
last known good (6.9.8) resolves the issue. Any clue what's going on here?
I can provide full boot logs, start bisecting, etc if needed...
Is this supposed to have been fixed in the meantime? Using 6.10.7 from yesterday,
I'm getting a similar crash:
(...)
[ 9.653898] scsi 1:0:5:0: Power-on or device reset occurred
[ 12.337213] sd 1:0:5:0: Attached scsi generic sg0 type 0
[ 12.343544] sd 1:0:5:0: [sda] 17773524 512-byte logical blocks: (9.10 GB/8.47 GiB)
[ 12.352957] sd 1:0:5:0: [sda] Write Protect is off
[ 12.359151] sd 1:0:5:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 12.379035] sda: sda1 sda2 sda3
[ 12.383562] sd 1:0:5:0: [sda] Attached SCSI disk
[ 12.397737] Freeing unused kernel image (initmem) memory: 3072K
[ 12.406839] Backtrace:
[ 12.409235] [<1116535c>] kernel_init+0x80/0x1d4
[ 12.413911] [<1040201c>] ret_from_kernel_thread+0x1c/0x24
[ 12.419448]
[ 12.420970]
[ 12.422487] Kernel Fault: Code=26 (Data memory access rights trap) at addr 113c5f90
[ 12.430172] CPU: 0 PID: 1 Comm: swapper Not tainted 6.10.7 #1
[ 12.435958] Hardware name: 9000/785/C3600
[ 12.439997]
[ 12.441518] YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[ 12.446256] PSW: 00000000000001000000000000001111 Not tainted
[ 12.452033] r00-03 0004000f 113c9744 105994ac 128942c0
[ 12.457295] r04-07 119bda70 1180d4e0 1180d4e0 11822a90
[ 12.462555] r08-11 11822a70 112d1000 1180d7ec 00000017
[ 12.467817] r12-15 00000000 11a1aa70 113b196c 112d1000
[ 12.473077] r16-19 112d1000 ffffffff f0000174 113c723c
[ 12.478338] r20-23 00000002 113c9744 113c5a70 000000d0
[ 12.483597] r24-27 12892d00 00000000 119bde74 113c5a70
[ 12.488859] r28-31 113c5f8c 01a19700 12894300 00000004
[ 12.494158] sr00-03 00000000 00000000 00000000 00000000
[ 12.499502] sr04-07 00000000 00000000 00000000 00000000
[ 12.504850]
[ 12.506373] IASQ: 00000000 00000000 IAOQ: 10599508 1059950c
[ 12.511980] IIR: 0f941288 ISR: 00000000 IOR: 113c5f90
[ 12.517495] CPU: 0 CR30: 12892d00 CR31: 11111111
[ 12.523016] ORIG_R28: 55555555
[ 12.526185] IAOQ[0]: jump_label_init_ro+0x98/0xe4
[ 12.531014] IAOQ[1]: jump_label_init_ro+0x9c/0xe4
[ 12.535872] RP(r2): jump_label_init_ro+0x3c/0xe4
[ 12.540610] Backtrace:
[ 12.543000] [<1116535c>] kernel_init+0x80/0x1d4
[ 12.547654] [<1040201c>] ret_from_kernel_thread+0x1c/0x24
[ 12.553319]
[ 12.557345] Kernel panic - not syncing: Kernel Fault
.config is attached. I can dig further over the next few days.
I can reproduce.
The crash happens because jump_label_init_ro() in kernel/jump_label.c
writes to this static key, and the write traps because the area is already read-only:
mm/usercopy.c:static DEFINE_STATIC_KEY_FALSE_RO(bypass_usercopy_checks);
This is the only static key in this parisc kernel which is marked with __ro_after_init.
The area is marked read-only in free_initmem() [in arch/parisc/mm/init.c],
which happens before mark_readonly().
So, the issue is basically triggered by this commit:
commit 91a1d97ef482c1e4c9d4c1c656a53b0f6b16d0ed
Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Date: Wed Mar 13 19:01:03 2024 +0100
jump_label,module: Don't alloc static_key_mod for __ro_after_init keys
due to this hunk:
diff --git a/init/main.c b/init/main.c
index 2ca52474d0c3..6c3f251d6ef8 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1408,6 +1408,7 @@ static void mark_readonly(void)
* insecure pages which are W+X.
*/
flush_module_init_free_work();
+ jump_label_init_ro();
mark_rodata_ro();
debug_checkwx();
rodata_test();
I'm still unsure about the best way to fix it.
Swapping calls to free_initmem() and mark_readonly() fixes it for me:
diff --git a/init/main.c b/init/main.c
index 206acdde51f5..1f82583fd21d 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1473,8 +1473,8 @@ static int __ref kernel_init(void *unused)
ftrace_free_init_mem();
kgdb_free_init_mem();
exit_boot_config();
- free_initmem();
mark_readonly();
+ free_initmem();
/*
* Kernel mappings are now finalized - update the userspace page-table
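As a rough illustration of why the call order matters, here is a toy userspace model in C. It is not kernel code: the struct and all sim_* functions are invented for this sketch, and it only encodes the behaviour described above (on parisc, free_initmem() write-protects the area, and jump_label_init_ro() has to write to the __ro_after_init key). A failed write is reported as a return value where the real kernel would take the access rights trap.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the page holding the __ro_after_init key. */
struct ro_after_init_page {
	bool writable;      /* false once the page has been write-protected */
	bool key_marked_ro; /* set by the simulated jump_label_init_ro() */
};

/* Models parisc free_initmem() as described above: it write-protects the area. */
static void sim_free_initmem(struct ro_after_init_page *p)
{
	p->writable = false;
}

/* Models jump_label_init_ro(): it must write to the key.
 * Returns false where the real kernel would trap. */
static bool sim_jump_label_init_ro(struct ro_after_init_page *p)
{
	if (!p->writable)
		return false;
	p->key_marked_ro = true;
	return true;
}

/* Models mark_readonly(): jump_label_init_ro() first, then mark_rodata_ro(). */
static bool sim_mark_readonly(struct ro_after_init_page *p)
{
	bool ok = sim_jump_label_init_ro(p);

	p->writable = false; /* simulated mark_rodata_ro() */
	return ok;
}

/* Returns true if the simulated boot survives, false if it would fault. */
bool sim_kernel_init(bool free_initmem_first)
{
	struct ro_after_init_page page = { .writable = true, .key_marked_ro = false };
	bool ok;

	if (free_initmem_first) {
		/* Current order: the area is read-only before the key is written. */
		sim_free_initmem(&page);
		ok = sim_mark_readonly(&page);
	} else {
		/* Swapped order, as in the diff above. */
		ok = sim_mark_readonly(&page);
		sim_free_initmem(&page);
	}
	return ok;
}

/* Small self-check covering both orderings. */
void sim_self_check(void)
{
	assert(!sim_kernel_init(true));  /* free_initmem() first: write fails */
	assert(sim_kernel_init(false));  /* mark_readonly() first: write succeeds */
}
```

In this model, sim_kernel_init(true) corresponds to the current order and fails, while sim_kernel_init(false) corresponds to the swapped order and succeeds. It says nothing about whether swapping the calls is safe on other architectures.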
Opinions?
Helge