Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

This apparently is a regression and thus got on my radar.

This all sounds a bit unfortunate. What can we do to get this properly
solved? Which commit actually causes this? I wonder if poking maintainers
higher in the hierarchy might help to get this finally fixed.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 02.04.23 15:11, Christian Hewitt wrote:
>> On 2 Apr 2023, at 12:10 pm, Lucas Tanure <tanure@xxxxxxxxx> wrote:
>>
>> I am trying to fix a kernel panic I am seeing on my vim3 board (Amlogic A311D).
>> I don't have enough knowledge about this area, but my current guess is
>> the kernel is using a piece of memory belonging to ARM-trusted
>> firmware that it shouldn't.
>> Log:
>>
>> [ 9.792966] SError Interrupt on CPU3, code 0x00000000bf000000 -- SError
>> [ 9.792980] CPU: 3 PID: 3471 Comm: kded5 Tainted: G C 6.2.0 #1
>> [ 9.792985] Hardware name: Khadas VIM3 (DT)
>> [ 9.792987] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [ 9.792991] pc : kmem_cache_free_bulk.part.98+0x1f0/0x528
>> [ 9.793004] lr : kmem_cache_free_bulk.part.98+0x2f8/0x528
>> [ 9.793008] sp : ffff80000a2eb7f0
>> [ 9.793009] x29: ffff80000a2eb7f0 x28: ffff00001f358518 x27: ffff000000008800
>> [ 9.793016] x26: ffff00000262b300 x25: ffff00000262b300 x24: 0000000000000001
>> [ 9.793019] x23: ffff00000262b000 x22: 0000000000000000 x21: ffff00001f358538
>> [ 9.793022] x20: fffffc0000098ac0 x19: 0000000000000004 x18: 0000000000000040
>> [ 9.793025] x17: 0000000000000018 x16: 00000000000007f8 x15: 0000000000000003
>> [ 9.793028] x14: 0000000000000006 x13: ffff800008e48550 x12: 0000ffff9dc91fff
>> [ 9.793031] x11: 0000000000000004 x10: 0000000000000001 x9 : ffff000007e93680
>> [ 9.793035] x8 : 0000000000000020 x7 : ffff000001d2b100 x6 : 0000000000000007
>> [ 9.793037] x5 : 0000000000000020 x4 : ffff000000008800 x3 : 0000000000000001
>> [ 9.793040] x2 : 0000000000000007 x1 : 0000000000000000 x0 : ffff00001f358540
>> [ 9.793045] Kernel panic - not syncing: Asynchronous SError Interrupt
>>
>> This doesn't happen with downstream Khadas 6.2 kernel, and that's
>> because the downstream kernel removed this from
>> early_init_dt_reserve_memory (drivers/of/fdt.c):
>>
>>         /*
>>          * If the memory is already reserved (by another region), we
>>          * should not allow it to be marked nomap, but don't worry
>>          * if the region isn't memory as it won't be mapped.
>>          */
>>         if (memblock_overlaps_region(&memblock.memory, base, size) &&
>>             memblock_is_region_reserved(base, size))
>>                 return -EBUSY;
>>
>>
>> And this causes 3 MiB of memory belonging to ARM Trusted firmware to
>> be reserved.
>>
>> arch/arm64/boot/dts/amlogic/meson-g12-common.dtsi :
>>         /* 3 MiB reserved for ARM Trusted Firmware (BL31) */
>>         secmon_reserved: secmon@5000000 {
>>                 reg = <0x0 0x05000000 0x0 0x300000>;
>>                 no-map;
>>         };
>>
>> And the mainline kernel fails to reserve that memory:
>> [ 0.000000] OF: fdt: Reserved memory: failed to reserve memory for
>> node 'secmon@5000000': base 0x0000000005000000, size 3 MiB
>>
>> It fails to reserve because memblock_overlaps_region and
>> memblock_is_region_reserved both return true.
>> I think memblock_is_region_reserved is saying the memory is already
>> reserved by U-Boot and shouldn't be nomap, but it should.
>>
>> Is there a bug here?
>> Why is the kernel failing to reserve this memory?
>> Is this a U-Boot issue?
>>
>> I would appreciate any help. The current mainline kernel fails to boot
>> on the VIM3 board 90% of the time.
>
> The issue was raised before by Stefan Agner here:
>
> https://lore.kernel.org/linux-arm-kernel/40ca11f84b7cdbfb9ad2ddd480cb204a@xxxxxxxx/
>
> The thread sort of points at the general issue but the conversation
> fizzled out and didn’t lead to any changes. At one point Stefan made
> a suggestion about reverting part of the code, leading to this patch
> in my own patchset:
>
> https://github.com/chewitt/linux/commit/9633c9b24f6f16afdb7fa8c2e163b6ea7a7ac5f8
>
> The issue is still present and the patch does work around it. The
> crashes would probably show up more often, except that a large
> percentage of distros that actively support Amlogic boards (and
> several vendors) are picking chunks of my curated LibreELEC patchset
> for their own kernels and thus that patch is quite widely used.
>
> Christian
>
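
For anyone trying to follow the quoted analysis, the check from
early_init_dt_reserve_memory can be modelled in userspace. What follows
is only a rough sketch, not kernel code: every model_* name is made up
for illustration, and both the RAM layout and the earlier reservation
covering the secmon range are assumptions (Lucas suspects U-Boot, but
the thread does not confirm where that reservation comes from).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only; none of the model_* names exist in the kernel. */
struct model_region {
	uint64_t base;
	uint64_t size;
};

/* True if [base, base + size) intersects any region in the list. */
static bool model_overlaps(const struct model_region *r, size_t n,
			   uint64_t base, uint64_t size)
{
	for (size_t i = 0; i < n; i++) {
		if (base < r[i].base + r[i].size && r[i].base < base + size)
			return true;
	}
	return false;
}

/*
 * Mirrors the quoted condition: a no-map request is refused with -EBUSY
 * when the range is RAM and something already reserved (part of) it.
 */
static int model_reserve_nomap(const struct model_region *memory, size_t n_mem,
			       const struct model_region *reserved, size_t n_res,
			       uint64_t base, uint64_t size)
{
	if (model_overlaps(memory, n_mem, base, size) &&
	    model_overlaps(reserved, n_res, base, size))
		return -EBUSY;
	return 0; /* the real function would carry on with its no-map handling */
}

/* ASSUMPTION: hypothetical layout, 2 GiB of RAM starting at address 0. */
static const struct model_region model_memory[] = {
	{ 0x00000000, 0x80000000 },
};

/* ASSUMPTION: hypothetical earlier reservation of the same 3 MiB. */
static const struct model_region model_reserved[] = {
	{ 0x05000000, 0x00300000 },
};

/* Outcome for the secmon node quoted from meson-g12-common.dtsi. */
static int model_secmon_result(void)
{
	return model_reserve_nomap(model_memory, 1, model_reserved, 1,
				   0x05000000, 0x300000);
}
```

Under these assumptions model_secmon_result() yields -EBUSY, which
matches the "failed to reserve memory" message in the boot log. With an
empty reserved list the same request goes through, which is roughly the
effect of dropping the check as the downstream kernel does.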