On Wed, Jan 17, 2024 at 02:17:03PM +0100, Christian Marangi wrote: > On Wed, Oct 26, 2022 at 10:19:21AM +0200, Linus Walleij wrote: > > On Tue, Oct 25, 2022 at 1:47 AM Christian Marangi <ansuelsmth@xxxxxxxxx> wrote: > > > > > bad news... yesterday I tested this binding and it's problematic. It > > > does work and the router correctly boot... > > > > That's actually partly good news :D > > Hi, > sorry for the necroposting but I got some time and wanted to fix and > bisect this for good since IPQ806x is finally in a better shape and is > actually modern enough. > > > > > > problem is that SMEM is > > > broken with such configuration... I assume with this binding, by the > > > system view ram starts from 0x42000000 instead of 0x40000000 and this > > > cause SMEM to fail probe with the error "SBL didn't init SMEM". > > > > We need to fix this. > > > > Totally but I think the problem is more deep... > > > > This is the location of SMEM entry in ram > > > > > > smem: smem@41000000 { > > > compatible = "qcom,smem"; > > > reg = <0x41000000 0x200000>; > > > no-map; > > > > > > hwlocks = <&sfpb_mutex 3>; > > > }; > > (...) > > > Wonder if you have other ideas about this. > > > > So the problem is that the resource is outside of the system RAM? > > > > I don't understand why that triggers it since this is per definition not > > system RAM, it is SMEM after all. And it is no different in esssence > > from any memory mapped IO or other things that are outside of > > the system RAM. > > > > The SMEM node is special since it is created without children thanks > > to the hack in drivers/of/platform.c. > > > > Then the driver in drivers/soc/qcom/smem.c > > contains things like this: > > > > rmem = of_reserved_mem_lookup(pdev->dev.of_node); > > if (rmem) { > > smem->regions[0].aux_base = rmem->base; > > smem->regions[0].size = rmem->size; > > } else { > > /* > > * Fall back to the memory-region reference, if we're not a > > * reserved-memory node. > > */ > > ret = qcom_smem_resolve_mem(smem, "memory-region", > > &smem->regions[0]); > > if (ret) > > return ret; > > } > > > > However it is treated as memory-mapped IO later: > > > > for (i = 1; i < num_regions; i++) { > > smem->regions[i].virt_base = devm_ioremap_wc(&pdev->dev, > > > > smem->regions[i].aux_base, > > > > smem->regions[i].size); > > if (!smem->regions[i].virt_base) { > > dev_err(&pdev->dev, "failed to remap %pa\n", > > &smem->regions[i].aux_base); > > return -ENOMEM; > > } > > } > > > > As a first hack I would check: > > > > 1. Is it the of_reserved_mem_lookup() or qcom_smem_resolve_smem() stuff > > in drivers/soc/qcom/smem.c that is failing? > > > > If yes then: > > > > 2. Add a fallback path just using of_iomap(node) for aux_base and size > > with some comment like /* smem is outside of the main memory map */ > > and see if that works. > > > > I think we got confused and we didn't read the code correctly. The > error is "SMEM is not initialized by SBL" that is triggered by... > > header = smem->regions[0].virt_base; > if (le32_to_cpu(header->initialized) != 1 || > le32_to_cpu(header->reserved)) { > dev_err(&pdev->dev, "SMEM is not initialized by SBL\n",); > return -EINVAL; > } > > I verified correctly that aux_base and size are the correct values > 0x41000000 and 0x200000. And from what I can see they get correctly > iomapped. > > Problem is that initialized and reserved have garbage in it. (not random > data tho but everytime the same data) > > My theory is that somehow the loader is still writing data there but I'm > a bit lost on how to verify that. (the fact that the data in those > values is always the same with the same compiled image makes me think > it's actually just loaded data) > > I also tested with disabling the CONFIG_ARM_ATAG_DTB_COMPAT flag but I > have the same result. > > What I'm using is this memory node > > memory@0 { > reg = <0x42000000 0x1e000000>; > device_type = "memory"; > }; > > And in chosed I have > > chosen { > bootargs = "earlycon"; > linux,usable-memory-range = <0x42000000 0x10000000>; > }; > > (the size is different just for the sake of it but it should not cause > problem right?) > > Maybe there is a way to make the SMEM reclaim those RAM space and reinit > it? (it's a workaround tho) > > Also with the current situation the kernel panics with... But I assume > this is caused by SMEM malfunctioning (the panic happen right after rpm > init when the RPM regulators are getting init. Looking at the affected > codes maybe it's failing at the "Free unused pages" stage? > > [ 1.912392] 8<--- cut here --- > [ 1.912431] Unable to handle kernel NULL pointer dereference at virtual address 00000000 > [ 1.914356] [00000000] *pgd=00000000 > [ 1.922676] Internal error: Oops: 80000007 [#1] SMP ARM > [ 1.926158] Modules linked in: > [ 1.931103] CPU: 1 PID: 84 Comm: modprobe Not tainted 6.1.65 #0 > [ 1.934229] Hardware name: Generic DT based system > [ 1.940045] PC is at 0x0 > [ 1.944902] LR is at release_pages+0x114/0x36c > [ 1.947595] pc : [<00000000>] lr : [<c04298dc>] psr: 40000013 > [ 1.951851] sp : c27abe18 ip : c13cd5c1 fp : c27abe38 > [ 1.958012] r10: 0000009c r9 : c4018268 r8 : 00000005 > [ 1.963220] r7 : c243f400 r6 : c243f400 r5 : 00000098 r4 : df992b54 > [ 1.968431] r3 : 00000000 r2 : 00000000 r1 : 60000013 r0 : df992b54 > [ 1.975029] Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none > [ 1.981543] Control: 10c5787d Table: 4367806a DAC: 00000051 > [ 1.988744] Register r0 information: non-slab/vmalloc memory > [ 1.994472] Register r1 information: non-paged memory > [ 2.000200] Register r2 information: NULL pointer > [ 2.005148] Register r3 information: NULL pointer > [ 2.009834] Register r4 information: non-slab/vmalloc memory > [ 2.014525] Register r5 information: non-paged memory > [ 2.020252] Register r6 information: slab kmalloc-1k start c243f400 pointer offset 0 size 1024 > [ 2.025206] Register r7 information: slab kmalloc-1k start c243f400 pointer offset 0 size 1024 > [ 2.033714] Register r8 information: non-paged memory > [ 2.042301] Register r9 information: non-slab/vmalloc memory > [ 2.047424] Register r10 information: non-paged memory > [ 2.053152] Register r11 information: non-slab/vmalloc memory > [ 2.058100] Register r12 information: non-paged memory > [ 2.063915] Process modprobe (pid: 84, stack limit = 0x(ptrval)) > [ 2.068953] Stack: (0xc27abe18 to 0xc27ac000) > [ 2.075115] be00: 00000000 00000000 > [ 2.079378] be20: c147514c ffefffcf 00000000 00000000 0000009c 60000013 dfa12928 dfa12b44 > [ 2.087537] be40: c27abf24 0000009c c4018000 c401800c c27abf0c c27abf24 00000000 000000f8 > [ 2.095697] be60: 00000000 c045b248 ffffffff c27abf0c c35d1400 00000000 c35d1438 c045b4f8 > [ 2.103858] be80: c27abf0c 00002000 00000000 c044fb14 00000000 c0b6c2bc c35d1400 ffffffff > [ 2.112016] bea0: ffffffff c35a4c0c 00000000 ffffffff 00000000 00001c01 00000000 c3591510 > [ 2.120176] bec0: 00000000 c35d1400 ffffffff c3591510 00000000 c35d1400 00000000 c0458f30 > [ 2.128336] bee0: 00000000 c08f35c8 c36ebf00 c35d1400 00010000 00013fff c35a4c0c 00000000 > [ 2.136496] bf00: ffffffff 00000000 00000101 c35d1400 ffffffff ffffffff c2420501 00000001 > [ 2.144656] bf20: c4018000 c4018000 00000000 00000008 dfde733c dfde7360 dfde7384 dfde73a8 > [ 2.152815] bf40: dfa12a44 dfa12948 dfa129d8 dfa12ad4 c35d1400 00000000 c35d1438 00000698 > [ 2.160976] bf60: c27abf78 c0318a34 c35d1400 c2731000 c35d1438 c0320604 0000ff00 c258ea00 > [ 2.169136] bf80: c2731000 c2456f40 c03002c4 c2456f40 00000000 c0320e0c 000000f8 c0320e6c > [ 2.177294] bfa0: ffffffff c0300060 ffffffff bed38eb4 ffffffff bed38dcc 00000000 ffffffff > [ 2.185455] bfc0: ffffffff bed38eb4 00010f60 000000f8 6474e552 00000020 00000000 00000000 > [ 2.193614] bfe0: 6ffffff9 bed38e78 b6f91f1c b6fa4a44 60000010 ffffffff 00000000 00000000 > [ 2.201777] release_pages from tlb_batch_pages_flush+0x3c/0x70 > [ 2.209927] tlb_batch_pages_flush from tlb_finish_mmu+0x4c/0x130 > [ 2.215656] tlb_finish_mmu from exit_mmap+0xec/0x1e0 > [ 2.221903] exit_mmap from mmput+0x40/0x120 > [ 2.226939] mmput from do_exit+0x238/0x890 > [ 2.231279] do_exit from do_group_exit+0x34/0x84 > [ 2.235184] do_group_exit from __wake_up_parent+0x0/0x18 > [ 2.240053] Code: bad PC value > [ 2.245556] ---[ end trace 0000000000000000 ]--- > [ 2.248448] Kernel panic - not syncing: Fatal exception > [ 2.253158] CPU0: stopping > [ 2.253169] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G D 6.1.65 #0 > [ 2.253180] Hardware name: Generic DT based system > [ 2.253189] unwind_backtrace from show_stack+0x10/0x14 > [ 2.253216] show_stack from dump_stack_lvl+0x40/0x4c > [ 2.253249] dump_stack_lvl from do_handle_IPI+0xf0/0x124 > [ 2.253276] do_handle_IPI from ipi_handler+0x18/0x20 > [ 2.253293] ipi_handler from handle_percpu_devid_irq+0x78/0x134 > [ 2.253313] handle_percpu_devid_irq from generic_handle_domain_irq+0x28/0x38 > [ 2.253338] generic_handle_domain_irq from gic_handle_irq+0x74/0x88 > [ 2.253361] gic_handle_irq from generic_handle_arch_irq+0x34/0x44 > [ 2.253391] generic_handle_arch_irq from call_with_stack+0x18/0x20 > [ 2.253419] call_with_stack from __irq_svc+0x80/0x98 > [ 2.253438] Exception stack(0xc1401f00 to 0xc1401f48) > [ 2.253451] 1f00: 00000005 00000000 00000a61 c03128a0 c1408640 00000000 c1404f68 c1404fa4 > [ 2.253461] 1f20: 00000000 c13c9c38 00000000 00000000 c14c1f00 c1401f50 c0307148 c030714c > [ 2.253467] 1f40: 60000013 ffffffff > [ 2.253474] __irq_svc from arch_cpu_idle+0x38/0x3c > [ 2.253500] arch_cpu_idle from default_idle_call+0x24/0x34 > [ 2.253526] default_idle_call from do_idle+0x1ec/0x240 > [ 2.253545] do_idle from cpu_startup_entry+0x28/0x2c > [ 2.253559] cpu_startup_entry from kernel_init+0x0/0x12c > [ 2.376160] Rebooting in 1 seconds.. > Some followup on this... I manage to enable DEBUG_LL and can have debug output from the decompressor... >From what I can see fdt_check_mem_start is not called at all... What I'm using with kernel config are: CONFIG_ARM_APPENDED_DTB=y CONFIG_ARM_ATAG_DTB_COMPAT=y And a downstream patch that mangle all the atags and takes only the cmdline one. The load and entry point is: 0x42208000 With the current setup I have this (I also added some debug log that print what is actually passed to do decompress DTB:0x42AED270 (0x00008BA7) Uncompressing Linux... 40208000 4220F10C done, booting the kernel. Where 40208000 is the value of output_start and 4220F10C is input_data. And I think this confirm that it's getting loaded in the wrong position actually in reserved memory... But how this is possible??? Hope can someone help me in this since I wasted the entire day with this and didn't manage to make any progress... aside from having fun with the head.S assembly code. -- Ansuel