On 12/01/2021 10:53, Guillaume Tucker wrote:
> On 05/01/2021 09:13, Mike Rapoport wrote:
>> On Sun, Jan 03, 2021 at 03:09:14PM -0500, Andrea Arcangeli wrote:
>>> Hello Mike,
>>>
>>> On Sun, Jan 03, 2021 at 03:47:53PM +0200, Mike Rapoport wrote:
>>>> Thanks for the logs, it seems that implicitly adding reserved regions to
>>>> memblock.memory wasn't that bright idea :)
>>>
>>> Would it be possible to somehow clean up the hack then?
>>>
>>> The only difference between the clean solution and the hack is that
>>> the hack intended to achieved the exact same, but without adding the
>>> reserved regions to memblock.memory.
>>
>> I didn't consider adding reserved regions to memblock.memory as a clean
>> solution, this was still a hack, but I didn't think that things are that
>> fragile.
>>
>> I still think we cannot rely on memblock.reserved to detect
>> memory/zone/node sizes and the boot failure reported here confirms this.
>>
>>> The comment on that problematic area says the reserved area cannot be
>>> used for DMA because of some unexplained hw issue, and that doing so
>>> prevents booting, but since the area got reserved, even with the clean
>>> solution, it shouldn't have never been used for DMA?
>>>
>>> So I can only imagine that the physical memory region is way more
>>> problematic than just for DMA. It sounds like that anything that
>>> touches it, including the CPU, will hang the system, not just DMA. It
>>> sounds somewhat similar to the other e820 direct mapping issue on x86?
>>
>> My understanding is that the boot failed because when I implicitly added
>> the reserved region to memblock.memory the memory size seen by
>> free_area_init() jumped from 2G to 4G because the reserved area was close
>> to 4G. The very first allocation would get a chunk from slightly below of
>> 4G and as there is no real memory there, the kernel would crash.
>>
>>> If you want to test the hack on the arm board to check if it boots you
>>> can use the below commit:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?id=c3ea2633015104ce0df33dcddbc36f57de1392bc
>>
>> My take is your solution would boot with this memory configuration, but I
>> still don't think that using memblock.reserved for zone/node sizing is
>> correct.
>
> The rk3288 platform has now been failing to boot for nearly a
> month on linux-next:
>
> https://kernelci.org/test/case/id/5ffbed0a31ad81239bc94cdb/
>
> Until a fix or a new version of this patch is made, would it be
> possible to drop it or revert it so the platform become usable
> again?
>
> Or if you want, I can make a cleaned-up version of my hack to
> ignore the problematic region if you still need your patch to be
> on linux-next, but that would probably be less than ideal.

By the way, another bisection found that this commit is also
breaking tegra124-nyan-big, but only with both CONFIG_EFI=y and
CONFIG_ARM_LPAE=y enabled:

https://kernelci.org/test/case/id/5ff6b1e26cf19f3b10c94cc5/

The plain multi_v7_defconfig is booting fine:

https://kernelci.org/test/plan/id/5ff6b0a1db91b8a2b9c94cba/

I haven't looked into this one or tried to make it boot like
rk3288, but please let me know if there's anything there that can
be done to help.

Thanks,
Guillaume