Hello Alexander

On Tue, Nov 10, 2020 at 11:29:50AM +0100, Alexander Sverdlin wrote:
> Hello Thomas,
>
> On 10/11/2020 10:55, Thomas Bogendoerfer wrote:
> >>>> Linux doesn't own the memory immediately after the kernel image. On Octeon
> >>>> bootloader places a shared structure right close after the kernel _end,
> >>>> refer to "struct cvmx_bootinfo *octeon_bootinfo" in cavium-octeon/setup.c.
> >>>>
> >>>> If check_kernel_sections_mem() rounds the PFNs up, first memblock_alloc()
> >>>> inside early_init_dt_alloc_memory_arch() <= device_tree_init() returns
> >>>> memory block overlapping with the above octeon_bootinfo structure, which
> >>>> is being overwritten afterwards.
> >>> as this special for Octeon how about added the memblock_reserve
> >>> in octen specific code ?
> >> while the shared structure which is being corrupted is indeed Octeon-specific,
> >> the wrong assumption that the memory right after the kernel can be allocated by memblock
> >> allocator and re-used somewhere in Linux is in MIPS-generic check_kernel_sections_mem().
> > ok, I see your point. IMHO this whole check_kernel_sections_mem() should
> > be removed. IMHO memory adding should only be done my memory detection code.
> >
> > Could you send a patch, which removes check_kernel_section_mem completly ?
>
> this will expose one issue:
> platforms usually do it in a sane way, like it was done last 15 years, namely
> add kernel image without non-complete pages on the boundaries.
> This will lead to the situation, that request_resource() will fail at least
> for .bss section of the kernel and it will not be properly displayed under
> /proc/iomem (and probably same problem will appear, which initially motivated
> the creation of check_kernel_section_mem()).

Are you saying that some old platforms rely on the
check_kernel_sections_mem() method adding the memory occupied by the
kernel to the system? If so, do you have an example of such a platform?
Personally, my hands were also itching to remove that method years ago,
but I didn't dare to do so for that very reason. On the other hand, if we
identified all the platforms that need that method, we could move it to
their prom_init() or something similar and get rid of that atavism for
good.

>
> As I understood, the issue is that memblock API operates internally on the
> page granularity (at least there are many ROUND_DOWN() inside for the size
> or upper boundary),

Hm, I don't think so. Memblock doesn't work with page granularity, but
with memory ranges. round_down()/round_up() are used to find a memory
range with proper alignment. (See the __memblock_find_range_bottom_up()
and __memblock_find_range_top_down() implementations.) Memblock allocates
a memory region with exactly the size and alignment requested. That's the
beauty of that allocator and one of the reasons why the kernel platforms
have been painfully converted to using it instead of the old bootmem
allocator. BTW the latter did indeed operate with page granularity.

Getting back to the memblock allocator: it works with pages only when the
kernel gets to starting the buddy allocator. At that point the kernel
invokes memblock_free_all(), which eventually calls
free_low_memory_core_early()->__free_memory_core(). The latter method
does indeed release memory in whole pages, but as you can see it is done
with the correct alignment, PFN_UP(phys_start)/PFN_DOWN(end), so a
partially covered page at either boundary is never handed to the buddy
allocator.

> so for request_resource() to success one has to claim
> the rest of the .bss last page. And with current memblock API
> memblock_reserve() must appear somewhere, being this ARCH or platform code.

After a short glance at the request_resource() code I didn't manage to
find a reason why the method would fail to request a page-unaligned
region. AFAICS it will fail only if the memory occupied by the kernel
hasn't been registered as system memory.
The latter case may happen only on the systems which rely on the
check_kernel_sections_mem() method being called in the generic
arch_mem_init(). Of course we shouldn't blindly remove it, but instead
move it to the platforms which have been unfortunate enough not to add
the kernel memory to the system memory pool.

So IMHO the best conclusion in the framework of this patch would be:
1) As Thomas said, any platform-specific reservation should be done in
the platform-specific code. That means if Octeon needs some memory
behind the kernel to be reserved, then it should be done, for example, in
prom_init().
2) The check_kernel_sections_mem() method can be removed, but it should
be done carefully. We at least need to try to find all the platforms
which rely on its functionality.

-Sergey

>
> --
> Best regards,
> Alexander Sverdlin.
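
P.S. To illustrate the PFN_UP()/PFN_DOWN() point above, here is a minimal
userspace sketch, not kernel code. The two macros follow the form used in
include/linux/pfn.h; the 4 KiB page size and both helper names
(whole_pages_freed(), bytes_past_end_if_rounded_up()) are assumptions made
up for this example, and the max_low_pfn clamp done by the real
__free_memory_core() is left out. It shows that rounding the start up and
the end down never releases a partial page, while rounding an unaligned
end up (what the thread says check_kernel_sections_mem() does) covers
bytes past the kernel's _end.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)

/* Same form as the kernel's include/linux/pfn.h helpers. */
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

/*
 * Hypothetical helper: number of pages that lie completely inside the
 * physical range [start, end). The start is rounded up and the end is
 * rounded down, so partially covered boundary pages are not counted.
 */
static uint64_t whole_pages_freed(uint64_t start, uint64_t end)
{
	uint64_t start_pfn, end_pfn;

	assert(start <= end);

	start_pfn = PFN_UP(start);
	end_pfn = PFN_DOWN(end);

	if (start_pfn >= end_pfn)
		return 0;

	return end_pfn - start_pfn;
}

/*
 * Hypothetical helper: how many bytes beyond "end" get covered if the
 * end of a region is rounded up to the next page boundary.
 */
static uint64_t bytes_past_end_if_rounded_up(uint64_t end)
{
	return (PFN_UP(end) << PAGE_SHIFT) - end;
}
```

For instance, with an unaligned end address of 0x5678 the second helper
reports that rounding up would cover memory up to 0x6000, which is exactly
the kind of area where a bootloader-owned structure placed right after
_end could live.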