On Wed, Jun 02, 2021 at 04:54:17PM +0300, Mike Rapoport wrote: > On Wed, Jun 02, 2021 at 11:15:21AM +0100, Russell King (Oracle) wrote: > > On Wed, Jun 02, 2021 at 11:33:10AM +0300, Mike Rapoport wrote: > > > On Tue, Jun 01, 2021 at 02:54:15PM +0100, Russell King (Oracle) wrote: > > > > If I look at one of my kernels: > > > > > > > > c0008000 T _text > > > > c0b5b000 R __end_rodata > > > > ... exception and unwind tables live here ... > > > > c0c00000 T __init_begin > > > > c0e00000 D _sdata > > > > c0e68870 D _edata > > > > c0e68870 B __bss_start > > > > c0e995d4 B __bss_stop > > > > c0e995d4 B _end > > > > > > > > So the original covers _text..__init_begin-1 which includes the > > > > exception and unwind tables. Your version above omits these, which > > > > leaves them exposed. > > > > > > Right, this needs to be fixed. Is there any reason the exception and unwind > > > tables cannot be placed between _sdata and _edata? > > > > > > It seems to me that they were left outside for purely historical reasons. > > > Commit ee951c630c5c ("ARM: 7568/1: Sort exception table at compile time") > > > moved the exception tables out of .data section before _sdata existed. > > > Commit 14c4a533e099 ("ARM: 8583/1: mm: fix location of _etext") moved > > > _etext before the unwind tables and didn't bother to put them into data or > > > rodata areas. > > > > You can not assume that all sections will be between these symbols. This > > isn't specific to 32-bit ARM. If you look at x86's vmlinux.lds.in, you > > will see that BUG_TABLE and ORC_UNWIND_TABLE are after _edata, along > > with many other undiscarded sections before __bss_start. > > But if you look at x86's setup_arch() all these never make it to the > resource tree. So there are holes in /proc/iomem between the kernel > resources. Also true. However, my point was to counter your claim that these sections should be part of the .text/.data/.rodata etc sections in the output vmlinux. There is, however, a more important point. The __ex_table section must exist and be separate from the .text/.data/.rodata sections in the output ELF file, as sorttable (the exception table sorter) relies on this to be able to find the table and sort it. So, it isn't entirely "for historical reasons" as you said two messages ago. > > So it seems your assumptions in trying to clean this up are somewhat > > false. > > My assumption was that there is complete lack of consistency between what > is reserved memory and how it is reported in /proc/iomem or > /sys/firmware/memmap for that matter. I'm not trying to clean this up, I'm > trying to make different views of the physical memory consistent. > Consolidating several similar per-arch implementations is the first step in > this direction. It looks to me that there is quite a number of things that need fixing. One glaring thing is the kernel's init memory - should that be counted as reserved memory? It's marked as such in memblock and /proc/iomem, yet we free these pages into the page allocator after boot meaning they are just like any other page in the memory allocator - they are most certainly not "reserved" at that point. So, what is reported as reserved in firmware maps will be different from memblock. Memblock includes kernel boot-time allocations, which count as "reserved" but are not part of the firmware maps - these will be for things like early page tables and the struct page array. So, you're never going to get consistency between memblock and firmware. Memblock and /proc/iomem should be fairly consistent - areas marked as reserved in memblock seem to be propagated into /proc/iomem, including areas around the kernel image (the resources that you're changing in your patch.) Here's an example: /sys/kernel/debug/memblock/reserved: 1: 0x0000000081210000..0x0000000082d6efff 2: 0x0000000082d71000..0x0000000082d7ffff 81210000-821cffff : Kernel code 821d0000-8246ffff : reserved 82470000-82d7ffff : Kernel data This is aarch64, which isn't as accurate as 32-bit ARM in /proc/iomem: /sys/kernel/debug/memblock/reserved: 1: 0x0000000040200000..0x0000000040ea1c17 /proc/iomem: 40008000-40bfffff : Kernel code 40e00000-40ea1c17 : Kernel data 32-bit ARM doesn't forward the memblock reserved areas into /proc/iomem because they are kernel allocations. In the example I show above for 32-bit ARM, there are no firmware reserved regions, yet there are 19 memblock "reserved" regions. I think part of the problem here is understanding what "reserved" means in these cases. For something passed to the kernel from firmware, it's an area that firmware doesn't want the OS to use. For memblock, it is those areas plus allocations made early on during kernel boot before the page allocator is up and running, and includes areas of memory that these allocations must avoid (e.g. due to initramfs or device tree temporarily residing there.) Then there's differences in what should be placed in /proc/iomem. Now, bear in mind that /proc/iomem is a user API, one which userspace depends on. If we start going around making /proc/iomem report stuff like kernel boot time reservations as "reserved" memory, we will end up breaking the kexec tooling on some platforms. For example, kexec tooling for 32-bit ARM parses /proc/iomem, looking for "System RAM", "System RAM (boot alias)" and "reserved" regions. So, I think changes to make this "more consistent" come with high risk. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!