On 2022/5/11 14:03, Mike Rapoport wrote:
> On Tue, May 10, 2022 at 06:55:23PM -0700, Andrew Morton wrote:
>> On Wed, 11 May 2022 01:05:30 +0000 Zhou Guanghui <zhouguanghui1@xxxxxxxxxx> wrote:
>>
>>> During early boot, the number of memblocks may exceed 128(some memory
>>> areas are not reported to the kernel due to test failures. As a result,
>>> contiguous memory is divided into multiple parts for reporting). If
>>> the size of the init memblock regions is exceeded before the array size
>>> can be resized, the excess memory will be lost.
>
> I'd like to see more details about how firmware creates that sparse memory
> map in the changelog.
>

The scenario is as follows: in a system using HBM, a multi-bit ECC error
occurs, and the BIOS records the affected area (for example, 2 MB). On the
next boot, these areas are isolated and either not reported at all or
reported as EFI_UNUSABLE_MEMORY. Both cases increase the number of
memblocks, and reporting EFI_UNUSABLE_MEMORY increases it more.

For example, if the EFI_UNUSABLE_MEMORY type is reported:
...
memory[0x92] [0x0000200834a00000-0x0000200835bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
memory[0x93] [0x0000200835c00000-0x0000200835dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x94] [0x0000200835e00000-0x00002008367fffff], 0x0000000000a00000 bytes on node 7 flags: 0x0
memory[0x95] [0x0000200836800000-0x00002008369fffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x96] [0x0000200836a00000-0x0000200837bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
memory[0x97] [0x0000200837c00000-0x0000200837dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x98] [0x0000200837e00000-0x000020087fffffff], 0x0000000048200000 bytes on node 7 flags: 0x0
memory[0x99] [0x0000200880000000-0x0000200bcfffffff], 0x0000000350000000 bytes on node 6 flags: 0x0
memory[0x9a] [0x0000200bd0000000-0x0000200bd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9b] [0x0000200bd0200000-0x0000200bd07fffff], 0x0000000000600000 bytes on node 6 flags: 0x0
memory[0x9c] [0x0000200bd0800000-0x0000200bd09fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9d] [0x0000200bd0a00000-0x0000200fcfffffff], 0x00000003ff600000 bytes on node 6 flags: 0x0
memory[0x9e] [0x0000200fd0000000-0x0000200fd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9f] [0x0000200fd0200000-0x0000200fffffffff], 0x000000002fe00000 bytes on node 6 flags: 0x0
...

>>>
>>> ...
>>>
>>> --- a/mm/Kconfig
>>> +++ b/mm/Kconfig
>>> @@ -89,6 +89,14 @@ config SPARSEMEM_VMEMMAP
>>>       pfn_to_page and page_to_pfn operations. This is the most
>>>       efficient option when sufficient kernel resources are available.
>>>
>>> +config MEMBLOCK_INIT_REGIONS
>>> +    int "Number of init memblock regions"
>>> +    range 128 1024
>>> +    default 128
>>> +    help
>>> +      The number of init memblock regions which used to track "memory" and
>>> +      "reserved" memblocks during early boot.
>>> +
>>>  config HAVE_MEMBLOCK_PHYS_MAP
>>>      bool
>>>
>>> diff --git a/mm/memblock.c b/mm/memblock.c
>>> index e4f03a6e8e56..6893d26b750e 100644
>>> --- a/mm/memblock.c
>>> +++ b/mm/memblock.c
>>> @@ -22,7 +22,7 @@
>>>
>>>  #include "internal.h"
>>>
>>> -#define INIT_MEMBLOCK_REGIONS 128
>>> +#define INIT_MEMBLOCK_REGIONS CONFIG_MEMBLOCK_INIT_REGIONS
>>
>> Consistent naming would be nice - MEMBLOCK_INIT versus INIT_MEMBLOCK.

I agree.

>>
>> Can we simply increase INIT_MEMBLOCK_REGIONS to 1024 and avoid the
>> config option? It appears that the overhead from this would be 60kB or
>> so.
>
> 60k is not big, but using 1024 entries array for 2-4 memory banks on
> systems that don't report that fragmented memory map is really a waste.
>
> We can make this per platform opt-in, like INIT_MEMBLOCK_RESERVED_REGIONS ...
>

Is the scenario I described above a general one?

>> Or zero if CONFIG_ARCH_KEEP_MEMBLOCK and CONFIG_MEMORY_HOTPLUG
>> are cooperating.
>
> ... or add code that will discard unused parts of memblock arrays even if
> CONFIG_ARCH_KEEP_MEMBLOCK=y.
>

In scenarios where memory usage is sensitive, should
CONFIG_ARCH_KEEP_MEMBLOCK be set to n, or should the number of regions be
set through a new config option?

Andrew, Mike, thank you.
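
To make the arithmetic behind the scenario above concrete, here is a minimal
user-space sketch (not kernel code). It only counts how many "memory" regions
one contiguous range needs once some areas inside it are isolated. The names
regions_needed(), holes_to_overflow() and SKETCH_INIT_REGIONS are made up for
this illustration, and it assumes the isolated areas are not adjacent to each
other or to the ends of the range, so no regions get merged.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed limit: the current INIT_MEMBLOCK_REGIONS value in the quoted patch. */
#define SKETCH_INIT_REGIONS 128u

/* How an isolated area shows up in the firmware memory map. */
enum isolation_mode {
	ISOLATED_NOT_REPORTED,      /* the area is missing from the map */
	ISOLATED_REPORTED_UNUSABLE, /* the area is listed as EFI_UNUSABLE_MEMORY */
};

/*
 * Number of regions needed for one contiguous range that contains
 * `holes` isolated areas (hypothetical model, see assumptions above).
 */
unsigned int regions_needed(unsigned int holes, enum isolation_mode mode)
{
	if (mode == ISOLATED_NOT_REPORTED)
		return holes + 1;   /* each hole splits one region in two */
	return 2 * holes + 1;       /* each hole is also a region of its own */
}

/* Smallest number of isolated areas that needs more than `limit` regions. */
unsigned int holes_to_overflow(unsigned int limit, enum isolation_mode mode)
{
	unsigned int holes = 0;

	assert(limit > 0);
	while (regions_needed(holes, mode) <= limit)
		holes++;
	return holes;
}

/* Print both cases for the assumed limit. */
void print_summary(void)
{
	printf("not reported:         overflow at %u isolated areas\n",
	       holes_to_overflow(SKETCH_INIT_REGIONS, ISOLATED_NOT_REPORTED));
	printf("EFI_UNUSABLE_MEMORY:  overflow at %u isolated areas\n",
	       holes_to_overflow(SKETCH_INIT_REGIONS, ISOLATED_REPORTED_UNUSABLE));
}
```

This matches the node 7 excerpt in the dump: three 2 MB regions with flags
0x4 sit between four regions with flags 0x0, seven entries in total, which is
what regions_needed(3, ISOLATED_REPORTED_UNUSABLE) gives. Under this simple
model a single range would need about half as many isolated areas to exceed
128 entries when they are reported as EFI_UNUSABLE_MEMORY as when they are
not reported. A real system starts with more than one region, so the limit
would be reached sooner.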