On 02/01/21 at 10:32am, David Hildenbrand wrote:
> On 30.01.21 23:10, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@xxxxxxxxxxxxx>
> >
> > The physical memory on an x86 system starts at address 0, but this is
> > not always reflected in the e820 map. For example, the BIOS can have
> > e820 entries like
> >
> > [    0.000000] BIOS-provided physical RAM map:
> > [    0.000000] BIOS-e820: [mem 0x0000000000001000-0x000000000009ffff] usable
> >
> > or
> >
> > [    0.000000] BIOS-provided physical RAM map:
> > [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] reserved
> > [    0.000000] BIOS-e820: [mem 0x0000000000001000-0x0000000000057fff] usable
> >
> > In either case, e820__memblock_setup() won't add the range 0x0000 - 0x1000
> > to memblock.memory and later during memory map initialization this range
> > is left outside any zone.
> >
> > With SPARSEMEM=y there is always a struct page for pfn 0 and this struct
> > page will have its zone link wrong no matter what value is set there.
> >
> > To avoid this inconsistency, add the beginning of RAM to memblock.memory.
> > Limit the added chunk size to match the reserved memory to avoid
> > registering memory that may be used by the firmware but never reserved
> > at e820__memblock_setup() time.
> >
> > Fixes: bde9cfa3afe4 ("x86/setup: don't remove E820_TYPE_RAM for pfn 0")
> > Signed-off-by: Mike Rapoport <rppt@xxxxxxxxxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx
> > ---
> >   arch/x86/kernel/setup.c | 8 ++++++++
> >   1 file changed, 8 insertions(+)
> >
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index 3412c4595efd..67c77ed6eef8 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -727,6 +727,14 @@ static void __init trim_low_memory_range(void)
> >  	 * Kconfig help text for X86_RESERVE_LOW.
> >  	 */
> >  	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
> > +
> > +	/*
> > +	 * Even if the firmware does not report the memory at address 0 as
> > +	 * usable, inform the generic memory management about its existence
> > +	 * to ensure it is a part of ZONE_DMA and the memory map for it is
> > +	 * properly initialized.
> > +	 */
> > +	memblock_add(0, ALIGN(reserve_low, PAGE_SIZE));
> >  }
> >
> >  /*
> >
>
> I think, to make that code more robust, and to not rely on archs to do
> the right thing, we should do something like
>
> 1) Make sure in free_area_init() that each PFN with a memmap (i.e., that
> falls into a partially present section) is spanned by a zone; that would
> include PFN 0 in this case.
>
> 2) In init_zone_unavailable_mem(), similar to the round_up(max_pfn,
> PAGES_PER_SECTION) handling, consider the range
> 	[round_down(min_pfn, PAGES_PER_SECTION), min_pfn - 1]
> which in the x86-64 case would cover [0..0] and, therefore, initialize
> PFN 0.

Sounds reasonable. Maybe we can change find_min_pfn_for_node() to return
the real expected lowest pfn by iterating both memblock.memory and
memblock.reserved and comparing the results, along the lines of the rough
sketch at the bottom of this mail.

>
> Also, I think the special-case of PFN 0 is analogous to the
> round_up(max_pfn, PAGES_PER_SECTION) handling in
> init_zone_unavailable_mem(): who guarantees that these PFNs above the
> highest present PFN are actually spanned by a zone?
>
> I'd suggest going through all zone ranges in free_area_init() first,
> dealing with zones whose start/end is not section-aligned, clamping them
> up/down if required such that no holes within a section are left
> uncovered by a zone.
>
> --
> Thanks,
>
> David / dhildenb
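
To make the find_min_pfn_for_node() idea above more concrete, here is a
rough, untested sketch against mm/page_alloc.c. The use of
for_each_reserved_mem_range()/PFN_DOWN() is just my assumption of one
workable way to walk memblock.reserved, and the NUMA side is glossed over
since memblock.reserved does not reliably carry nid information:

static unsigned long __init find_min_pfn_for_node(int nid)
{
	unsigned long min_pfn = ULONG_MAX;
	unsigned long start_pfn;
	phys_addr_t start;
	u64 r;
	int i;

	/* lowest PFN registered in memblock.memory for this node */
	for_each_mem_pfn_range(i, nid, &start_pfn, NULL, NULL)
		min_pfn = min(min_pfn, start_pfn);

	/*
	 * Also compare against memblock.reserved, so that ranges the
	 * firmware reserved but that were never added to memblock.memory
	 * (such as PFN 0 here) are taken into account. Sketch only: this
	 * ignores which node a reserved range belongs to.
	 */
	for_each_reserved_mem_range(r, &start, NULL)
		min_pfn = min_t(unsigned long, min_pfn, PFN_DOWN(start));

	if (min_pfn == ULONG_MAX) {
		pr_warn("Could not find start_pfn for node %d\n", nid);
		return 0;
	}

	return min_pfn;
}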
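
And for 2) above, the low end could mirror the existing round_up(max_pfn,
PAGES_PER_SECTION) handling. Again only an untested sketch: I'm assuming
it would sit next to the existing pgcnt accounting, and that an
init_unavailable_range()-style helper taking an exclusive end PFN is
available there:

	/* lowest present PFN and the section boundary below it */
	unsigned long min_pfn = find_min_pfn_with_active_regions();
	unsigned long spfn = round_down(min_pfn, PAGES_PER_SECTION);

	/*
	 * Initialize the memmap of the partial section below the lowest
	 * present PFN, i.e. [spfn, min_pfn - 1], e.g. [0..0] on the
	 * affected x86-64 setups, mirroring what is already done for the
	 * partial section above max_pfn.
	 */
	if (spfn != min_pfn)
		pgcnt += init_unavailable_range(spfn, min_pfn);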