On Wed, Nov 25, 2020 at 02:32:02PM +0100, David Hildenbrand wrote: > On 25.11.20 13:08, Vlastimil Babka wrote: > > On 11/25/20 6:34 AM, Andrea Arcangeli wrote: > >> Hello, > >> > >> On Mon, Nov 23, 2020 at 02:01:16PM +0100, Vlastimil Babka wrote: > >>> On 11/21/20 8:45 PM, Andrea Arcangeli wrote: > >>>> A corollary issue was fixed in > >>>> 39639000-39814fff : Unknown E820 type > >>>> > >>>> pfn 0x7a200 -> 0x7a200000 min_pfn hit non-RAM: > >>>> > >>>> 7a17b000-7a216fff : Unknown E820 type > >>> > >>> It would be nice to also provide a /proc/zoneinfo and how exactly the > >>> "zone_spans_pfn" was violated. I assume we end up below zone's > >>> start_pfn, but is it true? > >> > >> Agreed, I was about to grab that info along with all page struct > >> around the pfn 0x7a200 and phys address 0x7a216fff. > >> > >> # grep -A1 E820 /proc/iomem > >> 7a17b000-7a216fff : Unknown E820 type > >> 7a217000-7bffffff : System RAM > >> > >> DMA zone_start_pfn 1 zone_end_pfn() 4096 contiguous 1 > >> DMA32 zone_start_pfn 4096 zone_end_pfn() 1048576 contiguous 0 > >> Normal zone_start_pfn 1048576 zone_end_pfn() 4715392 contiguous 1 > >> Movable zone_start_pfn 0 zone_end_pfn() 0 contiguous 0 > > > > So the above means that around the "Unknown E820 type" we have: > > > > pfn 499712 - start of pageblock in ZONE_DMA32 > > pfn 500091 - start of the "Unknown E820 type" range > > pfn 500224 - start of another pageblock > > pfn 500246 - end of "Unknown E820 type" > > > > So this is indeed not a zone boundary issue, but basically a hole not > > aligned to pageblock boundary and really unexpected. > > We have CONFIG_HOLES_IN_ZONE (that x86 doesn't set) for architectures > > that do this, and even that config only affects pfn_valid_within(). But > > here pfn_valid() is true, but the zone/node linkage is unexpected. > > > >> However the real bug seems that reserved pages have a zero zone_id in > >> the page->flags when it should have the real zone id/nid. The patch I > >> sent earlier to validate highest would only be needed to deal with > >> pfn_valid. > >> > >> Something must have changed more recently than v5.1 that caused the > >> zoneid of reserved pages to be wrong, a possible candidate for the > >> real would be this change below: > >> > >> + __init_single_page(pfn_to_page(pfn), pfn, 0, 0); > >> > >> Even if it may not be it, at the light of how the reserved page > >> zoneid/nid initialized went wrong, the above line like it's too flakey > >> to stay. > >> > >> It'd be preferable if the pfn_valid fails and the > >> pfn_to_section_nr(pfn) returns an invalid section for the intermediate > >> step. Even better memset 0xff over the whole page struct until the > >> second stage comes around. > >> > >> Whenever pfn_valid is true, it's better that the zoneid/nid is correct > >> all times, otherwise if the second stage fails we end up in a bug with > >> weird side effects. > > > > Yeah I guess it would be simpler if zoneid/nid was correct for > > pfn_valid() pfns within a zone's range, even if they are reserved due > > not not being really usable memory. > > > > I don't think we want to introduce CONFIG_HOLES_IN_ZONE to x86. If the > > chosen solution is to make this to a real hole, the hole should be > > extended to MAX_ORDER_NR_PAGES aligned boundaries. > > As we don't punch out pages of the memmap on x86-64, pfn_valid() keeps > working as expected. There is a memmap that can be accessed and that was > initialized. I suspect that memmap for the reserved pages is not properly initialized after recent changes in free_area_init(). They are cleared at init_unavailable_mem() to have zone=0 and node=0, but they seem to be never re-initialized with proper zone and node links which was not the case before commit 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN"). Back then, memmap_init_zone() looped from zone_start_pfn till zone_end_pfn and struct page for reserved pages with pfns inside the zone would be initialized. Now the loop is for interesection of [zone_start_pfn, zone_end_pfn] with memblock.memory and for x86 reserved ranges are not in memblock.memory, so the memmap for them remains semi-initialized. > It's really just a matter of how to handle memory holes in > this scenario. > > a) Try initializing them to the covering node/zone (I gave one example > that might be tricky with hotplug) > b) Mark such pages (either special node/zone or pagetype) and make pfn > walkers ignore these holes. For now, this can only be done via the > reserved flag. > > -- > Thanks, > > David / dhildenb > -- Sincerely yours, Mike.