We can currently crash in shrink_zone_span() in case we access an
uninitialized memmap (via page_to_nid()). Root issue is that we cannot
always identify which memmap was actually initialized.

Let's improve the situation by looking only at online PFNs for
!ZONE_DEVICE memory. This is now very reliable - similar to
set_zone_contiguous(). (Side note: set_zone_contiguous() will never
succeed on ZONE_DEVICE memory right now as we have no online PFNs ...).

For ZONE_DEVICE memory, make sure we don't crash by special-casing
poisoned pages and always checking that the NID has a sane value. We
might still read garbage and get false positives, but it certainly
improves the situation.

Note: Especially subsections make it very hard to detect which parts of
a ZONE_DEVICE memmap were actually initialized - otherwise we could just
have reused SECTION_IS_ONLINE. This needs more thought.

Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Oscar Salvador <osalvador@xxxxxxx>
Cc: David Hildenbrand <david@xxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
Cc: Wei Yang <richardw.yang@xxxxxxxxxxxxxxx>
Reported-by: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxx>
Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
---
 mm/memory_hotplug.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 663853bf97ed..65b3fdf7f838 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -334,6 +334,17 @@ static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
 		if (unlikely(!pfn_valid(start_pfn)))
 			continue;
 
+		/*
+		 * TODO: There is no way we can identify whether the memmap
+		 * of ZONE_DEVICE memory was initialized. We might get
+		 * false positives when reading garbage.
+		 */
+		if (zone_idx(zone) == ZONE_DEVICE) {
+			if (PagePoisoned(pfn_to_page(start_pfn)))
+				continue;
+		} else if (!pfn_to_online_page(start_pfn))
+			continue;
+
 		if (unlikely(pfn_to_nid(start_pfn) != nid))
 			continue;
 
@@ -359,6 +370,17 @@ static unsigned long find_biggest_section_pfn(int nid, struct zone *zone,
 		if (unlikely(!pfn_valid(pfn)))
 			continue;
 
+		/*
+		 * TODO: There is no way we can identify whether the memmap
+		 * of ZONE_DEVICE memory was initialized. We might get
+		 * false positives when reading garbage.
+		 */
+		if (zone_idx(zone) == ZONE_DEVICE) {
+			if (PagePoisoned(pfn_to_page(pfn)))
+				continue;
+		} else if (!pfn_to_online_page(pfn))
+			continue;
+
 		if (unlikely(pfn_to_nid(pfn) != nid))
 			continue;
 
-- 
2.21.0
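
The order of checks that both hunks add can be modeled in userspace. The following is a minimal sketch and not kernel code: every `toy_*` name is hypothetical, and each struct field merely stands in for the result of the kernel helper named in its comment. It only illustrates which PFNs are skipped before the NID comparison is reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel state; none of these exist in the kernel. */
enum toy_zone { TOY_ZONE_NORMAL, TOY_ZONE_DEVICE };

struct toy_pfn {
	bool valid;	/* models pfn_valid(pfn) */
	bool online;	/* models pfn_to_online_page(pfn) != NULL */
	bool poisoned;	/* models PagePoisoned(pfn_to_page(pfn)) */
	int nid;	/* models pfn_to_nid(pfn) */
};

/* Mirrors the check order in the patched loops for a single PFN. */
static bool toy_pfn_usable(const struct toy_pfn *p, enum toy_zone zone, int nid)
{
	if (!p->valid)
		return false;

	if (zone == TOY_ZONE_DEVICE) {
		/* No online information exists; only skip poisoned memmaps. */
		if (p->poisoned)
			return false;
	} else if (!p->online) {
		/* For all other zones, only online PFNs are considered. */
		return false;
	}

	/* The NID is only inspected once the memmap looks initialized. */
	return p->nid == nid;
}

/*
 * Forward scan in the spirit of find_smallest_section_pfn().
 * Returns the index of the first usable entry, or n if there is none.
 */
static size_t toy_find_smallest(const struct toy_pfn *map, size_t n,
				enum toy_zone zone, int nid)
{
	assert(map != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		if (toy_pfn_usable(&map[i], zone, nid))
			return i;
	return n;
}
```

In this model, an entry that is valid but neither online nor poisoned is accepted only for `TOY_ZONE_DEVICE`, which corresponds to the false positives the TODO comment warns about: a memmap that was never initialized and also never poisoned would pass the same check.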