David Hildenbrand <david@xxxxxxxxxx> writes:

> Let's limit shrinking to !ZONE_DEVICE so we can fix the current code. We
> should never try to touch the memmap of offline sections where we could
> have uninitialized memmaps and could trigger BUGs when calling
> page_to_nid() on poisoned pages.
>
> Stopping to shrink the ZONE_DEVICE is fine as set_zone_contiguous() cannot
> deal with ZONE_DEVICE either way. The zones will always be !contiguous
> already and zone shrinking is therefore of limited use.

Can you add more details w.r.t ZONE_DEVICE?

> Before commit d0dc12e86b31 ("mm/memory_hotplug: optimize memory
> hotplug"), the memmap was initialized with 0 and the node with the
> right value. So the zone might be wrong but not garbage. After that
> commit, both the zone and the node will be garbage when touching
> uninitialized memmaps.
>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Oscar Salvador <osalvador@xxxxxxx>
> Cc: David Hildenbrand <david@xxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxx>
> Cc: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
> Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
> Reported-by: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxx>
> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
> ---
>  mm/memory_hotplug.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index ddba8d786e4a..e0d1f6a9dfeb 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -331,7 +331,7 @@ static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
>  				     unsigned long end_pfn)
>  {
>  	for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SUBSECTION) {
> -		if (unlikely(!pfn_valid(start_pfn)))
> +		if (unlikely(!pfn_to_online_page(start_pfn)))
>  			continue;
>
>  		if (unlikely(pfn_to_nid(start_pfn) != nid))
> @@ -356,7 +356,7 @@ static unsigned long find_biggest_section_pfn(int nid, struct zone *zone,
>  	/* pfn is the end pfn of a memory section. */
>  	pfn = end_pfn - 1;
>  	for (; pfn >= start_pfn; pfn -= PAGES_PER_SUBSECTION) {
> -		if (unlikely(!pfn_valid(pfn)))
> +		if (unlikely(!pfn_to_online_page(pfn)))
>  			continue;
>
>  		if (unlikely(pfn_to_nid(pfn) != nid))
> @@ -415,7 +415,7 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
>  	 */
>  	pfn = zone_start_pfn;
>  	for (; pfn < zone_end_pfn; pfn += PAGES_PER_SUBSECTION) {
> -		if (unlikely(!pfn_valid(pfn)))
> +		if (unlikely(!pfn_to_online_page(pfn)))
>  			continue;
>
>  		if (page_zone(pfn_to_page(pfn)) != zone)
> @@ -463,6 +463,14 @@ static void __remove_zone(struct zone *zone, unsigned long start_pfn,
>  	struct pglist_data *pgdat = zone->zone_pgdat;
>  	unsigned long flags;
>
> +	/*
> +	 * Zone shrinking code cannot properly deal with ZONE_DEVICE. So
> +	 * we will not try to shrink the zones - which is okay as
> +	 * set_zone_contiguous() cannot deal with ZONE_DEVICE either way.
> +	 */
> +	if (zone_idx(zone) == ZONE_DEVICE)
> +		return;

Can you describe here what is special about ZONE_DEVICE?

> +
>  	pgdat_resize_lock(zone->zone_pgdat, &flags);
>  	shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
>  	update_pgdat_span(pgdat);
> --
> 2.21.0
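
The heart of the fix above is the switch from pfn_valid(), which only
guarantees that a memmap exists for the pfn, to pfn_to_online_page(),
which additionally requires the section backing the pfn to be marked
online, i.e. to have a fully initialized memmap. Below is a minimal
sketch of those semantics for a SPARSEMEM kernel of this era; the
helper name with the _sketch suffix is purely illustrative, and the
real pfn_to_online_page() is a macro in include/linux/memory_hotplug.h
whose exact form varies by kernel version:

#include <linux/mmzone.h>		/* pfn_to_section_nr(), online_section_nr() */
#include <linux/memory_hotplug.h>

/*
 * Sketch of the check the patch relies on: only hand out a struct page
 * if the section backing the pfn is online, i.e. its memmap has been
 * initialized. Offline sections - and ZONE_DEVICE sections, which are
 * never marked online - yield NULL, so the shrinking loops skip them
 * instead of reading garbage zone/node values from poisoned pages.
 */
static struct page *pfn_to_online_page_sketch(unsigned long pfn)
{
	unsigned long nr = pfn_to_section_nr(pfn);

	if (nr < NR_MEM_SECTIONS && online_section_nr(nr) && pfn_valid(pfn))
		return pfn_to_page(pfn);
	return NULL;
}

Because ZONE_DEVICE sections never go through the regular onlining
path, this check makes all three loops skip device memory, which is
also why __remove_zone() can simply bail out for ZONE_DEVICE rather
than attempt to shrink it.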