On Wed 19-08-20 12:11:57, David Hildenbrand wrote: > Currently, it can happen that pages are allocated (and freed) via the buddy > before we finished basic memory onlining. > > For example, pages are exposed to the buddy and can be allocated before > we actually mark the sections online. Allocated pages could suddenly > fail pfn_to_online_page() checks. We had similar issues with pcp > handling, when pages are allocated+freed before we reach > zone_pcp_update() in online_pages() [1]. > > Instead, mark all pageblocks MIGRATE_ISOLATE, such that allocations are > impossible. Once done with the heavy lifting, use > undo_isolate_page_range() to move the pages to the MIGRATE_MOVABLE > freelist, marking them ready for allocation. Similar to offline_pages(), > we have to manually adjust zone->nr_isolate_pageblock. > > [1] https://lkml.kernel.org/r/1597150703-19003-1-git-send-email-charante@xxxxxxxxxxxxxx > > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > Cc: Michal Hocko <mhocko@xxxxxxxx> > Cc: Wei Yang <richard.weiyang@xxxxxxxxxxxxxxxxx> > Cc: Baoquan He <bhe@xxxxxxxxxx> > Cc: Pankaj Gupta <pankaj.gupta.linux@xxxxxxxxx> > Cc: Oscar Salvador <osalvador@xxxxxxx> > Cc: Charan Teja Reddy <charante@xxxxxxxxxxxxxx> > Signed-off-by: David Hildenbrand <david@xxxxxxxxxx> Acked-by: Michal Hocko <mhocko@xxxxxxxx> Yes this looks very sensible and we should have done that from the beginning. I just have one minor comment below > @@ -816,6 +816,14 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, > if (ret) > goto failed_addition; > > + /* > + * Fixup the number of isolated pageblocks before marking the sections > + * onlining, such that undo_isolate_page_range() works correctly. > + */ > + spin_lock_irqsave(&zone->lock, flags); > + zone->nr_isolate_pageblock += nr_pages / pageblock_nr_pages; > + spin_unlock_irqrestore(&zone->lock, flags); > + I am not entirely happy about this. I am wondering whether it would make more sense to keep the counter in sync already in memmap_init_zone. Sure we add a branch to the boot time initialization - and it always fails there - but the code would be cleaner and we wouldn't have to do tricks like this in caller(s). -- Michal Hocko SUSE Labs