On Fri 04-09-20 11:14:48, Pavel Tatashin wrote:
> It is expensive to drain per-cpu page lists as a thread is started on each
> CPU, and we are waiting for them to complete.
>
> Currently, we drain on every block of pages that is isolated. Instead lets
> drain once after pages are isolated.
>
> For example, when 2G of memory is hot-removed drain is called 16 times,
> with this change it will be done only one time on average.

I do agree that the current implementation is much less effective than it
could be, but I disagree that we should be pushing the burden to all
callers, as already stated
(http://lkml.kernel.org/r/20200907072608.GE30144@xxxxxxxxxxxxxx).

I believe it should be perfectly fine to do the draining inside
start_isolate_page_range itself; that would improve the situation
considerably. There are some minor details to sort out (a multizone pfn
span, which we do not allow but probably should be enforcing somehow).

> Signed-off-by: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>

I believe we should be going in the opposite direction and define a more
understandable and usable semantic for start_isolate_page_range. We do not
want callers to scratch their heads over which caches they might need to
flush (a rough sketch is appended at the end of this mail).

> ---
>  mm/memory_hotplug.c | 15 +--------------
>  mm/page_alloc.c     |  2 ++
>  mm/page_isolation.c | 40 ++++++++++++----------------------------
>  3 files changed, 15 insertions(+), 42 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index b11a269e2356..5a2ed1a94555 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1536,6 +1536,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  	}
>
>  	do {
> +		drain_all_pages(zone);
>  		pfn = start_pfn;
>  		do {
>  			if (signal_pending(current)) {
> @@ -1575,20 +1576,6 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  		/* check again */
>  		ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
>  					    NULL, check_pages_isolated_cb);
> -		/*
> -		 * per-cpu pages are drained in start_isolate_page_range, but if
> -		 * there are still pages that are not free, make sure that we
> -		 * drain again, because when we isolated range we might
> -		 * have raced with another thread that was adding pages to pcp
> -		 * list.
> -		 *
> -		 * Forward progress should be still guaranteed because
> -		 * pages on the pcp list can only belong to MOVABLE_ZONE
> -		 * because has_unmovable_pages explicitly checks for
> -		 * PageBuddy on freed pages on other zones.
> -		 */
> -		if (ret)
> -			drain_all_pages(zone);
>  	} while (ret);
>
>  	/* Ok, all of our target is isolated.
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index fab5e97dc9ca..6d6a501a103e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8462,6 +8462,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>  	if (ret < 0)
>  		return ret;
>
> +	drain_all_pages(cc.zone);
> +
>  	/*
>  	 * In case of -EBUSY, we'd like to know which page causes problem.
>  	 * So, just fall through. test_pages_isolated() has a tracepoint
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 63a3db10a8c0..8dfa6c6c668d 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -19,8 +19,8 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
>  {
>  	struct page *unmovable = NULL;
>  	struct zone *zone;
> -	unsigned long flags;
> -	int ret = -EBUSY;
> +	unsigned long flags, nr_pages;
> +	int ret = -EBUSY, mt;
>
>  	zone = page_zone(page);
>
> @@ -39,24 +39,18 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
>  	 * We just check MOVABLE pages.
>  	 */
>  	unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags);
> -	if (!unmovable) {
> -		unsigned long nr_pages;
> -		int mt = get_pageblock_migratetype(page);
> -
> -		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> -		zone->nr_isolate_pageblock++;
> -		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
> -						NULL);
> -
> -		__mod_zone_freepage_state(zone, -nr_pages, mt);
> -		ret = 0;
> -	}
> +	if (unmovable)
> +		goto out;
>
> +	mt = get_pageblock_migratetype(page);
> +	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> +	zone->nr_isolate_pageblock++;
> +	nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE, NULL);
> +	__mod_zone_freepage_state(zone, -nr_pages, mt);
> +	ret = 0;
>  out:
>  	spin_unlock_irqrestore(&zone->lock, flags);
> -	if (!ret) {
> -		drain_all_pages(zone);
> -	} else {
> +	if (ret) {
>  		WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
>
>  		if ((isol_flags & REPORT_FAILURE) && unmovable)
> @@ -170,14 +164,6 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>   * pageblocks we may have modified and return -EBUSY to caller. This
>   * prevents two threads from simultaneously working on overlapping ranges.
>   *
> - * Please note that there is no strong synchronization with the page allocator
> - * either. Pages might be freed while their page blocks are marked ISOLATED.
> - * In some cases pages might still end up on pcp lists and that would allow
> - * for their allocation even when they are in fact isolated already. Depending
> - * on how strong of a guarantee the caller needs drain_all_pages might be needed
> - * (e.g. __offline_pages will need to call it after check for isolated range for
> - * a next retry).
> - *
>   * Return: the number of isolated pageblocks on success and -EBUSY if any part
>   * of range cannot be isolated.
>   */
> @@ -192,9 +178,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  	BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
>  	BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
>
> -	for (pfn = start_pfn;
> -	     pfn < end_pfn;
> -	     pfn += pageblock_nr_pages) {
> +	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
>  		page = __first_valid_page(pfn, pageblock_nr_pages);
>  		if (page) {
>  			if (set_migratetype_isolate(page, migratetype, flags)) {
> --
> 2.25.1
>

--
Michal Hocko
SUSE Labs
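
A minimal, untested sketch of that direction, assuming set_migratetype_isolate()
no longer drains on its own (as in the quoted patch), that the isolated range
stays within a single zone (the multizone caveat above), and reusing only
helpers that mm/page_isolation.c already has (__first_valid_page,
unset_migratetype_isolate, pfn_to_online_page, drain_all_pages):

int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
			     unsigned migratetype, int flags)
{
	unsigned long pfn;
	unsigned long undo_pfn;
	struct page *page;
	struct zone *zone = NULL;
	int nr_isolate_pageblock = 0;

	BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
	BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));

	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
		page = __first_valid_page(pfn, pageblock_nr_pages);
		if (!page)
			continue;
		/* Mark the pageblock MIGRATE_ISOLATE; no per-block drain. */
		if (set_migratetype_isolate(page, migratetype, flags)) {
			undo_pfn = pfn;
			goto undo;
		}
		nr_isolate_pageblock++;
		zone = page_zone(page);		/* single zone assumed */
	}

	/*
	 * Drain the pcp lists once for the whole range so that pages freed
	 * before (or racing with) the isolation above are handed back to the
	 * buddy allocator. Callers never have to think about pcp caches.
	 */
	if (zone)
		drain_all_pages(zone);

	return nr_isolate_pageblock;

undo:
	for (pfn = start_pfn; pfn < undo_pfn; pfn += pageblock_nr_pages) {
		struct page *p = pfn_to_online_page(pfn);

		if (p)
			unset_migratetype_isolate(p, migratetype);
	}
	return -EBUSY;
}

Whether a single drain here is enough, or __offline_pages still needs a drain
on retry because of the race described in the comment the patch removes, is
exactly the kind of semantic that should be settled in one place rather than
in every caller.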