>> Signed-off-by: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
>
> I believe we should be going an opposite direction and define a more
> understandable and usable semantic for start_isolate_page_range. We do
> not want callers to scratch their heads to call all caches they might
> need to flush.

I still prefer temporarily disabling PCP, or using a different mechanism
to teach PCP to not work on isolated pageblocks - while the latter seems
to be harder to achieve (see the rough sketch below). I can spot that
Vlastimil already tried to implement it - I'll have a look shortly.

Having that said, this patch improves the situation in case we cannot get
temporary disabling of PCP implemented - instead of flushing at a random
place, we flush at less random places :) But it is far from ideal. It's
just another of the special cases that developed over time, and instead of
properly fixing it, we work around it.

Note: I think the flush semantics of drain_all_pages() are very different
from those of lru_add_drain_all(). The latter does not mess with isolated
pages that have just been freed (as it's not part of the core buddy
allocator / a buddy extension).
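To illustrate what I mean by the second option: the free path would have
to bypass the pcplists for pages sitting in isolated pageblocks and hand
them straight back to the buddy. A rough sketch from memory (not against
any particular tree, so take the exact hook point with a grain of salt):

static void free_unref_page_commit(struct page *page, unsigned long pfn)
{
	struct zone *zone = page_zone(page);
	int migratetype = get_pcppage_migratetype(page);

	/*
	 * Never queue pages of isolated pageblocks on the pcplists; give
	 * them straight back to the buddy so they cannot be allocated
	 * again while the pageblock is isolated.
	 */
	if (unlikely(is_migrate_isolate(migratetype))) {
		free_one_page(zone, page, pfn, 0, migratetype);
		return;
	}

	/* ... otherwise queue the page on the per-cpu list as usual ... */
}

AFAIK the free path already has a check along these lines; the hard part
is pages that were queued on the pcplists (or whose pageblock migratetype
was looked up) just before the pageblock was marked MIGRATE_ISOLATE, and
that race is exactly what the drain_all_pages() calls here paper over.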
>
>> ---
>>  mm/memory_hotplug.c | 15 +--------------
>>  mm/page_alloc.c     |  2 ++
>>  mm/page_isolation.c | 40 ++++++++++++----------------------------
>>  3 files changed, 15 insertions(+), 42 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index b11a269e2356..5a2ed1a94555 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1536,6 +1536,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
>>  	}
>>
>>  	do {
>> +		drain_all_pages(zone);
>>  		pfn = start_pfn;
>>  		do {
>>  			if (signal_pending(current)) {
>> @@ -1575,20 +1576,6 @@ static int __ref __offline_pages(unsigned long start_pfn,
>>  		/* check again */
>>  		ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
>>  					    NULL, check_pages_isolated_cb);
>> -		/*
>> -		 * per-cpu pages are drained in start_isolate_page_range, but if
>> -		 * there are still pages that are not free, make sure that we
>> -		 * drain again, because when we isolated range we might
>> -		 * have raced with another thread that was adding pages to pcp
>> -		 * list.
>> -		 *
>> -		 * Forward progress should be still guaranteed because
>> -		 * pages on the pcp list can only belong to MOVABLE_ZONE
>> -		 * because has_unmovable_pages explicitly checks for
>> -		 * PageBuddy on freed pages on other zones.
>> -		 */
>> -		if (ret)
>> -			drain_all_pages(zone);
>>  	} while (ret);
>>
>>  	/* Ok, all of our target is isolated.
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index fab5e97dc9ca..6d6a501a103e 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -8462,6 +8462,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>>  	if (ret < 0)
>>  		return ret;
>>
>> +	drain_all_pages(cc.zone);
>> +
>>  	/*
>>  	 * In case of -EBUSY, we'd like to know which page causes problem.
>>  	 * So, just fall through. test_pages_isolated() has a tracepoint
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index 63a3db10a8c0..8dfa6c6c668d 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -19,8 +19,8 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
>>  {
>>  	struct page *unmovable = NULL;
>>  	struct zone *zone;
>> -	unsigned long flags;
>> -	int ret = -EBUSY;
>> +	unsigned long flags, nr_pages;
>> +	int ret = -EBUSY, mt;
>>
>>  	zone = page_zone(page);
>>
>> @@ -39,24 +39,18 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
>>  	 * We just check MOVABLE pages.
>>  	 */
>>  	unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags);
>> -	if (!unmovable) {
>> -		unsigned long nr_pages;
>> -		int mt = get_pageblock_migratetype(page);
>> -
>> -		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
>> -		zone->nr_isolate_pageblock++;
>> -		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
>> -						NULL);
>> -
>> -		__mod_zone_freepage_state(zone, -nr_pages, mt);
>> -		ret = 0;
>> -	}
>> +	if (unmovable)
>> +		goto out;
>>
>> +	mt = get_pageblock_migratetype(page);
>> +	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
>> +	zone->nr_isolate_pageblock++;
>> +	nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE, NULL);
>> +	__mod_zone_freepage_state(zone, -nr_pages, mt);
>> +	ret = 0;
>>  out:
>>  	spin_unlock_irqrestore(&zone->lock, flags);
>> -	if (!ret) {
>> -		drain_all_pages(zone);
>> -	} else {
>> +	if (ret) {
>>  		WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);

There are patches in -mm/-next by me that already clean up/reshuffle that
code heavily, so better to keep the changes minimal or build on top of
-next.

>>  		if ((isol_flags & REPORT_FAILURE) && unmovable)
>> @@ -170,14 +164,6 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>>   * pageblocks we may have modified and return -EBUSY to caller. This
>>   * prevents two threads from simultaneously working on overlapping ranges.
>>   *
>> - * Please note that there is no strong synchronization with the page allocator
>> - * either. Pages might be freed while their page blocks are marked ISOLATED.
>> - * In some cases pages might still end up on pcp lists and that would allow
>> - * for their allocation even when they are in fact isolated already. Depending
>> - * on how strong of a guarantee the caller needs drain_all_pages might be needed
>> - * (e.g. __offline_pages will need to call it after check for isolated range for
>> - * a next retry).
>> - *
>>   * Return: the number of isolated pageblocks on success and -EBUSY if any part
>>   * of range cannot be isolated.
>>   */
>> @@ -192,9 +178,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>>  	BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
>>  	BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
>>
>> -	for (pfn = start_pfn;
>> -	     pfn < end_pfn;
>> -	     pfn += pageblock_nr_pages) {
>> +	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {

Unnecessary code churn.

>>  		page = __first_valid_page(pfn, pageblock_nr_pages);
>>  		if (page) {
>>  			if (set_migratetype_isolate(page, migratetype, flags)) {
>> --
>> 2.25.1
>>
>

-- 
Thanks,

David / dhildenb