This fixes an annoying race in the CMA code leading to lots of "PFNs busy" messages when CMA is used concurrently. This is harmless normally as CMA will just retry the allocation at a different place, but it might lead to increased fragmentation of the CMA area as well as failing allocations when CMA is under memory pressure. The issue is that test_pages_isolated checks if the range is free by checking that all pages in the range are buddy pages. For this to work the start pfn needs to be aligned to the higher order buddy page including the start pfn if there is any. This is not a problem for the memory hotplug code, as it always offlines whole pageblocks, but CMA may want to isolate a smaller range. So for the check to work correctly it down-aligns the start pfn to the higher order buddy page. As the zone is not yet locked at that point a concurrent page free might coalesce the pages to be checked into an even bigger buddy page, causing the check to fail, while all pages are in fact buddy pages. By moving the zone locking to the caller of the test function, it's possible to do it before CMA tries to find the proper start page and stop any concurrent page coalescing to happen until the check is finished. Signed-off-by: Lucas Stach <l.stach@xxxxxxxxxxxxxx> --- mm/memory_hotplug.c | 2 ++ mm/page_alloc.c | 11 +++++++---- mm/page_isolation.c | 13 ++++++++----- 3 files changed, 17 insertions(+), 9 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 4af58a3a8ffa..05de25bd228e 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1828,7 +1828,9 @@ repeat: */ dissolve_free_huge_pages(start_pfn, end_pfn); /* check again */ + spin_lock_irqsave(&zone->lock, flags); offlined_pages = check_pages_isolated(start_pfn, end_pfn); + spin_unlock_irqrestore(&zone->lock, flags); if (offlined_pages < 0) { ret = -EBUSY; goto failed_removal; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 838ca8bb64f7..50b3b8f9594f 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6702,14 +6702,15 @@ static int __alloc_contig_migrate_range(struct compact_control *cc, int alloc_contig_range(unsigned long start, unsigned long end, unsigned migratetype) { - unsigned long outer_start, outer_end; + struct zone *zone = page_zone(pfn_to_page(start)); + unsigned long outer_start, outer_end, flags; unsigned int order; int ret = 0; struct compact_control cc = { .nr_migratepages = 0, .order = -1, - .zone = page_zone(pfn_to_page(start)), + .zone = zone, .mode = MIGRATE_SYNC, .ignore_skip_hint = true, }; @@ -6775,6 +6776,7 @@ int alloc_contig_range(unsigned long start, unsigned long end, order = 0; outer_start = start; + spin_lock_irqsave(&zone->lock, flags); while (!PageBuddy(pfn_to_page(outer_start))) { if (++order >= MAX_ORDER) { outer_start = start; @@ -6797,10 +6799,11 @@ int alloc_contig_range(unsigned long start, unsigned long end, } /* Make sure the range is really isolated. */ - if (test_pages_isolated(outer_start, end, false)) { + ret = test_pages_isolated(outer_start, end, false); + spin_unlock_irqrestore(&zone->lock, flags); + if (ret) { pr_info("%s: [%lx, %lx) PFNs busy\n", __func__, outer_start, end); - ret = -EBUSY; goto done; } diff --git a/mm/page_isolation.c b/mm/page_isolation.c index 92c4c36501e7..357c8e39a08e 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -246,12 +246,18 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn, return pfn; } +/* + * Test all pages in the range are isolated. + * All pages in [start_pfn...end_pfn) must be in the same zone. + * zone->lock must be held before call this. + * + * Returns 0 if all pages in the range are isolated. + */ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn, bool skip_hwpoisoned_pages) { - unsigned long pfn, flags; + unsigned long pfn; struct page *page; - struct zone *zone; /* * Note: pageblock_nr_pages != MAX_ORDER. Then, chunks of free pages @@ -267,11 +273,8 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn, if ((pfn < end_pfn) || !page) return -EBUSY; /* Check all pages are free or marked as ISOLATED */ - zone = page_zone(page); - spin_lock_irqsave(&zone->lock, flags); pfn = __test_page_isolated_in_pageblock(start_pfn, end_pfn, skip_hwpoisoned_pages); - spin_unlock_irqrestore(&zone->lock, flags); trace_test_pages_isolated(start_pfn, end_pfn, pfn); -- 2.7.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>