On 10.03.21 17:14, Minchan Kim wrote:
The LRU pagevec holds a refcount on pages until the pagevec is drained. That can prevent migration, since the refcount of the page is then greater than the migration logic expects. To mitigate the issue, callers of migrate_pages drain the LRU pagevec via migrate_prep or lru_add_drain_all before the migrate_pages call.

However, that is not enough: pages entering a pagevec after the draining call can still sit there and keep preventing page migration. Since some callers of migrate_pages have retry logic with LRU draining, the page would migrate on the next try, but this is still fragile because it doesn't close the fundamental race between pages newly entering the pagevec and migration, so the migration failure could cause a contiguous memory allocation failure in the end.

To close the race, this patch disables the LRU caches (i.e., pagevec) during ongoing migration until the migration is done.

Since the race is really hard to reproduce, I measured how many times migrate_pages retried in force mode (i.e., a fallback to sync migration) with the debug code below.

int migrate_pages(struct list_head *from, new_page_t get_new_page,
			..
	..

	if (rc && reason == MR_CONTIG_RANGE && pass > 2) {
		printk(KERN_ERR "pfn 0x%lx reason %d\n", page_to_pfn(page), rc);
		dump_page(page, "fail to migrate");
	}

The test repeatedly launched Android apps while a CMA allocation ran in the background every five seconds. The total CMA allocation count was about 500 during the test. With this patch, the dump_page count was reduced from 400 to 30.

The new interface is also useful for memory hotplug, which currently drains the LRU pcp caches after each migration failure. This is rather suboptimal as it has to disrupt others running during the operation. With the new interface the operation happens only once. This is also in line with the pcp allocator caches, which are disabled for offlining as well.
Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
---
 include/linux/swap.h |  3 ++
 mm/memory_hotplug.c  |  3 +-
 mm/mempolicy.c       |  4 ++-
 mm/migrate.c         |  3 +-
 mm/swap.c            | 79 ++++++++++++++++++++++++++++++++++++--------
 5 files changed, 75 insertions(+), 17 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 32f665b1ee85..a3e258335a7f 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -339,6 +339,9 @@ extern void lru_note_cost(struct lruvec *lruvec, bool file,
 extern void lru_note_cost_page(struct page *);
 extern void lru_cache_add(struct page *);
 extern void mark_page_accessed(struct page *);
+extern void lru_cache_disable(void);
+extern void lru_cache_enable(void);
+extern bool lru_cache_disabled(void);
 extern void lru_add_drain(void);
 extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_cpu_zone(struct zone *zone);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 5ba51a8bdaeb..959f659ef085 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1611,6 +1611,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 	 * in a way that pages from isolated pageblock are left on pcplists.
 	 */
 	zone_pcp_disable(zone);
+	lru_cache_disable();

Did you also experiment with what effects zone_pcp_disable() might have on alloc_contig_range()?
Feels like both calls could be abstracted somehow and used in both (memory offlining/alloc_contig_range) cases. It's essentially disabling some kind of caching.
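
To make the idea concrete, here is a rough userspace sketch, not kernel code. The helper names page_caches_disable()/page_caches_enable() and migrate_range_model() are made up for illustration, struct zone is a stand-in, and the four cache calls are replaced by counting stubs, so this only shows the pairing and ordering, not how either cache is actually drained or disabled.

```c
/* Userspace model only: the kernel calls are replaced by counting stubs. */
#include <assert.h>

struct zone { int pcp_disabled; };	/* stand-in for the kernel's struct zone */

static int lru_disable_depth;		/* models a global "LRU caches off" count */

/* Stubs with the same names as the kernel interfaces (assumed behavior). */
static void zone_pcp_disable(struct zone *zone) { zone->pcp_disabled++; }
static void zone_pcp_enable(struct zone *zone)  { zone->pcp_disabled--; }
static void lru_cache_disable(void)             { lru_disable_depth++; }
static void lru_cache_enable(void)              { lru_disable_depth--; }

/* Hypothetical helper: turn off both caches before isolating a range. */
static void page_caches_disable(struct zone *zone)
{
	zone_pcp_disable(zone);
	lru_cache_disable();
}

/* Hypothetical helper: undo page_caches_disable() in reverse order. */
static void page_caches_enable(struct zone *zone)
{
	lru_cache_enable();
	zone_pcp_enable(zone);
}

/* Stand-in for a caller such as offline_pages() or alloc_contig_range(). */
static int migrate_range_model(struct zone *zone)
{
	int caches_off;

	page_caches_disable(zone);
	/* ... isolation and migrate_pages() would run here ... */
	caches_off = zone->pcp_disabled > 0 && lru_disable_depth > 0;
	assert(caches_off);
	page_caches_enable(zone);

	return caches_off;
}
```

Both callers would then share one disable/enable pair instead of open-coding the two calls, assuming the pcp disable turns out to be acceptable for alloc_contig_range() at all.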
Looks sane to me, but I am not that experienced with migration code to give this a real RB.
--
Thanks,

David / dhildenb