On Fri, 8 Dec 2017 00:25:37 +0000 Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:

> Well, it's release_pages. From core VM and the block layer, not very long,
> but for drivers and filesystems it can be arbitrarily long. Even from the
> VM, the function can be called a lot, but as it's called from pagevec
> context it's naturally broken into small pieces anyway.

OK.

> > If "significantly" then there may be additional benefit in rearranging
> > free_hot_cold_page_list() so it only walks a small number of list
> > entries at a time. So the data from the first loop is still in cache
> > during execution of the second loop. And that way this
> > long-irq-off-time problem gets fixed automagically.
>
> I'm not sure it's worthwhile. In too many cases, the list of pages being
> released is either cache cold or so long that the cache data is
> being thrashed anyway.

Well, whether the incoming list is cache-cold or very long, doing that
double pass in small bites would reduce thrashing.

> Once the core page allocator is involved, then
> there will be further cache thrashing due to buddy page merging accessing
> data that is potentially very close. I think it's unlikely there would be
> much value in using alternative schemes unless we were willing to have
> very large per-cpu lists -- something I prototyped for fast networking
> but never heard back whether it's worthwhile or not.

I mean something like this (strangely indented for clarity):

--- a/mm/page_alloc.c~a
+++ a/mm/page_alloc.c
@@ -2685,12 +2685,17 @@ void free_unref_page_list(struct list_he
 	struct page *page, *next;
 	unsigned long flags, pfn;

+while (!list_empty(list)) {
+	unsigned batch = 0;
+
 	/* Prepare pages for freeing */
 	list_for_each_entry_safe(page, next, list, lru) {
 		pfn = page_to_pfn(page);
 		if (!free_unref_page_prepare(page, pfn))
 			list_del(&page->lru);
 		set_page_private(page, pfn);
+		if (batch++ == SWAP_CLUSTER_MAX)
+			break;
 	}

 	local_irq_save(flags);
@@ -2699,8 +2704,10 @@ void free_unref_page_list(struct list_he
 		set_page_private(page, 0);
 		trace_mm_page_free_batched(page);
+		list_del(&page->lru);	/* now needed, I think? */
 		free_unref_page_commit(page, pfn);
 	}
+}
 	local_irq_restore(flags);
 }

But I agree that freeing of a lengthy list is likely to be rare.
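
Fleshing that sketch out a bit: the commit loop also needs to stop at the
batch boundary, otherwise it walks into pages the prepare loop hasn't
visited yet, and the local_irq_restore() wants to move inside the while
loop so interrupts actually get re-enabled between batches. An untested
sketch of the whole function with those two changes, keeping the existing
free_unref_page_prepare()/free_unref_page_commit() interfaces and counting
only the pages that survive the prepare step:

void free_unref_page_list(struct list_head *list)
{
	struct page *page, *next;
	unsigned long flags, pfn;

	while (!list_empty(list)) {
		unsigned int batch = 0;

		/* Prepare one SWAP_CLUSTER_MAX-sized batch for freeing */
		list_for_each_entry_safe(page, next, list, lru) {
			pfn = page_to_pfn(page);
			if (!free_unref_page_prepare(page, pfn)) {
				list_del(&page->lru);
				continue;	/* don't count dropped pages */
			}
			set_page_private(page, pfn);
			if (++batch == SWAP_CLUSTER_MAX)
				break;
		}

		/* Everything was dropped; the list may now be empty */
		if (!batch)
			continue;

		/*
		 * Commit exactly the pages prepared above, with interrupts
		 * disabled only for the duration of one batch.
		 */
		local_irq_save(flags);
		list_for_each_entry_safe(page, next, list, lru) {
			pfn = page_private(page);
			set_page_private(page, 0);
			trace_mm_page_free_batched(page);
			list_del(&page->lru);
			free_unref_page_commit(page, pfn);
			if (!--batch)
				break;
		}
		local_irq_restore(flags);
	}
}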