On Fri, 8 Dec 2017 00:25:37 +0000 Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:

> Well, it's release_pages. From core VM and the block layer, not very long,
> but for drivers and filesystems it can be arbitrarily long. Even from the
> VM, the function can be called a lot, but as it's called from pagevec
> context it's naturally broken into small pieces anyway.

OK.

> > If "significantly" then there may be additional benefit in rearranging
> > free_hot_cold_page_list() so it only walks a small number of list
> > entries at a time. So the data from the first loop is still in cache
> > during execution of the second loop. And that way this
> > long-irq-off-time problem gets fixed automagically.
>
> I'm not sure it's worthwhile. In too many cases, the list of pages being
> released is either cache cold or so long that the cache data is
> being thrashed anyway.

Well, whether the incoming list is cache-cold or very long, doing that
double pass in small bites would reduce thrashing.

> Once the core page allocator is involved, then
> there will be further cache thrashing due to buddy page merging accessing
> data that is potentially very close. I think it's unlikely there would be
> much value in using alternative schemes unless we were willing to have
> very large per-cpu lists -- something I prototyped for fast networking
> but never heard back whether it's worthwhile or not.

I mean something like this (strangely indented for clarity):

--- a/mm/page_alloc.c~a
+++ a/mm/page_alloc.c
@@ -2685,12 +2685,17 @@ void free_unref_page_list(struct list_he
 	struct page *page, *next;
 	unsigned long flags, pfn;

+while (!list_empty(list)) {
+	unsigned batch = 0;
+
 	/* Prepare pages for freeing */
 	list_for_each_entry_safe(page, next, list, lru) {
 		pfn = page_to_pfn(page);
 		if (!free_unref_page_prepare(page, pfn))
 			list_del(&page->lru);
 		set_page_private(page, pfn);
+		if (batch++ == SWAP_CLUSTER_MAX)
+			break;
 	}

 	local_irq_save(flags);
@@ -2699,8 +2704,10 @@ void free_unref_page_list(struct list_he
 		set_page_private(page, 0);
 		trace_mm_page_free_batched(page);
+		list_del(&page->lru);	/* now needed, I think? */
 		free_unref_page_commit(page, pfn);
 	}
+}
 	local_irq_restore(flags);
 }

But I agree that freeing of a lengthy list is likely to be rare.
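
Fleshing that sketch out a bit: the commit loop also needs to stop at the
batch boundary, otherwise it walks into pages the prepare loop hasn't
visited yet, and the local_irq_restore() wants to move inside the while
loop so interrupts actually get re-enabled between batches. An untested
sketch of the whole function with those two changes, keeping the existing
free_unref_page_prepare()/free_unref_page_commit() interfaces and counting
only the pages that survive the prepare step:

void free_unref_page_list(struct list_head *list)
{
	struct page *page, *next;
	unsigned long flags, pfn;

	while (!list_empty(list)) {
		unsigned int batch = 0;

		/* Prepare one SWAP_CLUSTER_MAX-sized batch for freeing */
		list_for_each_entry_safe(page, next, list, lru) {
			pfn = page_to_pfn(page);
			if (!free_unref_page_prepare(page, pfn)) {
				list_del(&page->lru);
				continue;	/* don't count dropped pages */
			}
			set_page_private(page, pfn);
			if (++batch == SWAP_CLUSTER_MAX)
				break;
		}

		/* Everything was dropped; the list may now be empty */
		if (!batch)
			continue;

		/*
		 * Commit exactly the pages prepared above, with interrupts
		 * disabled only for the duration of one batch.
		 */
		local_irq_save(flags);
		list_for_each_entry_safe(page, next, list, lru) {
			pfn = page_private(page);
			set_page_private(page, 0);
			trace_mm_page_free_batched(page);
			list_del(&page->lru);
			free_unref_page_commit(page, pfn);
			if (!--batch)
				break;
		}
		local_irq_restore(flags);
	}
}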