Re: [PATCH] mm: page_alloc: avoid excessive IRQ disabled times in free_unref_page_list

On Thu, Dec 07, 2017 at 03:20:59PM -0800, Andrew Morton wrote:
> On Thu, 7 Dec 2017 19:51:03 +0000 Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
> 
> > On Thu, Dec 07, 2017 at 06:03:14PM +0100, Lucas Stach wrote:
> > > Since 9cca35d42eb6 (mm, page_alloc: enable/disable IRQs once when freeing
> > > a list of pages) we see excessive IRQ disabled times of up to 250ms on an
> > > embedded ARM system (tracing overhead included).
> > > 
> > > This is due to graphics buffers being freed back to the system via
> > > release_pages(). Graphics buffers can be huge, so it's not hard to hit
> > > cases where the list of pages to free has 2048 entries. Disabling IRQs
> > > while freeing all those pages is clearly not a good idea.
> > > 
> > 
> > 250ms to free 2048 entries? That seems excessive but I guess the
> > embedded ARM system is not that fast.
> 
> I wonder how common such lengthy lists are.
> 

Well, it's release_pages. From the core VM and the block layer, the lists
are not very long, but from drivers and filesystems they can be arbitrarily
long. Even from the VM, the function can be called a lot, but as it's
called from pagevec context, the work is naturally broken into small
pieces anyway.

> If "significantly" then there may be additional benefit in rearranging
> free_hot_cold_page_list() so it only walks a small number of list
> entries at a time.  So the data from the first loop is still in cache
> during execution of the second loop.  And that way this
> long-irq-off-time problem gets fixed automagically.
> 

I'm not sure it's worthwhile. In too many cases, the list of pages being
released is either cache cold or so long that the cached data is being
thrashed anyway. Once the core page allocator is involved, there will be
further cache thrashing due to buddy page merging accessing data that is
potentially very close. I think it's unlikely there would be much value
in alternative schemes unless we were willing to have very large per-cpu
lists -- something I prototyped for fast networking but never heard back
on whether it was worthwhile or not.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .


