Re: [PATCH 5/7] mm: page_alloc: Make zone distribution page aging policy configurable

On Mon, Dec 16, 2013 at 03:42:15PM -0500, Johannes Weiner wrote:
> On Fri, Dec 13, 2013 at 02:10:05PM +0000, Mel Gorman wrote:
> > Commit 81c0a2bb ("mm: page_alloc: fair zone allocator policy") solved a
> > bug whereby new pages could be reclaimed before old pages because of
> > how the page allocator and kswapd interacted on the per-zone LRU lists.
> > Unfortunately it was missed during review that a consequence is that
> > we also round-robin between NUMA nodes. This is bad for two reasons
> > 
> > 1. It alters the semantics of MPOL_LOCAL without telling anyone
> > 2. It incurs an immediate remote memory performance hit in exchange
> >    for a potential performance gain when memory needs to be reclaimed
> >    later
> > 
> > No cookies for the reviewers on this one.
> > 
> > This patch makes the behaviour of the fair zone allocator policy
> > configurable.  By default it will only distribute pages that are going
> > to exist on the LRU between zones local to the allocating process. This
> > preserves the historical semantics of MPOL_LOCAL.
> > 
> > By default, slab pages are not distributed between zones after this patch is
> > applied. It can be argued that they should get similar treatment but they
> > have different lifecycles to LRU pages, the shrinkers are not zone-aware
> > and the interaction between the page allocator and kswapd is different
> > for slabs. If it turns out to be an almost universal win, we can change
> > the default.
> > 
> > Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
> > ---
> >  Documentation/sysctl/vm.txt |  32 ++++++++++++++
> >  include/linux/mmzone.h      |   2 +
> >  include/linux/swap.h        |   2 +
> >  kernel/sysctl.c             |   8 ++++
> >  mm/page_alloc.c             | 102 ++++++++++++++++++++++++++++++++++++++------
> >  5 files changed, 134 insertions(+), 12 deletions(-)
> > 
> > diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
> > index 1fbd4eb..8eaa562 100644
> > --- a/Documentation/sysctl/vm.txt
> > +++ b/Documentation/sysctl/vm.txt
> > @@ -56,6 +56,7 @@ Currently, these files are in /proc/sys/vm:
> >  - swappiness
> >  - user_reserve_kbytes
> >  - vfs_cache_pressure
> > +- zone_distribute_mode
> >  - zone_reclaim_mode
> >  
> >  ==============================================================
> > @@ -724,6 +725,37 @@ causes the kernel to prefer to reclaim dentries and inodes.
> >  
> >  ==============================================================
> >  
> > +zone_distribute_mode
> > +
> > +Page allocation and reclaim are managed on a per-zone basis. When the
> > +system needs to reclaim memory, candidate pages are selected from these
> > +per-zone lists.  Historically, a potential consequence was that recently
> > +allocated pages were considered reclaim candidates. From a zone-local
> > +perspective, page aging was preserved but from a system-wide perspective
> > +there was an age inversion problem.
> > +
> > +A similar problem occurs on a node level where young pages may be reclaimed
> > +from the local node instead of allocating remote memory. Unfortunately, the
> > +cost of accessing remote nodes is higher so the system must choose by default
> > +between favouring page aging or node locality. zone_distribute_mode controls
> > +how the system will distribute page ages between zones.
> > +
> > +0	= Never round-robin based on age
> 
> I think we should be very conservative with the userspace interface we
> export on a mechanism we are obviously just figuring out.
> 

And we have a proposal on how to limit this. I'll be layering another
patch on top that removes this interface again. That will allow us to
roll back one patch and still have a usable interface if necessary.

> > +Otherwise the values are ORed together
> > +
> > +1	= Distribute anon pages between zones local to the allocating node
> > +2	= Distribute file pages between zones local to the allocating node
> > +4	= Distribute slab pages between zones local to the allocating node
> 
> Zone fairness within a node does not affect mempolicy or remote
> reference costs.  Is there a reason to have this configurable?
> 

Symmetry

> > +The following three flags effectively alter MPOL_DEFAULT, be careful.
> > +
> > +8	= Distribute anon pages between zones remote to the allocating node
> > +16	= Distribute file pages between zones remote to the allocating node
> > +32	= Distribute slab pages between zones remote to the allocating node
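
The flag values quoted above combine as a plain bitmask. A minimal sketch
of how a given value decodes, assuming only the numeric values from the
proposed vm.txt text; the enum, macro and function names are hypothetical
and do not come from the patch:

```c
#include <stdio.h>

/* Values are taken from the proposed vm.txt text; the names are hypothetical. */
enum {
	DIST_LOCAL_ANON  = 1,
	DIST_LOCAL_FILE  = 2,
	DIST_LOCAL_SLAB  = 4,
	DIST_REMOTE_ANON = 8,
	DIST_REMOTE_FILE = 16,
	DIST_REMOTE_SLAB = 32,
};

#define DIST_REMOTE_MASK \
	(DIST_REMOTE_ANON | DIST_REMOTE_FILE | DIST_REMOTE_SLAB)

/* Nonzero if any remote flag is set, i.e. pages may be spread across nodes. */
static int dist_mode_is_remote(unsigned int mode)
{
	return (mode & DIST_REMOTE_MASK) != 0;
}

/* Write a comma-separated description of mode into buf and return buf. */
static char *dist_mode_describe(unsigned int mode, char *buf, size_t len)
{
	static const char *const names[] = {
		"local-anon", "local-file", "local-slab",
		"remote-anon", "remote-file", "remote-slab",
	};
	size_t i, used = 0;

	if (len == 0)
		return buf;
	buf[0] = '\0';
	if (mode == 0) {
		/* 0 means never round-robin based on age. */
		snprintf(buf, len, "none");
		return buf;
	}
	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		int n;

		if (!(mode & (1u << i)))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "," : "", names[i]);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated, stop here */
		used += (size_t)n;
	}
	return buf;
}
```

With these assumed names, a value of 3 (1|2) decodes to
"local-anon,local-file" and sets no remote flag, while 19 (1|2|16)
additionally sets remote-file and so falls under the "be careful" group.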
> 
> Yes, it's conceivable that somebody might want to disable remote
> distribution because of the extra references.
> 
> But at this point, I'd much rather back out anon and slab distribution
> entirely, it was a mistake to include them.
> 
> That would leave us with a single knob to disable remote page cache
> placement.
> 

Looking at this more closely, I found that sysv is a weird exception. It's
file-backed as far as most of the VM is concerned but looks anonymous to
most applications that care. Both it and MAP_SHARED anonymous pages should
not be treated like files, but we still want tmpfs to be treated as
files. Details will be in the changelog of the next series.

-- 
Mel Gorman
SUSE Labs
