On Fri 18-04-14 13:06:37, Andrew Morton wrote:
> Subject: + mm-disable-zone_reclaim_mode-by-default.patch added to -mm tree
> To: mgorman@xxxxxxx,hannes@xxxxxxxxxxx,mhocko@xxxxxxx,zhangyanfei@xxxxxxxxxxxxxx
> From: akpm@xxxxxxxxxxxxxxxxxxxx
> Date: Fri, 18 Apr 2014 13:06:37 -0700
>
>
> The patch titled
>      Subject: mm: disable zone_reclaim_mode by default
> has been added to the -mm tree.  Its filename is
>      mm-disable-zone_reclaim_mode-by-default.patch
>
> This patch should soon appear at
>     http://ozlabs.org/~akpm/mmots/broken-out/mm-disable-zone_reclaim_mode-by-default.patch
> and later at
>     http://ozlabs.org/~akpm/mmotm/broken-out/mm-disable-zone_reclaim_mode-by-default.patch
>
> Before you just go and hit "reply", please:
>    a) Consider who else should be cc'ed
>    b) Prefer to cc a suitable mailing list as well
>    c) Ideally: find the original patch on the mailing list and do a
>       reply-to-all to that, adding suitable additional cc's
>
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
>
> The -mm tree is included into linux-next and is updated
> there every 3-4 working days
>
> ------------------------------------------------------
> From: Mel Gorman <mgorman@xxxxxxx>
> Subject: mm: disable zone_reclaim_mode by default
>
> When it was introduced, zone_reclaim_mode made sense as NUMA distances
> punished remote accesses and workloads were generally partitioned to fit
> into a NUMA node.  NUMA machines are now common but few of the workloads
> are NUMA-aware and it's routine to see major performance degradation due
> to zone_reclaim_mode being enabled, yet relatively few users can identify
> the problem.
>
> Those that require zone_reclaim_mode are likely to be able to detect when
> it needs to be enabled and tune appropriately, so let's have a sensible
> default for the bulk of users.
>
> This patch (of 2):
>
> zone_reclaim_mode causes processes to prefer reclaiming memory from the
> local node instead of spilling over to other nodes.  This made sense
> initially when NUMA machines were almost exclusively HPC and the workload
> was partitioned into nodes.  The NUMA penalties were sufficiently high to
> justify reclaiming the memory.  On current machines and workloads it is
> often the case that zone_reclaim_mode destroys performance but not all
> users know how to detect this.  Favour the common case and disable it by
> default.  Users that are sophisticated enough to know they need
> zone_reclaim_mode will detect it.
>
> Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
> Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> Reviewed-by: Zhang Yanfei <zhangyanfei@xxxxxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>

FWIW
Acked-by: Michal Hocko <mhocko@xxxxxxx>
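A small aside for anyone who only knows this tunable from the sysctl
documentation: the patch changes the boot-time default, not the semantics
of the bits, which remain the ORed values 1/2/4 described in vm.txt. The
sketch below is my own illustrative userspace code for decoding the
current setting; the flag names mirror the kernel's RECLAIM_* constants in
mm/vmscan.c, but none of this is kernel code:

/* Decode /proc/sys/vm/zone_reclaim_mode.  Illustrative userspace
 * sketch only; flag names mirror the RECLAIM_* constants used by
 * mm/vmscan.c and the 1/2/4 values documented in vm.txt. */
#include <stdio.h>

#define RECLAIM_ZONE	(1 << 0)	/* 1: prefer local reclaim over off-node allocation */
#define RECLAIM_WRITE	(1 << 1)	/* 2: zone reclaim may write out dirty pages */
#define RECLAIM_SWAP	(1 << 2)	/* 4: zone reclaim may swap pages */

int main(void)
{
	FILE *f = fopen("/proc/sys/vm/zone_reclaim_mode", "r");
	int mode;

	if (!f) {
		perror("open zone_reclaim_mode");
		return 1;
	}
	if (fscanf(f, "%d", &mode) != 1) {
		fclose(f);
		fprintf(stderr, "unexpected sysctl format\n");
		return 1;
	}
	fclose(f);

	printf("zone_reclaim_mode = %d%s\n", mode,
	       mode == 0 ? " (disabled, the new default)" : "");
	if (mode & RECLAIM_ZONE)
		puts("  local reclaim preferred over off-node allocation");
	if (mode & RECLAIM_WRITE)
		puts("  zone reclaim writes dirty pages out");
	if (mode & RECLAIM_SWAP)
		puts("  zone reclaim swaps pages");
	return 0;
}

On a kernel with this patch applied it should report 0 unless the
administrator has explicitly opted in.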
> ---
>
>  Documentation/sysctl/vm.txt         |   17 +++++++++--------
>  arch/ia64/include/asm/topology.h    |    3 ++-
>  arch/powerpc/include/asm/topology.h |    8 ++------
>  include/linux/topology.h            |    3 ++-
>  mm/page_alloc.c                     |    2 --
>  5 files changed, 15 insertions(+), 18 deletions(-)
>
> diff -puN Documentation/sysctl/vm.txt~mm-disable-zone_reclaim_mode-by-default Documentation/sysctl/vm.txt
> --- a/Documentation/sysctl/vm.txt~mm-disable-zone_reclaim_mode-by-default
> +++ a/Documentation/sysctl/vm.txt
> @@ -772,16 +772,17 @@ This is value ORed together of
>  2	= Zone reclaim writes dirty pages out
>  4	= Zone reclaim swaps pages
>
> -zone_reclaim_mode is set during bootup to 1 if it is determined that pages
> -from remote zones will cause a measurable performance reduction.  The
> -page allocator will then reclaim easily reusable pages (those page
> -cache pages that are currently not used) before allocating off node pages.
> -
> -It may be beneficial to switch off zone reclaim if the system is
> -used for a file server and all of memory should be used for caching files
> -from disk.  In that case the caching effect is more important than
> +zone_reclaim_mode is disabled by default.  For file servers or workloads
> +that benefit from having their data cached, zone_reclaim_mode should be
> +left disabled as the caching effect is likely to be more important than
>  data locality.
>
> +zone_reclaim may be enabled if it's known that the workload is partitioned
> +such that each partition fits within a NUMA node and that accessing remote
> +memory would cause a measurable performance reduction.  The page allocator
> +will then reclaim easily reusable pages (those page cache pages that are
> +currently not used) before allocating off node pages.
> +
>  Allowing zone reclaim to write out pages stops processes that are
>  writing large amounts of data from dirtying pages on other nodes. Zone
>  reclaim will write out dirty pages if a zone fills up and so effectively
> diff -puN arch/ia64/include/asm/topology.h~mm-disable-zone_reclaim_mode-by-default arch/ia64/include/asm/topology.h
> --- a/arch/ia64/include/asm/topology.h~mm-disable-zone_reclaim_mode-by-default
> +++ a/arch/ia64/include/asm/topology.h
> @@ -21,7 +21,8 @@
>  #define PENALTY_FOR_NODE_WITH_CPUS 255
>
>  /*
> - * Distance above which we begin to use zone reclaim
> + * Nodes within this distance are eligible for reclaim by zone_reclaim() when
> + * zone_reclaim_mode is enabled.
>   */
>  #define RECLAIM_DISTANCE 15
>
> diff -puN arch/powerpc/include/asm/topology.h~mm-disable-zone_reclaim_mode-by-default arch/powerpc/include/asm/topology.h
> --- a/arch/powerpc/include/asm/topology.h~mm-disable-zone_reclaim_mode-by-default
> +++ a/arch/powerpc/include/asm/topology.h
> @@ -9,12 +9,8 @@ struct device_node;
>  #ifdef CONFIG_NUMA
>
>  /*
> - * Before going off node we want the VM to try and reclaim from the local
> - * node.  It does this if the remote distance is larger than RECLAIM_DISTANCE.
> - * With the default REMOTE_DISTANCE of 20 and the default RECLAIM_DISTANCE of
> - * 20, we never reclaim and go off node straight away.
> - *
> - * To fix this we choose a smaller value of RECLAIM_DISTANCE.
> + * If zone_reclaim_mode is enabled, a RECLAIM_DISTANCE of 10 will mean that
> + * all zones on all nodes will be eligible for zone_reclaim().
>   */
>  #define RECLAIM_DISTANCE 10
>
> diff -puN include/linux/topology.h~mm-disable-zone_reclaim_mode-by-default include/linux/topology.h
> --- a/include/linux/topology.h~mm-disable-zone_reclaim_mode-by-default
> +++ a/include/linux/topology.h
> @@ -58,7 +58,8 @@ int arch_update_cpu_topology(void);
>  /*
>   * If the distance between nodes in a system is larger than RECLAIM_DISTANCE
>   * (in whatever arch specific measurement units returned by node_distance())
> - * then switch on zone reclaim on boot.
> + * and zone_reclaim_mode is enabled then the VM will only call zone_reclaim()
> + * on nodes within this distance.
>   */
>  #define RECLAIM_DISTANCE 30
>  #endif
> diff -puN mm/page_alloc.c~mm-disable-zone_reclaim_mode-by-default mm/page_alloc.c
> --- a/mm/page_alloc.c~mm-disable-zone_reclaim_mode-by-default
> +++ a/mm/page_alloc.c
> @@ -1860,8 +1860,6 @@ static void __paginginit init_zone_allow
>  	for_each_node_state(i, N_MEMORY)
>  		if (node_distance(nid, i) <= RECLAIM_DISTANCE)
>  			node_set(i, NODE_DATA(nid)->reclaim_nodes);
> -		else
> -			zone_reclaim_mode = 1;
>  }
>
>  #else /* CONFIG_NUMA */
> _
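To make the interaction between the two knobs explicit: RECLAIM_DISTANCE
still selects which nodes zone_reclaim() may target, while
zone_reclaim_mode now solely decides whether it runs at all; the boot-time
code above no longer flips the mode on as a side effect of a large
distance. Below is a self-contained toy model of that gating, loosely
modelled on init_zone_allows_reclaim()/zone_allows_reclaim() in
mm/page_alloc.c; the node_distance() stub and two-node topology are
invented for illustration and this is not the kernel's code:

/* Toy model of reclaim gating after this patch.  node_distance()
 * is a stand-in for the kernel's arch-specific implementation. */
#include <stdbool.h>
#include <stdio.h>

#define RECLAIM_DISTANCE 30	/* generic default from linux/topology.h */

static int zone_reclaim_mode;	/* now 0 at boot; admins may opt in */

/* Hypothetical 2-node box: local access costs 10, remote costs 40. */
static int node_distance(int from, int to)
{
	return from == to ? 10 : 40;
}

/* The distance check only selects eligible nodes; it no longer
 * enables zone_reclaim_mode as a side effect. */
static bool zone_allows_reclaim(int local_nid, int nid)
{
	return node_distance(local_nid, nid) <= RECLAIM_DISTANCE;
}

static bool should_try_zone_reclaim(int local_nid, int nid)
{
	return zone_reclaim_mode != 0 && zone_allows_reclaim(local_nid, nid);
}

int main(void)
{
	/* Default: never reclaim locally first, even with distance 40. */
	printf("mode=0, remote node: %d\n", should_try_zone_reclaim(0, 1));

	zone_reclaim_mode = 1;	/* explicit opt-in */
	printf("mode=1, local node:  %d\n", should_try_zone_reclaim(0, 0));
	printf("mode=1, remote node: %d\n", should_try_zone_reclaim(0, 1));
	return 0;
}

On this hypothetical topology the remote node sits beyond
RECLAIM_DISTANCE, so even with the mode enabled the allocator never runs
zone_reclaim() against it and simply falls back to it once nearer zones
are exhausted.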
> Patches currently in -mm which might be from mgorman@xxxxxxx are
>
> mm-use-paravirt-friendly-ops-for-numa-hinting-ptes.patch
> thp-close-race-between-split-and-zap-huge-pages.patch
> x86-require-x86-64-for-automatic-numa-balancing.patch
> x86-define-_page_numa-by-reusing-software-bits-on-the-pmd-and-pte-levels.patch
> x86-define-_page_numa-by-reusing-software-bits-on-the-pmd-and-pte-levels-fix-2.patch
> mm-introduce-do_shared_fault-and-drop-do_fault-fix-fix.patch
> mm-compactionc-isolate_freepages_block-small-tuneup.patch
> mm-only-force-scan-in-reclaim-when-none-of-the-lrus-are-big-enough.patch
> mm-huge_memoryc-complete-conversion-to-pr_foo.patch
> mm-disable-zone_reclaim_mode-by-default.patch
> mm-page_alloc-do-not-cache-reclaim-distances.patch
> do_shared_fault-check-that-mmap_sem-is-held.patch
> linux-next.patch

--
Michal Hocko
SUSE Labs