- zvc-zone_reclaim-leave-1%-of-unmapped-pagecache-pages-for-file-i-o.patch removed from -mm tree

The patch titled

     ZVC/zone_reclaim: Leave 1% of unmapped pagecache pages for file I/O

has been removed from the -mm tree.  Its filename is

     zvc-zone_reclaim-leave-1%-of-unmapped-pagecache-pages-for-file-i-o.patch

This patch was dropped because it was merged into mainline or a subsystem tree

------------------------------------------------------
Subject: ZVC/zone_reclaim: Leave 1% of unmapped pagecache pages for file I/O
From: Christoph Lameter <clameter@xxxxxxx>

It turns out to be advantageous to leave a small portion of unmapped
file-backed pages in a zone when all (or almost all) of the zone's pages are
allocated and the page allocator would otherwise have to go off-node.

This allows recently used file I/O buffers to stay on the node and
reduces the number of times zone reclaim is invoked if file I/O occurs
when we run out of memory in a zone.

The problem is that zone reclaim runs too frequently when the page cache is
used for file I/O alone (read and write, and therefore unmapped pages!) and
almost all pages of the zone are allocated.  Zone reclaim may remove 32
unmapped pages.  File I/O will use these pages for the next read/write
requests and the number of unmapped pages increases again.  Once the zone has
filled up, zone reclaim runs again after only 32 pages.  This cycle is too
inefficient and there are potentially too many zone reclaim cycles.

With the 1% boundary we may still remove all unmapped pages for file I/O in
a zone reclaim pass.  However, it will take a large number of reads and writes
to get back to 1% again, where we trigger zone reclaim again.
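
The boundary arithmetic can be sketched in plain user-space C.  This is only
an illustration of the check described here, not the kernel code: the helper
names min_unmapped_pages() and may_zone_reclaim() and the OLD_BOUNDARY_PAGES
constant are hypothetical, and the underflow guard is an addition of the
sketch.

```c
#include <assert.h>

/* Hypothetical name for the old fixed boundary of 32 pages. */
#define OLD_BOUNDARY_PAGES 32UL

/*
 * Number of unmapped file-backed pages to leave in a zone:
 * ratio_percent of the zone's present pages, rounded down.
 */
unsigned long min_unmapped_pages(unsigned long present_pages,
				 unsigned long ratio_percent)
{
	assert(ratio_percent <= 100);
	return (present_pages * ratio_percent) / 100;
}

/*
 * Return 1 if zone reclaim may run, 0 if it should be skipped because
 * the unmapped file-backed pages do not exceed the boundary.
 */
int may_zone_reclaim(unsigned long file_pages, unsigned long file_mapped,
		     unsigned long min_unmapped)
{
	unsigned long unmapped;

	/* Guard against counters that are temporarily off (sketch only). */
	unmapped = file_pages > file_mapped ? file_pages - file_mapped : 0;

	return unmapped > min_unmapped;
}
```

For an assumed zone of 262144 pages (1 GiB of 4 KiB pages), a ratio of 1
gives a boundary of 2621 pages, roughly 81 times the old 32-page boundary,
so far more file I/O has to happen before reclaim becomes eligible again.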

Zone reclaim in 2.6.16/17 does not show this behavior because it has a 30
second timeout.

[akpm@xxxxxxxx: rename the /proc file and the variable]
Signed-off-by: Christoph Lameter <clameter@xxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
---

 Documentation/sysctl/vm.txt |   14 ++++++++++++++
 include/linux/mmzone.h      |    6 ++++++
 include/linux/swap.h        |    1 +
 include/linux/sysctl.h      |    2 +-
 kernel/sysctl.c             |   11 +++++++++++
 mm/page_alloc.c             |   22 ++++++++++++++++++++++
 mm/vmscan.c                 |   27 ++++++++++++++-------------
 7 files changed, 69 insertions(+), 14 deletions(-)

diff -puN Documentation/sysctl/vm.txt~zvc-zone_reclaim-leave-1%-of-unmapped-pagecache-pages-for-file-i-o Documentation/sysctl/vm.txt
--- a/Documentation/sysctl/vm.txt~zvc-zone_reclaim-leave-1%-of-unmapped-pagecache-pages-for-file-i-o
+++ a/Documentation/sysctl/vm.txt
@@ -28,6 +28,7 @@ Currently, these files are in /proc/sys/
 - block_dump
 - drop-caches
 - zone_reclaim_mode
+- min_unmapped_ratio
 - panic_on_oom
 
 ==============================================================
@@ -168,6 +169,19 @@ in all nodes of the system.
 
 =============================================================
 
+min_unmapped_ratio:
+
+This is available only on NUMA kernels.
+
+A percentage of the file backed pages in each zone.  Zone reclaim will only
+occur if more than this percentage of pages are file backed and unmapped.
+This is to ensure that a minimal amount of local pages is still available for
+file I/O even if the node is overallocated.
+
+The default is 1 percent.
+
+=============================================================
+
 panic_on_oom
 
 This enables or disables panic on out-of-memory feature.  If this is set to 1,
diff -puN include/linux/mmzone.h~zvc-zone_reclaim-leave-1%-of-unmapped-pagecache-pages-for-file-i-o include/linux/mmzone.h
--- a/include/linux/mmzone.h~zvc-zone_reclaim-leave-1%-of-unmapped-pagecache-pages-for-file-i-o
+++ a/include/linux/mmzone.h
@@ -150,6 +150,10 @@ struct zone {
 	unsigned long		lowmem_reserve[MAX_NR_ZONES];
 
 #ifdef CONFIG_NUMA
+	/*
+	 * zone reclaim becomes active if more unmapped pages exist.
+	 */
+	unsigned long		min_unmapped_ratio;
 	struct per_cpu_pageset	*pageset[NR_CPUS];
 #else
 	struct per_cpu_pageset	pageset[NR_CPUS];
@@ -414,6 +418,8 @@ int lowmem_reserve_ratio_sysctl_handler(
 					void __user *, size_t *, loff_t *);
 int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *, int, struct file *,
 					void __user *, size_t *, loff_t *);
+int sysctl_min_unmapped_ratio_sysctl_handler(struct ctl_table *, int,
+			struct file *, void __user *, size_t *, loff_t *);
 
 #include <linux/topology.h>
 /* Returns the number of the current Node. */
diff -puN include/linux/swap.h~zvc-zone_reclaim-leave-1%-of-unmapped-pagecache-pages-for-file-i-o include/linux/swap.h
--- a/include/linux/swap.h~zvc-zone_reclaim-leave-1%-of-unmapped-pagecache-pages-for-file-i-o
+++ a/include/linux/swap.h
@@ -189,6 +189,7 @@ extern long vm_total_pages;
 
 #ifdef CONFIG_NUMA
 extern int zone_reclaim_mode;
+extern int sysctl_min_unmapped_ratio;
 extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
 #else
 #define zone_reclaim_mode 0
diff -puN include/linux/sysctl.h~zvc-zone_reclaim-leave-1%-of-unmapped-pagecache-pages-for-file-i-o include/linux/sysctl.h
--- a/include/linux/sysctl.h~zvc-zone_reclaim-leave-1%-of-unmapped-pagecache-pages-for-file-i-o
+++ a/include/linux/sysctl.h
@@ -188,7 +188,7 @@ enum
 	VM_DROP_PAGECACHE=29,	/* int: nuke lots of pagecache */
 	VM_PERCPU_PAGELIST_FRACTION=30,/* int: fraction of pages in each percpu_pagelist */
 	VM_ZONE_RECLAIM_MODE=31, /* reclaim local zone memory before going off node */
-	VM_ZONE_RECLAIM_INTERVAL=32, /* time period to wait after reclaim failure */
+	VM_MIN_UNMAPPED=32,	/* Set min percent of unmapped pages */
 	VM_PANIC_ON_OOM=33,	/* panic at out-of-memory */
 	VM_VDSO_ENABLED=34,	/* map VDSO into new processes? */
 };
diff -puN kernel/sysctl.c~zvc-zone_reclaim-leave-1%-of-unmapped-pagecache-pages-for-file-i-o kernel/sysctl.c
--- a/kernel/sysctl.c~zvc-zone_reclaim-leave-1%-of-unmapped-pagecache-pages-for-file-i-o
+++ a/kernel/sysctl.c
@@ -932,6 +932,17 @@ static ctl_table vm_table[] = {
 		.strategy	= &sysctl_intvec,
 		.extra1		= &zero,
 	},
+	{
+		.ctl_name	= VM_MIN_UNMAPPED,
+		.procname	= "min_unmapped_ratio",
+		.data		= &sysctl_min_unmapped_ratio,
+		.maxlen		= sizeof(sysctl_min_unmapped_ratio),
+		.mode		= 0644,
+		.proc_handler	= &sysctl_min_unmapped_ratio_sysctl_handler,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero,
+		.extra2		= &one_hundred,
+	},
 #endif
 #ifdef CONFIG_X86_32
 	{
diff -puN mm/page_alloc.c~zvc-zone_reclaim-leave-1%-of-unmapped-pagecache-pages-for-file-i-o mm/page_alloc.c
--- a/mm/page_alloc.c~zvc-zone_reclaim-leave-1%-of-unmapped-pagecache-pages-for-file-i-o
+++ a/mm/page_alloc.c
@@ -2005,6 +2005,10 @@ static void __meminit free_area_init_cor
 
 		zone->spanned_pages = size;
 		zone->present_pages = realsize;
+#ifdef CONFIG_NUMA
+		zone->min_unmapped_ratio = (realsize*sysctl_min_unmapped_ratio)
+						/ 100;
+#endif
 		zone->name = zone_names[j];
 		spin_lock_init(&zone->lock);
 		spin_lock_init(&zone->lru_lock);
@@ -2298,6 +2302,24 @@ int min_free_kbytes_sysctl_handler(ctl_t
 	return 0;
 }
 
+#ifdef CONFIG_NUMA
+int sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
+	struct file *file, void __user *buffer, size_t *length, loff_t *ppos)
+{
+	struct zone *zone;
+	int rc;
+
+	rc = proc_dointvec_minmax(table, write, file, buffer, length, ppos);
+	if (rc)
+		return rc;
+
+	for_each_zone(zone)
+		zone->min_unmapped_ratio = (zone->present_pages *
+				sysctl_min_unmapped_ratio) / 100;
+	return 0;
+}
+#endif
+
 /*
  * lowmem_reserve_ratio_sysctl_handler - just a wrapper around
  *	proc_dointvec() so that we can call setup_per_zone_lowmem_reserve()
diff -puN mm/vmscan.c~zvc-zone_reclaim-leave-1%-of-unmapped-pagecache-pages-for-file-i-o mm/vmscan.c
--- a/mm/vmscan.c~zvc-zone_reclaim-leave-1%-of-unmapped-pagecache-pages-for-file-i-o
+++ a/mm/vmscan.c
@@ -1503,10 +1503,6 @@ module_init(kswapd_init)
  *
  * If non-zero call zone_reclaim when the number of free pages falls below
  * the watermarks.
- *
- * In the future we may add flags to the mode. However, the page allocator
- * should only have to check that zone_reclaim_mode != 0 before calling
- * zone_reclaim().
  */
 int zone_reclaim_mode __read_mostly;
 
@@ -1524,6 +1520,12 @@ int zone_reclaim_mode __read_mostly;
 #define ZONE_RECLAIM_PRIORITY 4
 
 /*
+ * Percentage of pages in a zone that must be unmapped for zone_reclaim to
+ * occur.
+ */
+int sysctl_min_unmapped_ratio = 1;
+
+/*
  * Try to free up some pages from this zone through reclaim.
  */
 static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
@@ -1590,18 +1592,17 @@ int zone_reclaim(struct zone *zone, gfp_
 	int node_id;
 
 	/*
-	 * Do not reclaim if there are not enough reclaimable pages in this
-	 * zone that would satify this allocations.
+	 * Zone reclaim reclaims unmapped file backed pages.
 	 *
-	 * All unmapped pagecache pages are reclaimable.
-	 *
-	 * Both counters may be temporarily off a bit so we use
-	 * SWAP_CLUSTER_MAX as the boundary. It may also be good to
-	 * leave a few frequently used unmapped pagecache pages around.
+	 * A small portion of unmapped file backed pages is needed for
+	 * file I/O otherwise pages read by file I/O will be immediately
+	 * thrown out if the zone is overallocated. So we do not reclaim
+	 * if less than a specified percentage of the zone is used by
+	 * unmapped file backed pages.
 	 */
 	if (zone_page_state(zone, NR_FILE_PAGES) -
-		zone_page_state(zone, NR_FILE_MAPPED) < SWAP_CLUSTER_MAX)
-			return 0;
+	    zone_page_state(zone, NR_FILE_MAPPED) <= zone->min_unmapped_ratio)
+		return 0;
 
 	/*
 	 * Avoid concurrent zone reclaims, do not reclaim in a zone that does
_
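
The sysctl entry in the patch above bounds min_unmapped_ratio with
extra1 = &zero and extra2 = &one_hundred.  A minimal user-space sketch of
validating a value against that 0..100 range before writing it to the /proc
file might look like the following.  The function name
parse_min_unmapped_ratio() is hypothetical, and returning -EINVAL is a choice
of this sketch rather than a statement about the kernel's handler.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a decimal percentage and accept it only if it lies in [0, 100].
 * Returns 0 and stores the value in *ratio on success, -EINVAL otherwise.
 */
int parse_min_unmapped_ratio(const char *text, int *ratio)
{
	char *end;
	long val;

	assert(text != NULL && ratio != NULL);

	errno = 0;
	val = strtol(text, &end, 10);
	if (errno != 0 || end == text)
		return -EINVAL;		/* overflow or no digits at all */

	/* Allow one trailing newline, as produced by echo. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */

	if (val < 0 || val > 100)
		return -EINVAL;		/* outside the percentage range */

	*ratio = (int)val;
	return 0;
}
```

A caller would check the return value first and only then write the number
to the min_unmapped_ratio file.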

Patches currently in -mm which might be from clameter@xxxxxxx are

origin.patch
