RE: [PATCHv2 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Sorry, correcting a small typo error below.

Please review and provide your comments.
This is the version2 of the previous patch.

> -----Original Message-----
> From: Pintu Kumar [mailto:pintu.k@xxxxxxxxxxx]
> Sent: Friday, July 17, 2015 12:00 PM
> To: akpm@xxxxxxxxxxxxxxxxxxxx; corbet@xxxxxxx; vbabka@xxxxxxx;
> gorcunov@xxxxxxxxxx; pintu.k@xxxxxxxxxxx; mhocko@xxxxxxx;
> emunson@xxxxxxxxxx; kirill.shutemov@xxxxxxxxxxxxxxx;
> standby24x7@xxxxxxxxx; hannes@xxxxxxxxxxx; vdavydov@xxxxxxxxxxxxx;
> hughd@xxxxxxxxxx; minchan@xxxxxxxxxx; tj@xxxxxxxxxx; rientjes@xxxxxxxxxx;
> xypron.glpk@xxxxxx; dzickus@xxxxxxxxxx; prarit@xxxxxxxxxx;
> ebiederm@xxxxxxxxxxxx; rostedt@xxxxxxxxxxx; uobergfe@xxxxxxxxxx;
> paulmck@xxxxxxxxxxxxxxxxxx; iamjoonsoo.kim@xxxxxxx; ddstreet@xxxxxxxx;
> sasha.levin@xxxxxxxxxx; koct9i@xxxxxxxxx; mgorman@xxxxxxx; cj@xxxxxxxxx;
> opensource.ganesh@xxxxxxxxx; vinmenon@xxxxxxxxxxxxxx; linux-
> doc@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; linux-
> pm@xxxxxxxxxxxxxxx; qiuxishi@xxxxxxxxxx; Valdis.Kletnieks@xxxxxx
> Cc: cpgs@xxxxxxxxxxx; pintu_agarwal@xxxxxxxxx; vishnu.ps@xxxxxxxxxxx;
> rohit.kr@xxxxxxxxxxx; iqbal.ams@xxxxxxxxxxx; pintu.ping@xxxxxxxxx;
> pintu.k@xxxxxxxxxxx
> Subject: [PATCHv2 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory
> feature
> 
> This patch provides 2 things:
> 1. Add new control called shrink_memory in /proc/sys/vm/.
> This control can be used to aggressively reclaim memory system-wide in one
shot
> from the user space. A value of 1 will instruct the kernel to reclaim as much
as
> totalram_pages in the system.
> Example: echo 1 > /proc/sys/vm/shrink_memory
> 
> If any other value than 1 is written to shrink_memory an error EINVAL occurs.
> 
> 2. Enable shrink_all_memory API in kernel with new CONFIG_SHRINK_MEMORY.
> Currently, shrink_all_memory function is used only during hibernation.
> With the new config we can make use of this API for non-hibernation case also
> without disturbing the hibernation case.
> 
> The detailed paper was presented in Embedded Linux Conference, Mar-2015
> http://events.linuxfoundation.org/sites/events/files/slides/
> %5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
> 
> A sample example is shown below:
> Device: ARMv7, Dual Core CPU 1.2GHz
> RAM: 512MB (Without SWAP/ZRAM)
> Linux Kernel: 3.10.17
> Scenario: Just after boot-up finished.
> 
> BEFORE:
> -------------------------------------------------------------------------
> shell> free -tm ; cat /proc/buddyinfo
>              total       used       free     shared    buffers     cached
> Mem:           460        440         20          0         35        154
> -/+ buffers/cache:        250        209
> Swap:            0          0          0
> Total:         460        440         20
> Node 0, zone   Normal   1037    705     92     19     19     17      4      9
0      0      0
> 
> shell> vmstat 1 &
> 
> AFTER:
> -------------------------------------------------------------------------
> shell> echo 1 > /proc/sys/vm/shrink_memory
> 
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
st
>  0  0      0  20768  35876 157876    0    0     0     0   64  177  0  1 99  0
0
>
--------------------------------------------------------------------------------
> |1  0      0  33104  34864 149808    0    0     0     0   82  221  0 12 88  0
0|
>
--------------------------------------------------------------------------------
>  0  0      0 188776   3000  54420    0    0     0     0  216  374  0 30 70  0
0
>  0  0      0 188400   3652  54528    0    0   740     8  188  337  2  1 95  2
0
> 
> shell> free -tm ; cat /proc/buddyinfo
>              total       used       free     shared    buffers     cached
> Mem:           460        278        182          0          4         54
> -/+ buffers/cache:        219        240
> Swap:            0          0          0
> Total:         460        278        182
> Node 0, zone   Normal   5575   3158   1500    727    240     90     33     18
10      6
> 6
> 
> RESULTS:
> -----------------------------------------------------
> Around 160MB of memory were recovered in one shot.
> Many higher-order pages were recovered in the process.
> From the vmstat output the total CPU usage is: ~12% (system), when this
> command is running, for 1 second.
> We also measured the power consumption using H/W power monitor tool.
> Below is the result:
> Before - ~180mA
> During shrink memory - ~237mA
> Duration - ~0.5 sec
> Consumption: ~57mA
> 
> FURTHER OBSERVATIONS:
> -----------------------------------------------------
> 37% reduction in killing of application with memory shrink calling on boot up.
> Around ~4000 page faults are reduced.
> Around ~43% of reduction in kswapd calls.
> Movement to slowpath reduced dractically.
> Combining shrink_memory with compaction shows good benefits over
> fragmentation.
> 
> APPLICATION LAUNCH BEHAVIOR:
> -----------------------------------------------------
> During First Launch:
> ==================================================================
> ==========
> Application	Before_shrink_memory	After_shrink_memory	Difference
> Camera		1.981			1.86			0.121
> Gallery		1.276			0.94			0.336
> contacts	1.112			0.941			0.171
> messaging	0.886			0.795			0.091
> settings	1.257			1.212			0.045
> Music		1.854			2.098			-0.244
> Gmail		1.872			1.935			-0.063
> Browser		2.569			2.677			-0.108
> ==================================================================
> ==========
> 
> During Re-launch:
> ==================================================================
> ==========
> Application	Before_shrink_memory	After_shrink_memory	Difference
> Camera		1.248			0.976			0.272
> Gallery		0.697			0.633			0.064
> contacts	0.506			0.561			-0.055
> messaging	0.533			0.489			0.044
> settings	0.833			0.805			0.028
> Music		0.832			0.769			0.063
> Gmail		0.913			0.841			0.072
> Browser		0.579			0.57			0.009
> ==================================================================
> ==========
> 
> Various other use cases where this can be used:
> ----------------------------------------------------------------------------
> 1) Just after system boot-up is finished, using the sysctl configuration from
>    bootup script.
> 2) During system suspend state, after suspend_freeze_processes()
>    [kernel/power/suspend.c]
>    Based on certain condition about fragmentation or free memory state.
> 3) From Android ION system heap driver, when order-4 allocation starts
failing.
>    By calling shrink_all_memory, in a separate worker thread, based on certain
>    condition.
> 4) It can be combined with compact_memory to achieve better results on
> memory
>    fragmentation.
> 5) It can be helpful in debugging and tuning various vm parameters.
> 6) It can be helpful to identify how much of maximum memory could be
>    reclaimable at any point of time.
>    And how much higher-order pages could be formed with this amount of
>    reclaimable memory.
>    Thus it can be helpful in accordingly tuning the reserved memory needs
>    of a system.
> 7) It can be helpful in properly tuning the SWAP size in the system.
>    In shrink_all_memory, we enable may_swap = 1, that means all unused pages
>    will be swapped out.
>    Thus, running shrink_memory on a heavy loaded system, we can check how
> much
>    swap is getting full.
>    That can be the maximum swap size with a 10% delta.
>    Also if ZRAM is used, it helps us in compressing and storing the pages for
>    later use.
> 8) It can be helpful to allow more new applications to be launched, without
>    killing the older once.
>    And moving the least recently used pages to the SWAP area.
>    Thus user data can be retained.
> 9) Can be part of a system system-tool to quickly defragment entire system
>    memory.
> 10) This may also help in reducing fragmentation within CMA region.
> 11) More use cases can be identified.
> 
> Most importantly, it can be more effective when applied intelligently, based
on
> certain conditions.
> It should be executed always and the decision is left upto the user.

* It should _not_ be executed always. The decision is left to the user.

> 
> Signed-off-by: Pintu Kumar <pintu.k@xxxxxxxxxxx>
> ---
> V2: Added min,max parameter for shrink_memory, suggested by
>     Heinrich Schuchardt <xypron.glpk@xxxxxx>.
>     Error handling in sysctl_shrinkmem_handler, for any value other than 1,
>     suggested by, Heinrich Schuchardt <xypron.glpk@xxxxxx>.
>     Fixed HIBERNATION+SHRINK_MEMORY issue in shrink_all_memory,
>     suggested by Valdis.Kletnieks@xxxxxx.
>     Restore gfp_mask to original, because of other dependencies.
>     Also adding GFP_RECLAIM_MASK, does not affect anything.
>     Verified power consumption data during shrink_memory,
>     as suggested by Johannes Weiner <hannes@xxxxxxxxxxx>.
>     Verified application launch/re-launch scenarios before/after
shrink_memory,
>     as suggested by Xishi Qiu <qiuxishi@xxxxxxxxxx>.
>     Updates the commit messages with examples and use cases.
> 
>  Documentation/sysctl/vm.txt |   18 ++++++++++++++++++
>  include/linux/swap.h        |    7 +++++++
>  kernel/sysctl.c             |   16 ++++++++++++++++
>  mm/Kconfig                  |    8 ++++++++
>  mm/vmscan.c                 |   34 ++++++++++++++++++++++++++++++++--
>  5 files changed, 81 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index
> 9832ec5..54eda3a 100644
> --- a/Documentation/sysctl/vm.txt
> +++ b/Documentation/sysctl/vm.txt
> @@ -54,6 +54,7 @@ Currently, these files are in /proc/sys/vm:
>  - page-cluster
>  - panic_on_oom
>  - percpu_pagelist_fraction
> +- shrink_memory
>  - stat_interval
>  - swappiness
>  - user_reserve_kbytes
> @@ -718,6 +719,23 @@ sysctl, it will revert to this default behavior.
> 
>  ==============================================================
> 
> +shrink_memory
> +
> +This control is available only when CONFIG_SHRINK_MEMORY is set. This
> +control can be used to aggressively reclaim memory system-wide in one
> +shot. A value of
> +1 will instruct the kernel to reclaim as much as totalram_pages in the
system.
> +For example, to reclaim all memory system-wide we can do:
> +# echo 1 > /proc/sys/vm/shrink_memory
> +
> +If any other value than 1 is written to shrink_memory an error EINVAL occurs.
> +
> +For more information about this control, please visit the following
> +presentation in embedded linux conference, 2015.
> +http://events.linuxfoundation.org/sites/events/files/slides/
> +%5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
> +
> +==============================================================
> +
>  stat_interval
> 
>  The time interval between which vm statistics are updated.  The default diff
--git
> a/include/linux/swap.h b/include/linux/swap.h index 9a7adfb..6505b0b 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -333,6 +333,13 @@ extern int vm_swappiness;  extern int
> remove_mapping(struct address_space *mapping, struct page *page);  extern
> unsigned long vm_total_pages;
> 
> +#ifdef CONFIG_SHRINK_MEMORY
> +extern int sysctl_shrink_memory;
> +extern int sysctl_shrinkmem_handler(struct ctl_table *table, int write,
> +		void __user *buffer, size_t *length, loff_t *ppos); #endif
> +
> +
>  #ifdef CONFIG_NUMA
>  extern int zone_reclaim_mode;
>  extern int sysctl_min_unmapped_ratio;
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c index c566b56..e66581b 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -275,6 +275,11 @@ static int min_extfrag_threshold;  static int
> max_extfrag_threshold = 1000;  #endif
> 
> +#ifdef CONFIG_SHRINK_MEMORY
> +static int min_shrink_memory = 1;
> +static int max_shrink_memory = 1;
> +#endif
> +
>  static struct ctl_table kern_table[] = {
>  	{
>  		.procname	= "sched_child_runs_first",
> @@ -1351,6 +1356,17 @@ static struct ctl_table vm_table[] = {
>  	},
> 
>  #endif /* CONFIG_COMPACTION */
> +#ifdef CONFIG_SHRINK_MEMORY
> +	{
> +		.procname	= "shrink_memory",
> +		.data		= &sysctl_shrink_memory,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0200,
> +		.proc_handler	= sysctl_shrinkmem_handler,
> +		.extra1         = &min_shrink_memory,
> +		.extra2         = &max_shrink_memory,
> +	},
> +#endif
>  	{
>  		.procname	= "min_free_kbytes",
>  		.data		= &min_free_kbytes,
> diff --git a/mm/Kconfig b/mm/Kconfig
> index b3a60ee..8e04bd9 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -657,3 +657,11 @@ config DEFERRED_STRUCT_PAGE_INIT
>  	  when kswapd starts. This has a potential performance impact on
>  	  processes running early in the lifetime of the systemm until kswapd
>  	  finishes the initialisation.
> +
> +config SHRINK_MEMORY
> +	bool "Allow for system-wide shrinking of memory"
> +	default n
> +	depends on MMU
> +	help
> +	  It enables support for system-wide memory reclaim in one shot using
> +	  echo 1 > /proc/sys/vm/shrink_memory.
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c8d8282..e802fa7 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -58,6 +58,10 @@
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/vmscan.h>
> 
> +#ifdef CONFIG_SHRINK_MEMORY
> +#include <linux/suspend.h>
> +#endif
> +
>  struct scan_control {
>  	/* How many pages shrink_list() should reclaim */
>  	unsigned long nr_to_reclaim;
> @@ -3557,7 +3561,7 @@ void wakeup_kswapd(struct zone *zone, int order,
> enum zone_type classzone_idx)
>  	wake_up_interruptible(&pgdat->kswapd_wait);
>  }
> 
> -#ifdef CONFIG_HIBERNATION
> +#if defined CONFIG_HIBERNATION || CONFIG_SHRINK_MEMORY
>  /*
>   * Try to free `nr_to_reclaim' of memory, system-wide, and return the number
of
>   * freed pages.
> @@ -3576,12 +3580,16 @@ unsigned long shrink_all_memory(unsigned long
> nr_to_reclaim)
>  		.may_writepage = 1,
>  		.may_unmap = 1,
>  		.may_swap = 1,
> -		.hibernation_mode = 1,
>  	};
>  	struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
>  	struct task_struct *p = current;
>  	unsigned long nr_reclaimed;
> 
> +	if (system_entering_hibernation())
> +		sc.hibernation_mode = 1;
> +	else
> +		sc.hibernation_mode = 0;
> +
>  	p->flags |= PF_MEMALLOC;
>  	lockdep_set_current_reclaim_state(sc.gfp_mask);
>  	reclaim_state.reclaimed_slab = 0;
> @@ -3597,6 +3605,28 @@ unsigned long shrink_all_memory(unsigned long
> nr_to_reclaim)  }  #endif /* CONFIG_HIBERNATION */
> 
> +#ifdef CONFIG_SHRINK_MEMORY
> +int sysctl_shrink_memory;
> +/* This is the entry point for system-wide shrink memory
> ++via /proc/sys/vm/shrink_memory */
> +int sysctl_shrinkmem_handler(struct ctl_table *table, int write,
> +		void __user *buffer, size_t *length, loff_t *ppos) {
> +	int ret;
> +
> +	ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
> +	if (ret)
> +		return ret;
> +
> +	if (write) {
> +		if (sysctl_shrink_memory & 1)
> +			shrink_all_memory(totalram_pages);
> +	}
> +
> +	return 0;
> +}
> +#endif
> +
>  /* It's optimal to keep kswapds on the same CPUs as their memory, but
>     not required for correctness.  So if the last cpu in a node goes
>     away, we get changed to run anywhere: as the first one comes back,
> --
> 1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux