This patch provides 2 things:

1. Add a new control called shrink_memory in /proc/sys/vm/.
This control can be used to aggressively reclaim memory system-wide
in one shot from user space. Writing 1 instructs the kernel to try to
reclaim as much as totalram_pages in the system.
Example: echo 1 > /proc/sys/vm/shrink_memory

If any value other than 1 is written to shrink_memory, the write fails
with EINVAL.

2. Enable the shrink_all_memory API in the kernel with a new
CONFIG_SHRINK_MEMORY option.
Currently, the shrink_all_memory function is used only during hibernation.
With the new config we can also use this API in the non-hibernation case,
without disturbing the hibernation case.

The detailed paper was presented at the Embedded Linux Conference, Mar-2015:
http://events.linuxfoundation.org/sites/events/files/slides/
%5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf

A sample example is shown below:
Device: ARMv7, Dual Core CPU 1.2GHz
RAM: 512MB (Without SWAP/ZRAM)
Linux Kernel: 3.10.17
Scenario: Just after boot-up finished.

BEFORE:
-------------------------------------------------------------------------
shell> free -tm ; cat /proc/buddyinfo
             total       used       free     shared    buffers     cached
Mem:           460        440         20          0         35        154
-/+ buffers/cache:        250        209
Swap:            0          0          0
Total:         460        440         20
Node 0, zone   Normal   1037    705     92     19     19     17      4      9      0      0      0

shell> vmstat 1 &

AFTER:
-------------------------------------------------------------------------
shell> echo 1 > /proc/sys/vm/shrink_memory
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0  20768  35876 157876    0    0     0     0   64  177  0  1 99  0  0
--------------------------------------------------------------------------------
|1  0      0  33104  34864 149808    0    0     0     0   82  221  0 12 88  0  0|
--------------------------------------------------------------------------------
 0  0      0 188776   3000  54420    0    0     0     0  216  374  0 30 70  0  0
 0  0      0 188400   3652  54528    0    0   740     8  188  337  2  1 95  2  0

shell> free -tm ; cat /proc/buddyinfo
             total       used       free     shared    buffers     cached
Mem:           460        278        182          0          4         54
-/+ buffers/cache:        219        240
Swap:            0          0          0
Total:         460        278        182
Node 0, zone   Normal   5575   3158   1500    727    240     90     33     18     10      6      6

RESULTS:
-----------------------------------------------------
Around 160MB of memory was recovered in one shot.
Many higher-order pages were recovered in the process.
From the vmstat output, the CPU usage is ~12% (system) while this command
is running, for 1 second.

We also measured the power consumption using a H/W power monitor tool.
Below is the result:
Before               - ~180mA
During shrink memory - ~237mA
Duration             - ~0.5 sec
Extra consumption    - ~57mA (237mA - 180mA)

FURTHER OBSERVATIONS:
-----------------------------------------------------
37% reduction in application kills when shrink_memory is called at boot-up.
Page faults are reduced by around ~4000.
Around ~43% reduction in kswapd calls.
Movement to the slowpath is reduced drastically.
Combining shrink_memory with compaction shows good benefits against
fragmentation.

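For anyone who wants to put a number on the buddyinfo change above, here is a
minimal userspace sketch (not part of the patch). It sums the free pages on one
/proc/buddyinfo line, where column i is the number of free blocks of order i
and each block spans (1 << i) pages. The name buddy_free_pages and the
min_order parameter are made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the free pages described by one /proc/buddyinfo line.
 * Only blocks of order >= min_order are counted.
 * Returns 0 if the line has no "zone" field.
 */
static unsigned long buddy_free_pages(const char *line, unsigned int min_order)
{
	const char *p;
	unsigned long total = 0;
	unsigned int order = 0;

	assert(line != NULL);

	p = strstr(line, "zone");
	if (!p)
		return 0;

	p += strlen("zone");
	while (isspace((unsigned char)*p))		/* gap before the zone name */
		p++;
	while (*p && !isspace((unsigned char)*p))	/* the zone name itself */
		p++;

	for (;;) {
		char *end;
		unsigned long blocks = strtoul(p, &end, 10);

		if (end == p)		/* no more counts on this line */
			break;
		if (order >= min_order)
			total += blocks << order;
		order++;
		p = end;
	}

	return total;
}
```

Fed the two "Node 0, zone Normal" lines above with min_order = 0, this gives
5223 free pages before and 46619 after. Assuming 4KB pages, that is about 20MB
and 182MB, in line with the free output. With min_order = 4 it gives 2256
pages before against 22912 after.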
APPLICATION LAUNCH BEHAVIOR:
-----------------------------------------------------
During First Launch:
============================================================================
Application   Before_shrink_memory   After_shrink_memory   Difference
Camera        1.981                  1.86                   0.121
Gallery       1.276                  0.94                   0.336
contacts      1.112                  0.941                  0.171
messaging     0.886                  0.795                  0.091
settings      1.257                  1.212                  0.045
Music         1.854                  2.098                 -0.244
Gmail         1.872                  1.935                 -0.063
Browser       2.569                  2.677                 -0.108
============================================================================

During Re-launch:
============================================================================
Application   Before_shrink_memory   After_shrink_memory   Difference
Camera        1.248                  0.976                  0.272
Gallery       0.697                  0.633                  0.064
contacts      0.506                  0.561                 -0.055
messaging     0.533                  0.489                  0.044
settings      0.833                  0.805                  0.028
Music         0.832                  0.769                  0.063
Gmail         0.913                  0.841                  0.072
Browser       0.579                  0.57                   0.009
============================================================================

Various other use cases where this can be used:
----------------------------------------------------------------------------
1) Just after system boot-up is finished, using the sysctl configuration from
   bootup script.
2) During system suspend state, after suspend_freeze_processes()
   [kernel/power/suspend.c]
   Based on certain condition about fragmentation or free memory state.
3) From Android ION system heap driver, when order-4 allocation starts failing.
   By calling shrink_all_memory, in a separate worker thread, based on certain
   condition.
4) It can be combined with compact_memory to achieve better results on memory
   fragmentation.
5) It can be helpful in debugging and tuning various vm parameters.
6) It can be helpful to identify how much of maximum memory could be
   reclaimable at any point of time.
   And how much higher-order pages could be formed with this amount of
   reclaimable memory.
   Thus it can be helpful in accordingly tuning the reserved memory needs of a
   system.
7) It can be helpful in properly tuning the SWAP size in the system.
   In shrink_all_memory, we enable may_swap = 1, which means all unused pages
   will be swapped out.
   Thus, by running shrink_memory on a heavily loaded system, we can check how
   full the swap gets.
   That can be the maximum swap size with a 10% delta.
   Also, if ZRAM is used, it helps us in compressing and storing the pages for
   later use.
8) It can be helpful to allow more new applications to be launched without
   killing the older ones, by moving the least recently used pages to the
   SWAP area. Thus user data can be retained.
9) Can be part of a system utility to quickly defragment entire system
   memory.
10) This may also help in reducing fragmentation within the CMA region.
11) More use cases can be identified.

Most importantly, it can be more effective when applied intelligently, based
on certain conditions. It should not be executed always, and the decision is
left up to the user.

Signed-off-by: Pintu Kumar <pintu.k@xxxxxxxxxxx>
---
V3: Corrected a small typo at the end of the commit message.

V2: Added min,max parameter for shrink_memory, suggested by
    Heinrich Schuchardt <xypron.glpk@xxxxxx>.
    Error handling in sysctl_shrinkmem_handler, for any value other than 1,
    suggested by Heinrich Schuchardt <xypron.glpk@xxxxxx>.
    Fixed HIBERNATION+SHRINK_MEMORY issue in shrink_all_memory,
    suggested by Valdis.Kletnieks@xxxxxx.
    Restored gfp_mask to the original, because of other dependencies.
    Also, adding GFP_RECLAIM_MASK does not affect anything.
    Verified power consumption data during shrink_memory, as suggested by
    Johannes Weiner <hannes@xxxxxxxxxxx>.
    Verified application launch/re-launch scenarios before/after
    shrink_memory, as suggested by Xishi Qiu <qiuxishi@xxxxxxxxxx>.
    Updated the commit message with examples and use cases.

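Several of the use cases above come down to "trigger the reclaim only when
some condition holds". A rough userspace sketch of that idea follows. It is
not part of the patch: the function names, the threshold parameter and the
policy itself are assumptions for illustration. On a kernel built with
CONFIG_SHRINK_MEMORY, path would be /proc/sys/vm/shrink_memory.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write "1" to the control file at path; 0 on success, -errno on failure. */
static int request_shrink(const char *path)
{
	ssize_t n;
	int err = 0;
	int fd;

	assert(path != NULL);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, "1\n", 2);
	if (n < 0)
		err = -errno;	/* error reported for the write itself */
	else if (n != 2)
		err = -EIO;	/* short write, treat it as a failure */

	close(fd);
	return err;
}

/*
 * Hypothetical policy: request a shrink only when the number of free pages
 * held in higher-order blocks has dropped below min_high_order_pages.
 * Returns 1 if a shrink was requested, 0 if it was skipped, -errno on error.
 */
static int shrink_if_fragmented(const char *path,
				unsigned long high_order_free_pages,
				unsigned long min_high_order_pages)
{
	int err;

	if (high_order_free_pages >= min_high_order_pages)
		return 0;

	err = request_shrink(path);
	return err ? err : 1;
}
```

The control is created with mode 0200, so a caller like this would need to run
as root. How high_order_free_pages is obtained (for example from
/proc/buddyinfo) and what threshold makes sense are left to the caller.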
 Documentation/sysctl/vm.txt |   18 ++++++++++++++++++
 include/linux/swap.h        |    7 +++++++
 kernel/sysctl.c             |   16 ++++++++++++++++
 mm/Kconfig                  |    8 ++++++++
 mm/vmscan.c                 |   34 ++++++++++++++++++++++++++++++++--
 5 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 9832ec5..54eda3a 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -54,6 +54,7 @@ Currently, these files are in /proc/sys/vm:
 - page-cluster
 - panic_on_oom
 - percpu_pagelist_fraction
+- shrink_memory
 - stat_interval
 - swappiness
 - user_reserve_kbytes
@@ -718,6 +719,23 @@ sysctl, it will revert to this default behavior.
 
 ==============================================================
 
+shrink_memory
+
+This control is available only when CONFIG_SHRINK_MEMORY is set. This control
+can be used to aggressively reclaim memory system-wide in one shot. A value of
+1 will instruct the kernel to reclaim as much as totalram_pages in the system.
+For example, to reclaim all memory system-wide we can do:
+# echo 1 > /proc/sys/vm/shrink_memory
+
+If any other value than 1 is written to shrink_memory an error EINVAL occurs.
+
+For more information about this control, please visit the following
+presentation in embedded linux conference, 2015.
+http://events.linuxfoundation.org/sites/events/files/slides/
+%5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
+
+==============================================================
+
 stat_interval
 
 The time interval between which vm statistics are updated.  The default
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 9a7adfb..6505b0b 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -333,6 +333,13 @@ extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
 extern unsigned long vm_total_pages;
 
+#ifdef CONFIG_SHRINK_MEMORY
+extern int sysctl_shrink_memory;
+extern int sysctl_shrinkmem_handler(struct ctl_table *table, int write,
+		void __user *buffer, size_t *length, loff_t *ppos);
+#endif
+
+
 #ifdef CONFIG_NUMA
 extern int zone_reclaim_mode;
 extern int sysctl_min_unmapped_ratio;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c566b56..e66581b 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -275,6 +275,11 @@ static int min_extfrag_threshold;
 static int max_extfrag_threshold = 1000;
 #endif
 
+#ifdef CONFIG_SHRINK_MEMORY
+static int min_shrink_memory = 1;
+static int max_shrink_memory = 1;
+#endif
+
 static struct ctl_table kern_table[] = {
 	{
 		.procname	= "sched_child_runs_first",
@@ -1351,6 +1356,17 @@ static struct ctl_table vm_table[] = {
 	},
 
 #endif /* CONFIG_COMPACTION */
+#ifdef CONFIG_SHRINK_MEMORY
+	{
+		.procname	= "shrink_memory",
+		.data		= &sysctl_shrink_memory,
+		.maxlen		= sizeof(int),
+		.mode		= 0200,
+		.proc_handler	= sysctl_shrinkmem_handler,
+		.extra1		= &min_shrink_memory,
+		.extra2		= &max_shrink_memory,
+	},
+#endif
 	{
 		.procname	= "min_free_kbytes",
 		.data		= &min_free_kbytes,
diff --git a/mm/Kconfig b/mm/Kconfig
index b3a60ee..8e04bd9 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -657,3 +657,11 @@ config DEFERRED_STRUCT_PAGE_INIT
 	  when kswapd starts. This has a potential performance impact on
 	  processes running early in the lifetime of the systemm until kswapd
 	  finishes the initialisation.
+
+config SHRINK_MEMORY
+	bool "Allow for system-wide shrinking of memory"
+	default n
+	depends on MMU
+	help
+	  It enables support for system-wide memory reclaim in one shot using
+	  echo 1 > /proc/sys/vm/shrink_memory.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c8d8282..e802fa7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -58,6 +58,10 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/vmscan.h>
 
+#ifdef CONFIG_SHRINK_MEMORY
+#include <linux/suspend.h>
+#endif
+
 struct scan_control {
 	/* How many pages shrink_list() should reclaim */
 	unsigned long nr_to_reclaim;
@@ -3557,7 +3561,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
 	wake_up_interruptible(&pgdat->kswapd_wait);
 }
 
-#ifdef CONFIG_HIBERNATION
+#if defined(CONFIG_HIBERNATION) || defined(CONFIG_SHRINK_MEMORY)
 /*
  * Try to free `nr_to_reclaim' of memory, system-wide, and return the number of
  * freed pages.
@@ -3576,12 +3580,16 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 		.may_writepage = 1,
 		.may_unmap = 1,
 		.may_swap = 1,
-		.hibernation_mode = 1,
 	};
 	struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
 	struct task_struct *p = current;
 	unsigned long nr_reclaimed;
 
+	if (system_entering_hibernation())
+		sc.hibernation_mode = 1;
+	else
+		sc.hibernation_mode = 0;
+
 	p->flags |= PF_MEMALLOC;
 	lockdep_set_current_reclaim_state(sc.gfp_mask);
 	reclaim_state.reclaimed_slab = 0;
@@ -3597,6 +3605,28 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 }
 #endif /* CONFIG_HIBERNATION */
 
+#ifdef CONFIG_SHRINK_MEMORY
+int sysctl_shrink_memory;
+/* This is the entry point for system-wide shrink memory
+   via /proc/sys/vm/shrink_memory */
+int sysctl_shrinkmem_handler(struct ctl_table *table, int write,
+		void __user *buffer, size_t *length, loff_t *ppos)
+{
+	int ret;
+
+	ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
+	if (ret)
+		return ret;
+
+	if (write) {
+		if (sysctl_shrink_memory & 1)
+			shrink_all_memory(totalram_pages);
+	}
+
+	return 0;
+}
+#endif
+
 /* It's optimal to keep kswapds on the same CPUs as their memory, but
    not required for correctness.  So if the last cpu in a node goes
    away, we get changed to run anywhere: as the first one comes back,
-- 
1.7.9.5
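
A note on the sysctl table entry in this patch: extra1 and extra2 both point at
the value 1, so proc_dointvec_minmax() accepts only 1 and returns -EINVAL for
anything else, which is where the documented EINVAL comes from. Below is a
userspace model of that range check, for illustration only. The helper name
shrink_memory_check is hypothetical, and parsing with strtol is an assumption
of this sketch, not the kernel's own parser.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Mirrors min_shrink_memory / max_shrink_memory from kernel/sysctl.c. */
#define SHRINK_MEMORY_MIN 1
#define SHRINK_MEMORY_MAX 1

/*
 * Model of the accepted range for /proc/sys/vm/shrink_memory.
 * Returns 0 if the leading integer in input lies in [MIN, MAX],
 * -EINVAL otherwise (including when no integer can be parsed).
 */
static int shrink_memory_check(const char *input)
{
	char *end;
	long val;

	assert(input != NULL);

	errno = 0;
	val = strtol(input, &end, 10);
	if (end == input || errno == ERANGE)
		return -EINVAL;
	if (val < SHRINK_MEMORY_MIN || val > SHRINK_MEMORY_MAX)
		return -EINVAL;

	return 0;
}
```

With this model, "1" and "1\n" are accepted, while "0", "2", "-1" and "abc"
are rejected with -EINVAL.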