On 11/24/2017 11:14 AM, Michal Hocko wrote:
> On Fri 24-11-17 11:07:07, Peter Enderborg wrote:
>> When tuning the watermark_scale_factor to reduce stalls and compactions
>> the high mark is also changed, it changed a bit too much. So this
>> patch introduces a slope that can reduce this overhead a bit, or
>> increase it if needed.
>
> This doesn't explain what is the problem, why it is a problem and why we
> need yet another tuning to address it. Users shouldn't really care about
> internal stuff like watermark tuning for each watermark independently.
> This looks like a gross hack. Please start over with the problem
> description and then we can move on to an appropriate fix. Piling up
> tuning knobs to workaround problems is simply not acceptable.

Agreed. Also if you send a patch adding userspace API or a tuning knob,
please CC linux-api mailing list (did that for this reply).

>> Signed-off-by: Peter Enderborg <peter.enderborg@xxxxxxxx>
>> ---
>>  Documentation/sysctl/vm.txt | 15 +++++++++++++++
>>  include/linux/mm.h          |  1 +
>>  include/linux/mmzone.h      |  2 ++
>>  kernel/sysctl.c             |  9 +++++++++
>>  mm/page_alloc.c             |  6 +++++-
>>  5 files changed, 32 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
>> index eda628c..aecff6c 100644
>> --- a/Documentation/sysctl/vm.txt
>> +++ b/Documentation/sysctl/vm.txt
>> @@ -62,6 +62,7 @@ Currently, these files are in /proc/sys/vm:
>>  - user_reserve_kbytes
>>  - vfs_cache_pressure
>>  - watermark_scale_factor
>> +- watermark_high_factor_slope
>>  - zone_reclaim_mode
>>
>>  ==============================================================
>> @@ -857,6 +858,20 @@ that the number of free pages kswapd maintains for latency reasons is
>>  too small for the allocation bursts occurring in the system. This knob
>>  can then be used to tune kswapd aggressiveness accordingly.
>>
>> +=============================================================
>> +
>> +watermark_high_factor_slope:
>> +
>> +This factor is high mark for watermark_scale_factor.
>> +The unit is in percent.
>> +Max value is 1000 and min value is 100. (High watermark is the same as
>> +low water mark) Low watermark is min_wmark_pages + watermark_scale_factor.
>> +and high watermark is
>> +min_wmark_pages+(watermark_scale_factor * watermark_high_factor_slope).
>> +
>> +The default value is 200.
>> +
>> +
>>  ==============================================================
>>
>>  zone_reclaim_mode:
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 7661156..c89536b 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -2094,6 +2094,7 @@ extern void zone_pcp_reset(struct zone *zone);
>>  /* page_alloc.c */
>>  extern int min_free_kbytes;
>>  extern int watermark_scale_factor;
>> +extern int watermark_high_factor_slope;
>>
>>  /* nommu.c */
>>  extern atomic_long_t mmap_pages_allocated;
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 67f2e3c..91bf842 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -886,6 +886,8 @@ int min_free_kbytes_sysctl_handler(struct ctl_table *, int,
>>  	void __user *, size_t *, loff_t *);
>>  int watermark_scale_factor_sysctl_handler(struct ctl_table *, int,
>>  	void __user *, size_t *, loff_t *);
>> +//int watermark_high_factor_tilt_sysctl_handler(struct ctl_table *, int,
>> +//	void __user *, size_t *, loff_t *);
>>  extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1];
>>  int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int,
>>  	void __user *, size_t *, loff_t *);
>> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
>> index 2fb4e27..83c48c9 100644
>> --- a/kernel/sysctl.c
>> +++ b/kernel/sysctl.c
>> @@ -1444,6 +1444,15 @@ static struct ctl_table vm_table[] = {
>>  		.extra2		= &one_thousand,
>>  	},
>>  	{
>> +		.procname	= "watermark_high_factor_slope",
>> +		.data		= &watermark_high_factor_slope,
>> +		.maxlen		= sizeof(watermark_high_factor_slope),
>> +		.mode		= 0644,
>> +		.proc_handler	= watermark_scale_factor_sysctl_handler,
>> +		.extra1		= &one_hundred,
>> +		.extra2		= &one_thousand,
>> +	},
>> +	{
>>  		.procname	= "percpu_pagelist_fraction",
>>  		.data		= &percpu_pagelist_fraction,
>>  		.maxlen		= sizeof(percpu_pagelist_fraction),
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 48b5b01..3dc50ff 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -263,6 +263,7 @@ compound_page_dtor * const compound_page_dtors[] = {
>>  int min_free_kbytes = 1024;
>>  int user_min_free_kbytes = -1;
>>  int watermark_scale_factor = 10;
>> +int watermark_high_factor_slope = 200;
>>
>>  static unsigned long __meminitdata nr_kernel_pages;
>>  static unsigned long __meminitdata nr_all_pages;
>> @@ -6989,6 +6990,7 @@ static void __setup_per_zone_wmarks(void)
>>
>>  	for_each_zone(zone) {
>>  		u64 tmp;
>> +		u64 tmp_high;
>>
>>  		spin_lock_irqsave(&zone->lock, flags);
>>  		tmp = (u64)pages_min * zone->managed_pages;
>> @@ -7026,7 +7028,9 @@ static void __setup_per_zone_wmarks(void)
>>  				      watermark_scale_factor, 10000));
>>
>>  		zone->watermark[WMARK_LOW]  = min_wmark_pages(zone) + tmp;
>> -		zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + tmp * 2;
>> +		tmp_high = mult_frac(tmp, watermark_high_factor_slope, 100);
>> +		zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + tmp_high;
>> +
>>
>>  		spin_unlock_irqrestore(&zone->lock, flags);
>>  	}
>> --
>> 2.7.4
>>
>
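
For readers following the arithmetic in the quoted mm/page_alloc.c hunk, here is a minimal user-space sketch of how the low and high watermarks would come out with the proposed slope. It is not kernel code. The names mult_frac_u64(), struct wmarks and calc_wmarks() are hypothetical stand-ins, the input values are made up for illustration, and the max() step is an assumption based on the mainline code of that time, which the hunk shows only in part.

```c
#include <stdint.h>

/*
 * Hypothetical user-space stand-in for the kernel's mult_frac() macro:
 * computes x * numer / denom by splitting x into quotient and remainder,
 * which reduces the risk of overflow in the intermediate product.
 */
static uint64_t mult_frac_u64(uint64_t x, uint64_t numer, uint64_t denom)
{
	uint64_t quot = x / denom;
	uint64_t rem = x % denom;

	return quot * numer + (rem * numer) / denom;
}

/* Illustrative container for the three per-zone watermarks, in pages. */
struct wmarks {
	uint64_t min;
	uint64_t low;
	uint64_t high;
};

/*
 * Sketch of the watermark calculation with the proposed knob.
 * scale_factor is in units of 1/10000 of the zone size and slope is in
 * percent, so slope == 200 reproduces the existing "tmp * 2" behaviour.
 */
static struct wmarks calc_wmarks(uint64_t min_pages, uint64_t managed_pages,
				 uint64_t scale_factor, uint64_t slope)
{
	struct wmarks w;
	uint64_t gap = min_pages >> 2;
	uint64_t scaled = mult_frac_u64(managed_pages, scale_factor, 10000);

	/* Assumed from mainline: the gap is the larger of the two values. */
	if (scaled > gap)
		gap = scaled;

	w.min = min_pages;
	w.low = min_pages + gap;
	w.high = min_pages + mult_frac_u64(gap, slope, 100);
	return w;
}
```

With min_pages = 1000, managed_pages = 1000000 and scale_factor = 10, the gap is 1000 pages, so low is 2000. A slope of 200 then gives high = 3000, the same as the current code, while a slope of 100 gives high = 2000, which is equal to low.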