On Mon, 10 Feb 2020 11:01:21 -0800 Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:

> The value of min_free_kbytes is calculated in two routines:
> 1) init_per_zone_wmark_min based on available memory
> 2) set_recommended_min_free_kbytes may reserve extra space for
>    THP allocations
>
> In both of these routines, a user defined min_free_kbytes value will
> be overwritten if the value calculated in the code is larger.  No message
> is logged if the user value is overwritten.

Could we provide a detailed description of why this is considered to be
a problem?  This is fairly easily guessable, but is there a real
in-field bad user experience we can point at?

> Change code to never overwrite user defined value.  However, do log a
> message (once per value) showing the value calculated in code.
>
> At system initialization time, both init_per_zone_wmark_min and
> set_recommended_min_free_kbytes are called to set the initial value
> for min_free_kbytes.  When memory is offlined or onlined, min_free_kbytes
> is recalculated and adjusted based on the amount of memory.  However,
> the adjustment for THP is not considered.  Here is an example from a 2
> node system with 8GB of memory.
>
>  # cat /proc/sys/vm/min_free_kbytes
>  90112
>  # echo 0 > /sys/devices/system/node/node1/memory56/online
>  # cat /proc/sys/vm/min_free_kbytes
>  11243
>  # echo 1 > /sys/devices/system/node/node1/memory56/online
>  # cat /proc/sys/vm/min_free_kbytes
>  11412
>
> One would expect that min_free_kbytes would return to it's original
> value after the offline/online operations.
>
> Create a simple interface for THP/khugepaged based adjustment and
> call this whenever min_free_kbytes is adjusted.
>
> ...
>
>  include/linux/khugepaged.h |  5 ++++
>  mm/internal.h              |  2 ++
>  mm/khugepaged.c            | 56 ++++++++++++++++++++++++++++++++------
>  mm/page_alloc.c            | 35 ++++++++++++++++--------

min_free_kbytes gets a few mentions in Documentation/.  Should we make
the appropriate updates there to bring this behavior to people's
attention?
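
To make the behavior under discussion concrete, here is a minimal
userspace sketch of the policy as I read the changelog.  It is not the
patch itself: every name in it is hypothetical, the THP adjustment is
modelled as a simple floor, and logging on every distinct calculated
value (rather than only when it exceeds the user value) is an
assumption on my part.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace model only; all names here are hypothetical, not kernel symbols. */

static int min_free_kbytes;            /* value currently in effect */
static int user_min_free_kbytes = -1;  /* -1: the admin never wrote a value */
static int last_logged_kbytes = -1;    /* last calculated value we reported */

/* Stand-in for the THP/khugepaged adjustment, modelled as a floor (assumption). */
static int apply_thp_adjustment(int base_kbytes, int thp_reserve_kbytes)
{
	return base_kbytes > thp_reserve_kbytes ? base_kbytes : thp_reserve_kbytes;
}

/* Models a write to the sysctl by the administrator. */
static void set_user_min_free_kbytes(int kbytes)
{
	assert(kbytes >= 0);
	user_min_free_kbytes = kbytes;
	min_free_kbytes = kbytes;
}

/*
 * Recalculate the value, as would happen at init time and after memory
 * online/offline.  The THP adjustment is applied on every recalculation.
 * Returns 1 if a message was logged, 0 otherwise.
 */
static int recalc_min_free_kbytes(int base_kbytes, int thp_reserve_kbytes)
{
	int calculated = apply_thp_adjustment(base_kbytes, thp_reserve_kbytes);

	if (user_min_free_kbytes < 0) {
		/* No user value: the calculated value simply takes effect. */
		min_free_kbytes = calculated;
		return 0;
	}

	/* A user value is never overwritten; report each calculated value once. */
	if (calculated != last_logged_kbytes) {
		printf("min_free_kbytes: calculated %d, keeping user value %d\n",
		       calculated, user_min_free_kbytes);
		last_logged_kbytes = calculated;
		return 1;
	}
	return 0;
}
```

With no user value set, calling recalc_min_free_kbytes(11243, 90112)
and then recalc_min_free_kbytes(11412, 90112) leaves the model at 90112
both times, which is the "returns to its original value" expectation
from the quoted example.  The numbers are borrowed from that example
for illustration; treating 90112 as the THP figure is an assumption.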