On Wed, 11 Mar 2020, David Rientjes wrote:
> On Wed, 11 Mar 2020, Ivan Teterevkov wrote:
>
> > This patch adds a couple of knobs:
> >
> > - The configuration option (CONFIG_VM_SWAPPINESS).
> > - The command line parameter (vm_swappiness).
> >
> > The default value is preserved, but now defined by CONFIG_VM_SWAPPINESS.
> >
> > Historically, the default swappiness is set to the well-known value 60,
> > and this works well for the majority of cases. The vm_swappiness is also
> > exposed as the kernel parameter that can be changed at runtime too, e.g.
> > with sysctl.
> >
> > This approach might not suit well some configurations, e.g. systemd-based
> > distros, where systemd is put in charge of the cgroup controllers,
> > including the memory one. In such cases, the default swappiness 60
> > is copied across the cgroup subtrees early at startup, when systemd
> > is arranging the slices for its services, before the sysctl.conf
> > or tmpfiles.d/*.conf changes are applied.
> >
>
> Seems like something that can be fully handled by an initscript that would
> set the sysctl and then iterate the memcg hierarchy propagating the
> non-default value. I don't think that's too much of an ask if userspace
> wants to manipulate the swappiness value.
>

This is exactly what I'm trying to avoid: in some distros there is no way
to apply the configuration early enough. E.g. in systemd-based systems,
systemd is the first process to start and it arranges the memcg hierarchy
the way it's configured, but unfortunately it doesn't offer a swappiness
knob.

A script could iterate the memcg tree later, but it would race with
whatever system entity is in charge of the memcg, because the
configuration can't be changed atomically: the script could be walking
the tree and updating each memory.swappiness while systemd is creating
another slice or scope subtree.
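
To make the fix-up approach concrete, here is a rough userspace sketch of
such a walk. It is only an illustration: propagate_swappiness() and
write_knob() are hypothetical names, and it assumes a cgroup v1 memory
hierarchy in which every group directory carries a memory.swappiness
file.

```c
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write value to dir/memory.swappiness; returns 1 on success, 0 otherwise. */
static int write_knob(const char *dir, int value)
{
	char knob[4096];
	int fd, ok;

	snprintf(knob, sizeof(knob), "%s/memory.swappiness", dir);

	/* No O_CREAT: only knobs that already exist are touched. */
	fd = open(knob, O_WRONLY | O_TRUNC);
	if (fd < 0)
		return 0;	/* no knob here, or the group is already gone */

	ok = dprintf(fd, "%d\n", value) > 0;
	close(fd);

	return ok;
}

/*
 * Update dir first, then recurse into its subdirectories.
 * Returns the number of knobs that were rewritten.
 *
 * Nothing synchronises this walk with whoever creates or removes
 * groups in the meantime, which is the race described above.
 */
int propagate_swappiness(const char *dir, int value)
{
	char child[4096];
	struct dirent *de;
	struct stat st;
	int updated = write_knob(dir, value);
	DIR *d = opendir(dir);

	if (!d)
		return updated;

	while ((de = readdir(d)) != NULL) {
		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;

		snprintf(child, sizeof(child), "%s/%s", dir, de->d_name);
		if (lstat(child, &st) == 0 && S_ISDIR(st.st_mode))
			updated += propagate_swappiness(child, value);
	}
	closedir(d);

	return updated;
}
```

Each group is visited once, one write per group, so the tree as a whole is
never updated in a single step. That is the part I'd rather not depend on.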

> Or maybe we can be more clever: have memcg->swappiness store -1 by default
> unless it is changed by the user explicitly and then have
> mem_cgroup_swappiness() return vm_swappiness for this value. If the user
> overwrites it, it's intended.
>

Does that mean -1 would become a reference to vm_swappiness, or to the
parent's memory.swappiness? It sounds interesting and, if so, it would
address my issue with swappiness, but it would also change the existing
memcg behaviour: if the referred-to value changed, would a
memory.swappiness backed by -1 change too?

> So there are a couple options here but I don't think one of them is to add
> a new config option or kernel command line option.
>

vm_swappiness starts its life in the kernel, so why not give it a simple
"constructor" there?

> > One could run a script to traverse the cgroup trees later and set the
> > desired memory.swappiness individually in each occurrence when the runtime
> > is set up, but this would require some amount of work to implement
> > properly. Instead, why not set the default swappiness as early as possible?
> >
> > Signed-off-by: Ivan Teterevkov <ivan.teterevkov@xxxxxxxxxxx>
> > ---
> >  .../admin-guide/kernel-parameters.txt         |  4 ++++
> >  mm/Kconfig                                    | 10 ++++++++
> >  mm/vmscan.c                                   | 24 ++++++++++++++++++-
> >  3 files changed, 37 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index c07815d230bc..5d54a4303522 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -5317,6 +5317,10 @@
> >  			P	Enable page structure init time poisoning
> >  			-	Disable all of the above options
> >
> > +	vm_swappiness=	[KNL]
> > +			Sets the default vm_swappiness.
> > +			Ranges from 0 to 100, the default value is 60.
> > +
> >  	vmalloc=nn[KMG]	[KNL,BOOT] Forces the vmalloc area to have an exact
> >  			size of <nn>. This can be used to increase the
> >  			minimum size (128MB on x86). It can also be used to
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index ab80933be65f..ec59c19e578e 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -739,4 +739,14 @@ config ARCH_HAS_HUGEPD
> >  config MAPPING_DIRTY_HELPERS
> >  	bool
> >
> > +config VM_SWAPPINESS
> > +	int "Default memory swappiness"
> > +	default 60
> > +	range 0 100
> > +	help
> > +	  Sets the default vm_swappiness, that could be changed later
> > +	  in the runtime, e.g. kernel command line, sysctl, etc.
> > +
> > +	  Higher value means more swappy. Historically, defaults to 60.
> > +
> >  endmenu
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 876370565455..7d2d3550f698 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -163,7 +163,29 @@ struct scan_control {
> >  /*
> >   * From 0 .. 100.  Higher means more swappy.
> >   */
> > -int vm_swappiness = 60;
> > +int vm_swappiness = CONFIG_VM_SWAPPINESS;
> > +
> > +static int __init swappiness_cmdline(char *str)
> > +{
> > +	int val, err;
> > +
> > +	if (!str)
> > +		return -EINVAL;
> > +
> > +	err = kstrtoint(str, 10, &val);
> > +	if (err)
> > +		return -EINVAL;
> > +
> > +	if (val < 0 || val > 100)
> > +		return -EINVAL;
> > +
> > +	vm_swappiness = val;
> > +
> > +	return 0;
> > +}
> > +
> > +early_param("vm_swappiness", swappiness_cmdline);
> > +
> >  /*
> >   * The total number of pages which are beyond the high watermark within all
> >   * zones.
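
The parsing and range check in the quoted swappiness_cmdline() can be
exercised outside the kernel. Below is a rough userspace approximation,
not the kernel code itself: parse_swappiness() is a hypothetical name,
strtol() stands in for kstrtoint() (the two differ in details such as
whitespace and trailing-newline handling), and the global merely mimics
the kernel's vm_swappiness.

```c
#include <errno.h>
#include <stdlib.h>

/* Stand-in for the kernel global; 60 is the historical default. */
int vm_swappiness = 60;

/*
 * Userspace approximation of swappiness_cmdline(): accept a decimal
 * integer in [0, 100], leave vm_swappiness untouched on any error.
 */
int parse_swappiness(const char *str)
{
	char *end;
	long val;

	if (!str || !*str)
		return -EINVAL;

	errno = 0;
	val = strtol(str, &end, 10);

	/* Reject overflow, non-numeric input and trailing garbage. */
	if (errno || end == str || *end != '\0')
		return -EINVAL;

	if (val < 0 || val > 100)
		return -EINVAL;

	vm_swappiness = (int)val;

	return 0;
}
```

The property that matters is that a malformed or out-of-range value leaves
the default in place rather than clamping it.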