Tejun describes the problem as follows:

When swap runs out, there's an abrupt change in system behavior -
the anonymous memory suddenly becomes unmanageable, which readily
breaks any sort of memory isolation and can bring down the whole
system. To avoid that, oomd [1] monitors free swap space and triggers
kills when it drops below a specific threshold (e.g. 15%).

While this works, it's far from ideal:

 - Depending on IO performance and total swap size, a given
   headroom might be too little or too much.

 - oomd has to monitor swap depletion in addition to the usual
   pressure metrics, and it currently doesn't consider memory.swap.max.

Solve this by adapting parts of the approach that memory.high uses -
slow down allocation as the resource gets depleted, turning the
depletion behavior from an abrupt cliff into a gradual degradation
observable through the memory pressure metric.

[1] https://github.com/facebookincubator/oomd

v2: https://lore.kernel.org/linux-mm/20200511225516.2431921-1-kuba@xxxxxxxxxx/
v1: https://lore.kernel.org/linux-mm/20200417010617.927266-1-kuba@xxxxxxxxxx/

Jakub Kicinski (3):
  mm: prepare for swap over-high accounting and penalty calculation
  mm: move penalty delay clamping out of calculate_high_delay()
  mm: automatically penalize tasks with high swap use

 Documentation/admin-guide/cgroup-v2.rst |  20 +++
 include/linux/memcontrol.h              |   4 +
 mm/memcontrol.c                         | 159 ++++++++++++++++++------
 3 files changed, 143 insertions(+), 40 deletions(-)

-- 
2.25.4