On Fri 30-08-19 19:04:53, Tetsuo Handa wrote:
> If /proc/sys/vm/oom_dump_tasks != 0, dump_header() can become very slow
> because dump_tasks() synchronously reports all OOM victim candidates, and
> as a result ratelimit test for dump_header() cannot work as expected.
>
> This patch defers dump_tasks() till oom_mutex is released. As a result of
> this patch, the latency between out_of_memory() is called and SIGKILL is
> sent (and the OOM reaper starts reclaiming memory) will be significantly
> reduced.
>
> Since CONFIG_PRINTK_CALLER was introduced, concurrent printk() became less
> problematic. But we still need to correlate synchronously printed messages
> and asynchronously printed messages if we defer dump_tasks() messages.
> Thus, this patch also prefixes OOM killer messages using "OOM[$serial]:"
> format. As a result, OOM killer messages would look like below.
>
> [ 31.935015][ T71] OOM[1]: kworker/4:1 invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=-1, oom_score_adj=0
> (...snipped....)
> [ 32.052635][ T71] OOM[1]: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),global_oom,task_memcg=/,task=firewalld,pid=737,uid=0
> [ 32.056886][ T71] OOM[1]: Out of memory: Killed process 737 (firewalld) total-vm:358672kB, anon-rss:22640kB, file-rss:12328kB, shmem-rss:0kB, UID:0 pgtables:421888kB oom_score_adj:0
> [ 32.064291][ T71] OOM[1]: Tasks state (memory values in pages):
> [ 32.067807][ T71] OOM[1]: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
> [ 32.070057][ T54] oom_reaper: reaped process 737 (firewalld), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [ 32.072417][ T71] OOM[1]: [ 548] 0 548 9772 1172 110592 0 0 systemd-journal
> (...snipped....)
> [ 32.139566][ T71] OOM[1]: [ 737] 0 737 89668 8742 421888 0 0 firewalld
> (...snipped....)
> [ 32.221990][ T71] OOM[1]: [ 1300] 48 1300 63025 1788 532480 0 0 httpd
>
> This patch might affect panic behavior triggered by panic_on_oom or no
> OOM-killable tasks, for dump_header(oc, NULL) will not report OOM victim
> candidates if there are not-yet-reported OOM victim candidates from past
> rounds of OOM killer invocations. I don't know if that matters.
>
> For now this patch embeds "struct oom_task_info" into each
> "struct task_struct". In order to avoid bloating "struct task_struct",
> future patch might detach from "struct task_struct" because one
> "struct oom_task_info" for one "struct signal_struct" will be enough.
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> ---
>  include/linux/sched.h |  17 +++++-
>  mm/oom_kill.c         | 149 +++++++++++++++++++++++++++++++++++---------------
>  2 files changed, 121 insertions(+), 45 deletions(-)

This is adding a lot of code for something that could simply be worked
around by disabling dump_tasks (i.e. setting /proc/sys/vm/oom_dump_tasks
to 0). Unless there is a real-world workload that both suffers from this
latency and depends on the list of eligible tasks, I do not think this
is mergeable.

-- 
Michal Hocko
SUSE Labs
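The quoted patch describes a specific mechanism. The kill decision is made under the lock, the candidate list is emitted only after the lock is dropped, and every line is tagged with a per-invocation serial. A small userspace analogue in C can illustrate it. This is a hypothetical sketch, not code from the patch or from mm/oom_kill.c. A pthread mutex named oom_mutex stands in for the kernel lock. Victim selection is simplified to "largest RSS", whereas the real OOM killer's scoring also accounts for swap, page tables and oom_score_adj. The names struct candidate, struct oom_report, oom_select_and_snapshot() and format_report() are invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace analogue of the deferred dump_tasks() idea.
 * None of these names are kernel APIs. */

#define MAX_CANDIDATES 16

struct candidate {
    int pid;
    long rss_pages;
    char comm[16];           /* like task->comm: 15 chars + NUL */
};

struct oom_report {
    unsigned long serial;    /* the $serial in the "OOM[$serial]:" prefix */
    size_t count;
    struct candidate tasks[MAX_CANDIDATES];
};

static pthread_mutex_t oom_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned long oom_serial;

/* Critical section: assign a serial, pick a victim (simplified to the
 * largest RSS) and snapshot the candidate list. Nothing is printed
 * while the mutex is held. Returns the victim pid, or -1 if none. */
static int oom_select_and_snapshot(const struct candidate *all, size_t n,
                                   struct oom_report *rep)
{
    int victim = -1;
    long best = -1;

    pthread_mutex_lock(&oom_mutex);
    rep->serial = ++oom_serial;
    rep->count = n < MAX_CANDIDATES ? n : MAX_CANDIDATES;
    for (size_t i = 0; i < rep->count; i++) {
        rep->tasks[i] = all[i];
        if (all[i].rss_pages > best) {
            best = all[i].rss_pages;
            victim = all[i].pid;
        }
    }
    /* A real OOM killer would send SIGKILL to the victim here. */
    pthread_mutex_unlock(&oom_mutex);
    return victim;
}

/* Called after the mutex is released: format the snapshot, tagging every
 * line with the serial so it can be correlated with earlier messages.
 * Returns the number of bytes written (excluding the NUL); output is
 * truncated if buf is too small. */
static size_t format_report(const struct oom_report *rep, char *buf, size_t len)
{
    size_t off = 0;

    assert(rep->count <= MAX_CANDIDATES);
    if (len == 0)
        return 0;
    buf[0] = '\0';
    for (size_t i = 0; i < rep->count; i++) {
        size_t room = len - off;
        int w = snprintf(buf + off, room, "OOM[%lu]: [%7d] %8ld %s\n",
                         rep->serial, rep->tasks[i].pid,
                         rep->tasks[i].rss_pages, rep->tasks[i].comm);
        if (w < 0)
            break;
        if ((size_t)w >= room) {   /* truncated: buffer is now full */
            off = len - 1;
            break;
        }
        off += (size_t)w;
    }
    return off;
}
```

Take the three tasks from the quoted log (pids 548, 737 and 1300 with rss 1172, 8742 and 1788 pages). The sketch selects pid 737 and produces lines beginning "OOM[1]: [    548]". The split keeps the potentially slow formatting out of the critical section, which is the latency the patch targets. Whether that is worth the extra code, rather than setting oom_dump_tasks to 0, is the question raised in the reply.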