On Fri, 2019-09-06 at 19:32 +0900, Tetsuo Handa wrote:
> On 2019/09/06 6:21, Qian Cai wrote:
> > On Fri, 2019-09-06 at 05:59 +0900, Tetsuo Handa wrote:
> > > On 2019/09/06 1:10, Qian Cai wrote:
> > > > On Tue, 2019-09-03 at 17:13 +0200, Michal Hocko wrote:
> > > > > On Tue 03-09-19 11:02:46, Qian Cai wrote:
> > > > > > Well, I still see OOM sometimes kill wrong processes like ssh and systemd
> > > > > > processes while running LTP OOM tests with straightforward allocation patterns.
> > > > >
> > > > > Please report those. Most cases I have seen so far just turned out to
> > > > > work as expected and memory hogs just used oom_score_adj or similar.
> > > >
> > > > Here is one where oom01 should have been the one to be killed.
> > >
> > > I assume that there are previous OOM killer events before the
> > >
> > > > [92598.855697][ T2588] Swap cache stats: add 105240923, delete 105250445, find 42196/101577
> > >
> > > line. Please be sure to include them.
> >
> > 12:00:52 oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/user.slice,task=oom01,pid=25507,uid=0
> > 12:00:52 Out of memory: Killed process 25507 (oom01) total-vm:6324780kB, anon-rss:5647168kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:11395072kB oom_score_adj:0
> > 12:00:52 oom_reaper: reaped process 25507 (oom01), now anon-rss:5647452kB, file-rss:0kB, shmem-rss:0kB
> > 12:00:52 irqbalance invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
> > (...snipped...)
> > 12:00:53 [  25391]     0 25391     2184        0    65536       32             0 oom01
> > 12:00:53 [  25392]     0 25392     2184        0    65536       39             0 oom01
> > 12:00:53 oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/system.slice/tuned.service,task=tuned,pid=2629,uid=0
> > 12:00:54 Out of memory: Killed process 2629 (tuned) total-vm:424936kB, anon-rss:328kB, file-rss:1268kB, shmem-rss:0kB, UID:0 pgtables:442368kB oom_score_adj:0
> > 12:00:54 oom_reaper: reaped process 2629 (tuned), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>
> OK. anon-rss did not decrease when the oom_reaper gave up.
> I think this is the same as the https://lkml.org/lkml/2017/7/28/317 case.
>
> The OOM killer does not wait until existing OOM victims release their memory by
> calling exit_mmap(); it selects the next OOM victim as soon as the OOM reaper
> sets MMF_OOM_SKIP. As a result, when the OOM reaper fails to reclaim memory due
> to e.g. mlocked pages, the OOM killer immediately selects the next OOM victim.
> But since 25391 and 25392 were consuming little memory (maybe they were
> already-reaped OOM victims), neither 25391 nor 25392 was selected as the next
> OOM victim.

Yes, mlocked is troublesome. I have other incidents where crond and
systemd-udevd were killed by mistake, and it even tried to kill kworker/0.

https://cailca.github.io/files/dmesg.txt
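For reference, here is a minimal sketch of the kind of allocation pattern that
defeats the reaper. This is my own outline, not the actual LTP oom01 source, and
the 64MB chunk size is arbitrary: anonymous memory that is mlock()ed cannot be
unmapped by the OOM reaper (as far as I understand, it skips VM_LOCKED VMAs), so
the victim's anon-rss stays high even after the "oom_reaper: reaped process"
message, MMF_OOM_SKIP is set anyway, and the OOM killer moves on to an unrelated
process such as tuned above.

/* Sketch of a reproducer: lock anonymous memory so the OOM reaper
 * cannot reclaim it after the process is chosen as an OOM victim.
 * Run as root so mlock() is not limited by RLIMIT_MEMLOCK.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#define CHUNK (64UL << 20)	/* 64MB per allocation, arbitrary */

int main(void)
{
	for (;;) {
		char *p = mmap(NULL, CHUNK, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			break;
		}
		/* Lock the pages; the OOM reaper leaves VM_LOCKED VMAs alone. */
		if (mlock(p, CHUNK))
			perror("mlock");
		/* Touch every page so it is actually resident (anon-rss grows). */
		memset(p, 1, CHUNK);
	}
	pause();	/* sit here until the OOM killer (or the admin) kills us */
	return 0;
}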