Michal Hocko wrote:
> From: Michal Hocko <mhocko@xxxxxxxx>
>
> When oom_reaper manages to unmap all the eligible vmas there shouldn't
> be much of the freable memory held by the oom victim left anymore so it
> makes sense to clear the TIF_MEMDIE flag for the victim and allow the
> OOM killer to select another task.

Just a confirmation. Is it safe to clear TIF_MEMDIE without reaching do_exit()
with regard to freezing_slow_path()? Since clearing TIF_MEMDIE from the OOM
reaper confuses

    wait_event(oom_victims_wait, !atomic_read(&oom_victims));

in oom_killer_disable(), I'm worried that the freezing operation continues
before the OOM victim which escaped the __refrigerator() actually releases
memory. Does this cause a consistency problem?

> +	/*
> +	 * Clear TIF_MEMDIE because the task shouldn't be sitting on a
> +	 * reasonably reclaimable memory anymore. OOM killer can continue
> +	 * by selecting other victim if unmapping hasn't led to any
> +	 * improvements. This also means that selecting this task doesn't
> +	 * make any sense.
> +	 */
> +	tsk->signal->oom_score_adj = OOM_SCORE_ADJ_MIN;
> +	exit_oom_victim(tsk);

I noticed that updating only one thread group's oom_score_adj disables further
wake_oom_reaper() calls, because of the coarse-grained can_oom_reap check
(p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) in oom_kill_process(). I think
we need to either update oom_score_adj equally for all thread groups using the
reaped mm, or use a more fine-grained can_oom_reap check which ignores
OOM_SCORE_ADJ_MIN if all threads in that thread group are dying or exiting.
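To make the second alternative concrete, here is a minimal userspace sketch of the two checks. It is only a model under stated assumptions, not kernel code: struct model_thread, struct model_group, group_all_dying(), can_reap_coarse() and can_reap_fine() are hypothetical names, mm_id stands in for "shares the same mm", and the dying flag stands in for whatever test the kernel would use for a dying or exiting thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OOM_SCORE_ADJ_MIN (-1000)

/* Hypothetical stand-in for a thread; not a kernel structure. */
struct model_thread {
	bool dying;	/* assumed to mean "dying or exiting" */
};

/* Hypothetical stand-in for a thread group (one signal_struct). */
struct model_group {
	int mm_id;	/* groups with equal mm_id share one mm */
	int oom_score_adj;
	const struct model_thread *threads;
	size_t nr_threads;
};

/* True if every thread in the group is dying or exiting. */
static bool group_all_dying(const struct model_group *g)
{
	assert(g != NULL);
	for (size_t i = 0; i < g->nr_threads; i++)
		if (!g->threads[i].dying)
			return false;
	return true;
}

/*
 * Coarse check: any other thread group sharing the mm that sits at
 * OOM_SCORE_ADJ_MIN forbids reaping.
 */
static bool can_reap_coarse(const struct model_group *others, size_t n,
			    int mm_id)
{
	for (size_t i = 0; i < n; i++)
		if (others[i].mm_id == mm_id &&
		    others[i].oom_score_adj == OOM_SCORE_ADJ_MIN)
			return false;
	return true;
}

/*
 * Finer check: OOM_SCORE_ADJ_MIN is ignored when all threads of that
 * thread group are already dying or exiting.
 */
static bool can_reap_fine(const struct model_group *others, size_t n,
			  int mm_id)
{
	for (size_t i = 0; i < n; i++)
		if (others[i].mm_id == mm_id &&
		    others[i].oom_score_adj == OOM_SCORE_ADJ_MIN &&
		    !group_all_dying(&others[i]))
			return false;
	return true;
}
```

With one other thread group that shares the mm, is at OOM_SCORE_ADJ_MIN, and has only dying threads (presumably the state of pid 3963 after it was reaped), can_reap_coarse() returns false while can_reap_fine() returns true. If that group still had a live thread, both would return false.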
----------
#define _GNU_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sched.h>

static int writer(void *unused)
{
	static char buffer[4096];
	int fd = open("/tmp/file", O_WRONLY | O_CREAT | O_APPEND, 0600);
	while (write(fd, buffer, sizeof(buffer)) == sizeof(buffer));
	return 0;
}

int main(int argc, char *argv[])
{
	unsigned long size;
	char *buf = NULL;
	unsigned long i;
	if (fork() == 0) {
		int fd = open("/proc/self/oom_score_adj", O_WRONLY);
		write(fd, "1000", 4);
		close(fd);
		for (i = 0; i < 2; i++) {
			char *stack = malloc(4096);
			if (stack)
				clone(writer, stack + 4096, CLONE_VM, NULL);
		}
		writer(NULL);
		while (1)
			pause();
	}
	sleep(1);
	for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
		char *cp = realloc(buf, size);
		if (!cp) {
			size >>= 1;
			break;
		}
		buf = cp;
	}
	sleep(5);
	/* Will cause OOM due to overcommit */
	for (i = 0; i < size; i += 4096)
		buf[i] = 0;
	pause();
	return 0;
}
----------

----------
[ 177.722853] a.out invoked oom-killer: gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
[ 177.724956] a.out cpuset=/ mems_allowed=0
[ 177.725735] CPU: 3 PID: 3962 Comm: a.out Not tainted 4.5.0-rc2-next-20160204 #291
(...snipped...)
[ 177.802889] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
(...snipped...)
[ 177.872248] [ 3941]  1000  3941    28880      124      14       3        0             0 bash
[ 177.874279] [ 3962]  1000  3962   541717   395780     784       6        0             0 a.out
[ 177.876274] [ 3963]  1000  3963     1078       21       7       3        0          1000 a.out
[ 177.878261] [ 3964]  1000  3964     1078       21       7       3        0          1000 a.out
[ 177.880194] [ 3965]  1000  3965     1078       21       7       3        0          1000 a.out
[ 177.882262] Out of memory: Kill process 3963 (a.out) score 998 or sacrifice child
[ 177.884129] Killed process 3963 (a.out) total-vm:4312kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
[ 177.887100] oom_reaper: reaped process :3963 (a.out) anon-rss:0kB, file-rss:0kB, shmem-rss:0lB
[ 179.638399] crond invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), order=0, oom_score_adj=0
[ 179.647708] crond cpuset=/ mems_allowed=0
[ 179.652996] CPU: 3 PID: 742 Comm: crond Not tainted 4.5.0-rc2-next-20160204 #291
(...snipped...)
[ 179.771311] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
(...snipped...)
[ 179.836221] [ 3941]  1000  3941    28880      124      14       3        0             0 bash
[ 179.838278] [ 3962]  1000  3962   541717   396308     785       6        0             0 a.out
[ 179.840328] [ 3963]  1000  3963     1078        0       7       3        0         -1000 a.out
[ 179.842443] [ 3965]  1000  3965     1078        0       7       3        0          1000 a.out
[ 179.844557] Out of memory: Kill process 3965 (a.out) score 998 or sacrifice child
[ 179.846404] Killed process 3965 (a.out) total-vm:4312kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
----------