On Wed, Jun 08, 2016 at 10:33:34AM +0200, Michal Hocko wrote: > On Fri 27-05-16 17:17:42, Vladimir Davydov wrote: > [...] > > @@ -970,26 +1028,25 @@ bool out_of_memory(struct oom_control *oc) > > !oom_unkillable_task(current, NULL, oc->nodemask) && > > current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) { > > get_task_struct(current); > > - oom_kill_process(oc, current, 0, totalpages, > > - "Out of memory (oom_kill_allocating_task)"); > > + oom_kill_process(oc, current, 0, totalpages); > > return true; > > } > > Do we really want to introduce sysctl_oom_kill_allocating_task to memcg > as well? Not sure, but why not? We take into account dump_tasks and panic_on_oom on memcg oom so why should we treat this sysctl differently? > The heuristic is quite dubious even for the global context IMHO > because it leads to a very random behavior. > > > p = select_bad_process(oc, &points, totalpages); > > /* Found nothing?!?! Either we hang forever, or we panic. */ > > - if (!p && !is_sysrq_oom(oc)) { > > + if (!p && !is_sysrq_oom(oc) && !oc->memcg) { > > dump_header(oc, NULL); > > panic("Out of memory and no killable processes...\n"); > > } > > if (p && p != (void *)-1UL) { > > - oom_kill_process(oc, p, points, totalpages, "Out of memory"); > > + oom_kill_process(oc, p, points, totalpages); > > /* > > * Give the killed process a good chance to exit before trying > > * to allocate memory again. > > */ > > schedule_timeout_killable(1); > > } > > - return true; > > + return !!p; > > } > > Now if you look at out_of_memory() the only shared "heuristic" with the > memcg part is the bypass for the exiting tasks. bypass exiting task (task_will_free_mem) check for panic (check_panic_on_oom) oom badness evaluation (oom_scan_process_thread or oom_evaluate_task after your patch) points calculation + kill (oom_kill_process) And if you need to modify any of these function calls or add yet another check, you have to do it twice. Ugly. > Plus both need the oom_lock. I believe locking could be unified for global/memcg oom cases too. > You have to special case oom notifiers, panic on no victim handling and > I guess the oom_kill_allocating task is not intentional either. So I > am not really sure this is an improvement. I even hate how we conflate > sysrq vs. regular global oom context together but my cleanup for that > has failed in the past. > > The victim selection code can be reduced because it is basically > shared between the two, only the iterator differs. But I guess that > can be eliminated by a simple helper. IMHO exporting a bunch of very oom-specific helpers (like those I enumerated above), partially revealing oom implementation, instead of well defined memcg helpers that could be reused anywhere else looks ugly. It's like having shrink_zone implementation both in vmscan.c and memcontrol.c with shrink_slab, shrink_lruvec, etc. exported, because we need to iterate over cgroups there. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>