On Wed 08-06-16 16:52:04, Vladimir Davydov wrote: > On Wed, Jun 08, 2016 at 10:33:34AM +0200, Michal Hocko wrote: > > On Fri 27-05-16 17:17:42, Vladimir Davydov wrote: > > [...] > > > @@ -970,26 +1028,25 @@ bool out_of_memory(struct oom_control *oc) > > > !oom_unkillable_task(current, NULL, oc->nodemask) && > > > current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) { > > > get_task_struct(current); > > > - oom_kill_process(oc, current, 0, totalpages, > > > - "Out of memory (oom_kill_allocating_task)"); > > > + oom_kill_process(oc, current, 0, totalpages); > > > return true; > > > } > > > > Do we really want to introduce sysctl_oom_kill_allocating_task to memcg > > as well? > > Not sure, but why not? We take into account dump_tasks and panic_on_oom > on memcg oom so why should we treat this sysctl differently? Well, for one thing nobody has requested that and it would be a user visible change which might be unexpected. And as already said I think it was a mistake to introduce this sysctl in the first place. The behavior is so random that I am even not sure it is usable in the real life. Spreading it more doesn't sound like a good idea to me. [...] > > Now if you look at out_of_memory() the only shared "heuristic" with the > > memcg part is the bypass for the exiting tasks. > > bypass exiting task (task_will_free_mem) > check for panic (check_panic_on_oom) > oom badness evaluation (oom_scan_process_thread or oom_evaluate_task > after your patch) > points calculation + kill (oom_kill_process) > > And if you need to modify any of these function calls or add yet another > check, you have to do it twice. Ugly. Ideally all those changes would happen inside those helpers. Also if you look at out_of_memory and mem_cgroup_out_of_memory it is much easier to follow the later one because it doesn't have that different combinations of heuristic which only make sense for sysrq or global oom. > > Plus both need the oom_lock. > > I believe locking could be unified for global/memcg oom cases too. > > > You have to special case oom notifiers, panic on no victim handling and > > I guess the oom_kill_allocating task is not intentional either. So I > > am not really sure this is an improvement. I even hate how we conflate > > sysrq vs. regular global oom context together but my cleanup for that > > has failed in the past. > > > > The victim selection code can be reduced because it is basically > > shared between the two, only the iterator differs. But I guess that > > can be eliminated by a simple helper. > > IMHO exporting a bunch of very oom-specific helpers (like those I > enumerated above), partially revealing oom implementation, instead of > well defined memcg helpers that could be reused anywhere else looks > ugly. It's like having shrink_zone implementation both in vmscan.c and > memcontrol.c with shrink_slab, shrink_lruvec, etc. exported, because we > need to iterate over cgroups there. I agree that the API for OOM killer parts is not really great. I am just little bit afraid that iterators are just over engineered. I am even not sure whethers those have any other potential users. The diffstat of the cleanup I have here right now sounds really encouranging. --- include/linux/oom.h | 17 ++++------- mm/memcontrol.c | 48 +++-------------------------- mm/oom_kill.c | 87 ++++++++++++++++++++++++++++++----------------------- 3 files changed, 60 insertions(+), 92 deletions(-) compared to yours include/linux/memcontrol.h | 15 ++++ include/linux/oom.h | 51 ------------- mm/memcontrol.c | 112 ++++++++++----------------- mm/oom_kill.c | 183 +++++++++++++++++++++++++++++---------------- 4 files changed, 176 insertions(+), 185 deletions(-) we save more LOC with a smaller patch. I know this is not an absolute metric but I would rather go with simplicity than an elaborate APIs. This is all pretty much mm/memcg internal. Anyway I do not have strong opinion and will not insist. I can post the full cleanup with suggestions from Tetsuo integrated if you are interested. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>