On 2023/3/14 20:00, Michal Hocko wrote: > On Tue 14-03-23 19:07:27, Haifeng Xu wrote: >> >> >> On 2023/3/14 18:16, Michal Hocko wrote: >>> On Tue 14-03-23 18:07:42, Haifeng Xu wrote: >>>> >>>> >>>> On 2023/3/14 17:19, Michal Hocko wrote: >>>>> On Tue 14-03-23 09:11:36, Haifeng Xu wrote: >>>>>> If oom_group is set, oom_kill_process() invokes oom_kill_memcg_member() >>>>>> to kill all processes in the memcg. When scanning tasks in memcg, maybe >>>>>> the provided task is marked as oom victim. Also, some tasks are likely >>>>>> to release their address space. There is no need to kill the exiting tasks. >>>>> >>>>> This doesn't state any actual problem. Could you be more specific? Is >>>>> this a bug fix, a behavior change or an optimization? >>>> >>>> >>>> 1) oom_kill_process() has inovked __oom_kill_process() to kill the selected victim, but it will be scanned >>>> in mem_cgroup_scan_tasks(). It's pointless to kill the victim twice. >>> >>> Why does that matter though? The purpose of task_will_free_mem in >>> oom_kill_process is different. It would bail out from a potentially >>> noisy OOM report when the selected oom victim is expected to terminate >>> soon. __oom_kill_process called for the whole memcg doesn't aim at >>> avoiding any oom victims. It merely sends a kill signal too all of them. >>> >> >> except sending kill signals, __oom_kill_process() will do some other work, such as print messeages, traversal all >> all user processes sharing mm which holds RCU section and so on. So if skip the victim, we don't need those work again >> and it won't affect the original mechanism. All oom victims are still get killed. > > mm sharing among processes is a very rare thing but do not forget that > task_will_free_mem needs to do the same thing for the same reason. For the victim, __oom_kill_process() traversals all processes in the system whether there some other tasks sharing mm or not. If skip it, this work can be dropped. > >>>> 2) for those exiting processes, reaping them directly is also a faster way to free memory compare with invoking >>>> __oom_kill_process(). >>> >>> Is it? What if the terminating task is blocked on lock? Async oom >>> reaping might release those resources in that case. >> >> Yes, the reaping process is asynchronous. I mean we don't need the work mentioned above any more. >> "reaping them directly" here is that joining the task in oom reaper queue. > > I do not follow. > > In any case I still do not see any actual justification for the change > other than "we can do it and it might turn out less expensive". This > alone is not sufficient, just be explicit, because oom is hardly a fast > path to optimize every single cpu cycle for. So unless you see an actual > real life problem that would be behaving much better or even fixed then > I am not convinced this is a worthwhile change to have. > we can also see two same messages("Memory cgroup out of memory: Killed process ***")about the victim. This seems a little confusing. If skip the victim, only one message was printed.