On Tue, Jul 14, 2020 at 3:05 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Mon 13-07-20 21:11:50, Yafang Shao wrote:
> > On Mon, Jul 13, 2020 at 8:45 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > >
> > > On Mon 13-07-20 20:24:07, Yafang Shao wrote:
> [...]
> > > > But we can't try locking the global oom_lock here, because a task
> > > > ooming in memcg foo may not help the tasks in memcg bar.
> > >
> > > I do not follow. oom_lock is not about forward progress. It is a big
> > > lock to synchronize against the oom_disable logic.
> > >
> > > I have this in mind
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 248e6cad0095..29d1f8c2d968 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -1563,8 +1563,10 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
> > >  	};
> > >  	bool ret;
> > >
> > > -	if (mutex_lock_killable(&oom_lock))
> > > +	if (!mutex_trylock(&oom_lock))
> > >  		return true;
> >
> >                  root_mem_cgroup
> >                 /               \
> >       memcg_a (16G)           memcg_b (32G)
> >            |                       |
> >       process a_1             process b_1
> >   (reach memcg_a limit)   (reach memcg_b limit)
> >      hold oom_lock          wait for oom_lock
> >
> > So process a_1 will try to kill a process in memcg_a, while process
> > b_1 needs to kill a process in memcg_b. IOW, the process killed in
> > memcg_a can't help the processes in memcg_b, so process b_1 should
> > not trylock oom_lock here.
> >
> > While if the memcg tree is:
> >
> >           target mem_cgroup (16G)
> >          /                       \
> >       process a_1             process a_2
> >  (reach the memcg limit)  (reach the memcg limit)
> >      hold oom_lock          wait for oom_lock
> >
> > then process a_2 can trylock oom_lock here. IOW, these processes
> > should be in the same memcg.
> >
> > That's why I said that we should introduce a per-memcg oom_lock.
>
> I still fail to understand your reasoning. Sure, the oom_lock is global,
> so it doesn't have a per-oom-hierarchy resolution, pretty much by
> definition. But that is not really important. The whole point of the
> trylock is to remove the ordering between the oom selection, the oom
> reaper and potential charge consumers which trigger the oom in parallel.
> With the blocking lock they would pile up in the order they have hit
> the OOM situation. With the trylock they would simply keep retrying
> until the oom is done. That would reduce the race window considerably.
> This is what the global oom is doing.
>

Thanks for the explanation. Seems trylock can work.

> Another alternative would be to check mem_cgroup_margin after the lock
> is taken, but it would be better to keep in sync with the global case
> as much as possible unless there is a good reason to differ.
>
> --
> Michal Hocko
> SUSE Labs

--
Thanks
Yafang
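
For reference, below is a minimal userspace sketch of the two ideas
discussed in this thread: contenders use a trylock and retry the charge
rather than queueing behind the lock, and the lock holder re-checks the
margin so that a concurrent kill can make a further kill unnecessary.
This is an analogue, not the kernel code: try_charge(), margin() and
oom_kill_victim() are hypothetical stand-ins for the kernel's charge
path, mem_cgroup_margin() and the OOM killer, and a pthread mutex stands
in for the global oom_lock.

/*
 * Userspace analogue of the trylock + margin-recheck scheme -- a
 * sketch under the assumptions stated above, not the kernel code.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t oom_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_long usage = 100;		/* simulated charged pages */
static const long limit = 100;		/* simulated memcg limit */

/* Analogue of mem_cgroup_margin(): room left below the limit. */
static long margin(void)
{
	long m = limit - atomic_load(&usage);
	return m > 0 ? m : 0;
}

/* Analogue of the OOM kill: pretend a victim freed 50 pages. */
static void oom_kill_victim(void)
{
	atomic_fetch_sub(&usage, 50);
	printf("OOM: killed a victim, freed 50 pages\n");
}

/* Try to charge @nr pages, entering the OOM path when over limit. */
static bool try_charge(long nr)
{
	for (;;) {
		/* Simplified: this check-then-add is not atomic. */
		if (margin() >= nr) {
			atomic_fetch_add(&usage, nr);
			return true;
		}

		if (pthread_mutex_trylock(&oom_lock) == 0) {
			/*
			 * Re-check under the lock: a concurrent kill may
			 * already have freed enough, so no new victim.
			 */
			if (margin() < nr)
				oom_kill_victim();
			pthread_mutex_unlock(&oom_lock);
		}
		/*
		 * Either we lost the trylock race or we just dropped the
		 * lock: retry the charge.  Not blocking here is what
		 * removes the arrival-order pile-up that a
		 * mutex_lock_killable()-style wait would create.
		 */
		usleep(1000);
	}
}

int main(void)
{
	if (try_charge(10))
		printf("charge succeeded, usage = %ld\n",
		       atomic_load(&usage));
	return 0;
}

Whether the re-check lives in the caller's retry loop (as in the global
OOM path) or under the lock (the mem_cgroup_margin alternative) is the
trade-off weighed in the thread; the sketch shows both for completeness.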