On Wed 11-01-17 20:32:12, David Rientjes wrote:
> When memory.move_charge_at_immigrate is enabled and precharges are
> depleted during move, mem_cgroup_move_charge_pte_range() will attempt to
> increase the size of the precharge.
>
> This livelocks if reclaim fails and if an oom killed process attached to
> the destination memcg is trying to exit, which requires
> cgroup_threadgroup_rwsem, since we're holding the mutex (we also livelock
> while holding mm->mmap_sem for read).

Is this really the case? try_charge will return with ENOMEM for
GFP_KERNEL requests and mem_cgroup_do_precharge will bail out. So how
exactly do we livelock? We do not depend on the exiting task to make
forward progress. Or am I missing something?

> Prevent precharges from ever looping by setting __GFP_NORETRY. This was
> probably the intention of the GFP_KERNEL & ~__GFP_NORETRY, which is
> pointless as written.

Yes, the current code is clearly bogus. I really do not remember why we
ended up with this rather than GFP_KERNEL | __GFP_NORETRY.

> This also restructures mem_cgroup_wait_acct_move() since it is not
> possible for mc.moving_task to be current.

Please separate this out to its own patch.
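
For illustration only, here is a minimal userspace sketch of why
GFP_KERNEL & ~__GFP_NORETRY is a no-op. It relies on the fact that
GFP_KERNEL does not contain __GFP_NORETRY. The DEMO_* names and bit
values below are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical flag bits for illustration; the kernel's real values differ. */
#define DEMO_GFP_RECLAIM  0x01u
#define DEMO_GFP_IO       0x02u
#define DEMO_GFP_FS       0x04u
#define DEMO_GFP_NORETRY  0x08u

/* Like GFP_KERNEL, this composite mask does not contain the NORETRY bit. */
#define DEMO_GFP_KERNEL   (DEMO_GFP_RECLAIM | DEMO_GFP_IO | DEMO_GFP_FS)

/* Clearing a bit that was never set leaves the mask unchanged. */
static_assert((DEMO_GFP_KERNEL & ~DEMO_GFP_NORETRY) == DEMO_GFP_KERNEL,
	      "masking out an unset bit is a no-op");

/* Returns nonzero when the mask asks the allocator not to retry. */
static int demo_is_noretry(unsigned int gfp)
{
	return (gfp & DEMO_GFP_NORETRY) != 0;
}
```

With these stand-ins, demo_is_noretry() is false for the "& ~" form and
true only for the "|" form, which matches the change to try_charge() in
the patch.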

> Fixes: 0029e19ebf84 ("mm: memcontrol: remove explicit OOM parameter in charge path")
> Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>

For the mem_cgroup_do_precharge part
Acked-by: Michal Hocko <mhocko@xxxxxxxx>

> ---
>  mm/memcontrol.c | 32 +++++++++++++++++++-------------
>  1 file changed, 19 insertions(+), 13 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1125,18 +1125,19 @@ static bool mem_cgroup_under_move(struct mem_cgroup *memcg)
>
>  static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg)
>  {
> -	if (mc.moving_task && current != mc.moving_task) {
> -		if (mem_cgroup_under_move(memcg)) {
> -			DEFINE_WAIT(wait);
> -			prepare_to_wait(&mc.waitq, &wait, TASK_INTERRUPTIBLE);
> -			/* moving charge context might have finished. */
> -			if (mc.moving_task)
> -				schedule();
> -			finish_wait(&mc.waitq, &wait);
> -			return true;
> -		}
> +	DEFINE_WAIT(wait);
> +
> +	if (likely(!mem_cgroup_under_move(memcg)))
> +		return false;
> +
> +	prepare_to_wait(&mc.waitq, &wait, TASK_INTERRUPTIBLE);
> +	/* moving charge context might have finished. */
> +	if (mc.moving_task) {
> +		WARN_ON_ONCE(mc.moving_task == current);
> +		schedule();
>  	}
> -	return false;
> +	finish_wait(&mc.waitq, &wait);
> +	return true;
>  }
>
>  #define K(x) ((x) << (PAGE_SHIFT-10))
> @@ -4355,9 +4356,14 @@ static int mem_cgroup_do_precharge(unsigned long count)
>  		return ret;
>  	}
>
> -	/* Try charges one by one with reclaim */
> +	/*
> +	 * Try charges one by one with reclaim, but do not retry. This avoids
> +	 * looping forever when try_charge() cannot reclaim memory and the oom
> +	 * killer defers while waiting for a process to exit which is trying to
> +	 * acquire cgroup_threadgroup_rwsem in the exit path.
> +	 */
>  	while (count--) {
> -		ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_NORETRY, 1);
> +		ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1);
>  		if (ret)
>  			return ret;
>  		mc.precharge++;

-- 
Michal Hocko
SUSE Labs