On Wed, 23 Apr 2014, Michal Hocko wrote:

> Eric has reported that he can see task(s) stuck in memcg OOM handler
> regularly. The only way out is to
>
> 	echo 0 > $GROUP/memory.oom_control
>
> His usecase is:
>
> - Setup a hierarchy with memory and the freezer (disable kernel oom and
>   have a process watch for oom).
>
> - In that memory cgroup add a process with one thread per cpu.
>
> - In one thread slowly allocate once per second I think it is 16M of ram
>   and mlock and dirty it (just to force the pages into ram and stay
>   there).
>
> - When oom is achieved loop:
>   * attempt to freeze all of the tasks.
>   * if frozen send every task SIGKILL, unfreeze, remove the directory in
>     cgroupfs.
>
> Eric has then pinpointed the issue to be memcg specific.
>
> All tasks are sitting on the memcg_oom_waitq when memcg oom is disabled.
> Those that have received fatal signal will bypass the charge and should
> continue on their way out. The tricky part is that the exit path might
> trigger a page fault (e.g. exit_robust_list), thus the memcg charge,
> while its memcg is still under OOM because nobody has released any charges
> yet.
>
> Unlike with the in-kernel OOM handler the exiting task doesn't get
> TIF_MEMDIE set so it doesn't shortcut further charges of the killed task
> and falls to the memcg OOM again without any way out of it as there are no
> fatal signals pending anymore.
>
> This patch fixes the issue by checking PF_EXITING early in
> mem_cgroup_try_charge and bypass the charge same as if it had fatal
> signal pending or TIF_MEMDIE set.
>
> Normally exiting tasks (aka not killed) will bypass the charge now but
> this should be OK as the task is leaving and will release memory and
> increasing the memory pressure just to release it in a moment seems
> dubious wasting of cycles. Besides that charges after exit_signals should
> be rare.
>
> Reported-by: Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
> Signed-off-by: Michal Hocko <mhocko@xxxxxxx>
> Cc: David Rientjes <rientjes@xxxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>

Acked-by: David Rientjes <rientjes@xxxxxxxxxx>

I think we should wait for a Tested-by from Eric if this is going to be
backported to stable, though, to meet the criteria.
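
For anyone following along, the charge-bypass condition described in the
changelog can be modelled in userspace roughly as below. This is only a
sketch under stated assumptions: struct task_model and both function names
are invented for illustration and are not kernel code. The three booleans
stand in for fatal_signal_pending(), TIF_MEMDIE and PF_EXITING; the real
check sits in mem_cgroup_try_charge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace stand-in for the task state the changelog
 * mentions. None of this is the kernel's own data structure.
 */
struct task_model {
	bool fatal_signal_pending;	/* models fatal_signal_pending(current) */
	bool tif_memdie;		/* models the TIF_MEMDIE thread flag */
	bool pf_exiting;		/* models PF_EXITING in task flags */
};

/*
 * Bypass decision as the changelog describes it before the patch:
 * only a pending fatal signal or TIF_MEMDIE skips the charge.
 */
bool bypass_charge_before(const struct task_model *t)
{
	assert(t != NULL);
	return t->fatal_signal_pending || t->tif_memdie;
}

/*
 * Bypass decision as described after the patch: a task that is
 * already exiting skips the charge as well, so a page fault on the
 * exit path no longer falls back into the memcg OOM wait.
 */
bool bypass_charge_after(const struct task_model *t)
{
	assert(t != NULL);
	return t->fatal_signal_pending || t->tif_memdie || t->pf_exiting;
}
```

The interesting case is a task killed from userspace whose signal has
already been consumed: only pf_exiting is set, so the first function says
"charge" (and the task can end up waiting on the memcg OOM again) while the
second says "bypass".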