On Wed, Apr 23, 2014 at 12:12:02PM +0200, Michal Hocko wrote:
> Eric has reported that he can see task(s) stuck in memcg OOM handler
> regularly. The only way out is to
>
> 	echo 0 > $GROUP/memory.oom_control
>
> His usecase is:
>
> - Setup a hierarchy with memory and the freezer (disable kernel oom and
>   have a process watch for oom).
>
> - In that memory cgroup add a process with one thread per cpu.
>
> - In one thread slowly allocate once per second I think it is 16M of ram
>   and mlock and dirty it (just to force the pages into ram and stay
>   there).
>
> - When oom is achieved loop:
>   * attempt to freeze all of the tasks.
>   * if frozen send every task SIGKILL, unfreeze, remove the directory in
>     cgroupfs.
>
> Eric has then pinpointed the issue to be memcg specific.
>
> All tasks are sitting on the memcg_oom_waitq when memcg oom is disabled.
> Those that have received fatal signal will bypass the charge and should
> continue on their way out. The tricky part is that the exit path might
> trigger a page fault (e.g. exit_robust_list), thus the memcg charge,
> while its memcg is still under OOM because nobody has released any charges
> yet.
>
> Unlike with the in-kernel OOM handler the exiting task doesn't get
> TIF_MEMDIE set so it doesn't shortcut further charges of the killed task
> and falls to the memcg OOM again without any way out of it as there are no
> fatal signals pending anymore.
>
> This patch fixes the issue by checking PF_EXITING early in
> mem_cgroup_try_charge and bypass the charge same as if it had fatal
> signal pending or TIF_MEMDIE set.
>
> Normally exiting tasks (aka not killed) will bypass the charge now but
> this should be OK as the task is leaving and will release memory and
> increasing the memory pressure just to release it in a moment seems
> dubious wasting of cycles. Besides that charges after exit_signals should
> be rare.
>
> Reported-by: Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
> Signed-off-by: Michal Hocko <mhocko@xxxxxxx>
> Cc: David Rientjes <rientjes@xxxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>

We're allowing fatal_signal_pending() tasks to bypass the limit
already, so I don't see why we shouldn't do the same for tasks that
cleared the signal and are in fact exiting.

Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>
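
For reference, the decision the patch changes can be modeled in userspace
roughly as below. This is only a sketch, not the kernel code: struct
task_model, should_bypass_before(), should_bypass_after() and
check_reported_case() are made-up names for illustration, and the three
booleans stand in for fatal_signal_pending(), TIF_MEMDIE and PF_EXITING
as described in the changelog.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the task state the charge path looks at. */
struct task_model {
	bool fatal_signal_pending; /* a fatal signal (e.g. SIGKILL) is still queued */
	bool tif_memdie;           /* set by the in-kernel OOM killer on its victim */
	bool pf_exiting;           /* task is already on its exit path */
};

/* Before the patch: only a pending fatal signal or TIF_MEMDIE skips the charge. */
bool should_bypass_before(const struct task_model *t)
{
	return t->fatal_signal_pending || t->tif_memdie;
}

/* After the patch: an exiting task skips the charge as well. */
bool should_bypass_after(const struct task_model *t)
{
	return t->fatal_signal_pending || t->tif_memdie || t->pf_exiting;
}

/*
 * The reported case: killed from userspace with memcg OOM disabled, so
 * TIF_MEMDIE was never set, and the signal is no longer pending by the
 * time the exit path faults.
 */
void check_reported_case(void)
{
	struct task_model t = {
		.fatal_signal_pending = false,
		.tif_memdie = false,
		.pf_exiting = true,
	};

	assert(!should_bypass_before(&t)); /* would go back to memcg OOM */
	assert(should_bypass_after(&t));   /* now bypasses the charge */
}
```

In this model a task that is neither killed nor exiting is still charged
as before, so only the exit path is affected.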