On Tue, 17 Dec 2013, Michal Hocko wrote:

> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index c72b03bf9679..fee25c5934d2 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -2692,7 +2693,8 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
> > >           * MEMDIE process.
> > >           */
> > >          if (unlikely(test_thread_flag(TIF_MEMDIE)
> > > -                     || fatal_signal_pending(current)))
> > > +                     || fatal_signal_pending(current))
> > > +                     || current->flags & PF_EXITING)
> > >                  goto bypass;
> > >
> > >          if (unlikely(task_in_memcg_oom(current)))
> > >
> > > rather than the later checks down the oom_synchronize paths. The comment
> > > already mentions dying process...
> > >
> >
> > This is scary because it doesn't even try to reclaim memcg memory before
> > allowing the allocation to succeed.
>
> Why should it reclaim in the first place when it simply is on the way to
> release memory. In other words why should it increase the memory
> pressure when it is in fact releasing it?
>

(Answering about removing the fatal_signal_pending() check as well here.)

For memory isolation, we'd only want to bypass memcg charges when
absolutely necessary, and it seems like TIF_MEMDIE is the only case where
that's required. For the same reason, we don't give processes with pending
SIGKILLs or those in the exit() path access to memory reserves in the page
allocator without first determining that reclaim can't make any progress,
and then we only do so by setting TIF_MEMDIE when calling the oom killer.

> I am really puzzled here. On one hand you are strongly arguing for not
> notifying when we know we can prevent from OOM action and on the other
> hand you are ok to get vmpressure/thresholds notification when an
> exiting task triggers reclaim.
>
> So I am really lost in what you are trying to achieve here. It sounds a
> bit arbitrary.
>

It's not arbitrary to define when memcg bypass is allowed and, in my
opinion, it should only be done in situations where it is unavoidable and
therefore breaking memory isolation is required.

(We wouldn't expect a 128MB memcg to be oom [and perhaps with a userspace
oom handler attached] when it has 100 children each 1MB in size just
because they all happen to be oom at the same time. We set up the excess
memory in the parent specifically for the memcg with the oom handler
attached.)
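
To make the difference between the two bypass policies concrete, here is a
minimal userspace sketch. It is not kernel code: struct task_model, enum
charge_action and both policy functions are hypothetical names invented for
illustration, and the three booleans only stand in for
test_thread_flag(TIF_MEMDIE), fatal_signal_pending(current) and
current->flags & PF_EXITING. It models the intent of each policy, not the
exact control flow of __mem_cgroup_try_charge().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the task state consulted before charging. */
struct task_model {
	bool memdie;       /* stands in for test_thread_flag(TIF_MEMDIE) */
	bool fatal_signal; /* stands in for fatal_signal_pending(current) */
	bool exiting;      /* stands in for current->flags & PF_EXITING */
};

enum charge_action {
	CHARGE_BYPASS,        /* skip the memcg charge entirely */
	CHARGE_RECLAIM_FIRST, /* try to reclaim within the memcg first */
};

/* Intent of the quoted patch: any dying task bypasses the charge. */
static enum charge_action charge_policy_patch(const struct task_model *t)
{
	if (t->memdie || t->fatal_signal || t->exiting)
		return CHARGE_BYPASS;
	return CHARGE_RECLAIM_FIRST;
}

/* Policy argued for in this mail: only an oom-killed task bypasses. */
static enum charge_action charge_policy_memdie_only(const struct task_model *t)
{
	if (t->memdie)
		return CHARGE_BYPASS;
	return CHARGE_RECLAIM_FIRST;
}

/* Usage example: an exiting task that was not selected by the oom killer. */
static void example_usage(void)
{
	struct task_model exiting = { .exiting = true };

	assert(charge_policy_patch(&exiting) == CHARGE_BYPASS);
	assert(charge_policy_memdie_only(&exiting) == CHARGE_RECLAIM_FIRST);
}
```

Under the second policy, an exiting or SIGKILLed task still goes through
memcg reclaim and only bypasses the charge once the oom killer has set
TIF_MEMDIE on it, which is the isolation property described above.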