On Thu, Nov 14, 2013 at 03:26:51PM -0800, David Rientjes wrote:
> When current has a pending SIGKILL or is already in the exit path, it
> only needs access to memory reserves to fully exit. In that sense, the
> memcg is not actually oom for current, it simply needs to bypass memory
> charges to exit and free its memory, which is guarantee itself that
> memory will be freed.
>
> We only want to notify userspace for actionable oom conditions where
> something needs to be done (and all oom handling can already be deferred
> to userspace through this method by disabling the memcg oom killer with
> memory.oom_control), not simply when a memcg has reached its limit, which
> would actually have to happen before memcg reclaim actually frees memory
> for charges.

Even though the situation may not require a kill, the user still wants
to know that the memory hard limit was breached and the isolation
broken in order to prevent a kill. We just came really close and the
fact that current is exiting is coincidental. Not everybody is having
OOM situations on a frequent basis and they might want to know when
they are redlining the system and that the same workload might blow up
the next time it's run.

The emergency reserves are there to prevent the system from
deadlocking. We only dip into them to avert a more imminent disaster
but we are no longer in good shape at this point. But by not even
announcing this situation to userspace anymore you are making this the
new baseline and declaring that everything is fine when the system is
already clutching at straws.

I maintain that we should signal OOM when our healthy and
always-available options are exhausted.