On Mon, Nov 18, 2013 at 05:51:10PM +0100, Michal Hocko wrote:
> On Mon 18-11-13 10:41:15, Johannes Weiner wrote:
> > On Thu, Nov 14, 2013 at 03:26:51PM -0800, David Rientjes wrote:
> > > When current has a pending SIGKILL or is already in the exit path, it
> > > only needs access to memory reserves to fully exit. In that sense, the
> > > memcg is not actually oom for current, it simply needs to bypass memory
> > > charges to exit and free its memory, which is guarantee itself that
> > > memory will be freed.
> > >
> > > We only want to notify userspace for actionable oom conditions where
> > > something needs to be done (and all oom handling can already be deferred
> > > to userspace through this method by disabling the memcg oom killer with
> > > memory.oom_control), not simply when a memcg has reached its limit, which
> > > would actually have to happen before memcg reclaim actually frees memory
> > > for charges.
> >
> > Even though the situation may not require a kill, the user still wants
> > to know that the memory hard limit was breached and the isolation
> > broken in order to prevent a kill. We just came really close and the
>
> You can observe that you are getting into troubles from fail counter
> already. The usability without more reclaim statistics is a bit
> questionable but you get a rough impression that something is wrong at
> least.
>
> > fact that current is exiting is coincidental. Not everybody is having
> > OOM situations on a frequent basis and they might want to know when
> > they are redlining the system and that the same workload might blow up
> > the next time it's run.
>
> I am just concerned that signaling temporal OOM conditions which do not
> require any OOM killer action (user or kernel space) might be confusing.
> Userspace would have harder times to tell whether any action is required
> or not.

But userspace in all likeliness DOES need to take action.

Reclaim is a really long process.
If 5 times doing 12 priority cycles and scanning thousands of pages is
not enough to reclaim a single page, what does that say about the
health of the memcg?

But more importantly, OOM handling is just inherently racy. A task
might receive the kill signal a split second *after* userspace was
notified. Or a task may exit voluntarily a split second after a
victim was chosen and killed. We have to draw a line somewhere, and
right now this is "reclaim failed". This patch doesn't fix a problem,
it just blurs that line and makes OOM notifications less predictable.