On Mon, 7 Mar 2011 17:02:36 -0800 (PST) David Rientjes <rientjes@xxxxxxxxxx> wrote: > On Mon, 7 Mar 2011, Andrew Morton wrote: > > > > > > So the question I'd ask is > > > > > > > > What about my question? Why is your usersapce oom-handler "unresponsive"? > > > > > > > > > > If we have a per-memcg userspace oom handler, then it's absolutely > > > required that it either increase the hard limit of the oom memcg or kill a > > > task to free memory; anything else risks livelocking that memcg. At > > > the same time, the oom handler's memcg isn't really important: it may be > > > in a different memcg but it may be oom at the same time. If we risk > > > livelocking the memcg when it is oom and the oom killer cannot respond > > > (the only reason for the oom killer to exist in the first place), then > > > there's no guarantee that a userspace oom handler could respond under > > > livelock. > > > > So you're saying that your userspace oom-handler is in a memcg which is > > also oom? > > It could be, if users assign the handler to a different memcg; otherwise, > it's guaranteed. Putting the handler into the same container would be rather daft. If userspace is going to elect to take over a kernel function then it should be able to perform that function reliably. We don't have hacks in the kernel to stop runaway SCHED_FIFO tasks, either. If the oom handler has put itself into a memcg and then has permitted that memcg to go oom then userspace is busted. > Keep in mind that for oom situations we give the killed > task access to memory reserves below the min watermark with TIF_MEMDIE so > that they can allocate memory to exit as quickly as possible (either to > handle the SIGKILL or within the exit path). That's because we can't > guarantee anything within an oom system, cpuset, mempolicy, or memcg is > ever responsive without it. (And, the side effect of it and its threads > exiting is the freeing of memory which allows everything else to once > again be responsive.) > > > That this is the only situation you've observed in which the > > userspace oom-handler is "unresponsive"? > > > > Personally, yes, but I could imagine other users could get caught if their > userspace oom handler requires taking locks (such as mmap_sem) by reading > within procfs that a thread within an oom memcg already holds. If activity in one memcg cause a lockup of processes in a separate memcg then that's a containment violation and we should fix it. One could argue that peering into a separate memcg's procfs files was already a containment violation, but from a practical point of view we definitely do want processes in a separate memcg to be able to passively observe activity in another without stepping on lindmines. My issue with this patch is that it extends the userspace API. This means we're committed to maintaining that interface *and its behaviour* for evermore. But the oom-killer and memcg are both areas of intense development and the former has a habit of getting ripped out and rewritten. Committing ourselves to maintaining an extension to the userspace interface is a big thing, especially as that extension is somewhat tied to internal implementation details and is most definitely tied to short-term inadequacies in userspace and in the kernel implementation. We should not commit the kernel to maintaining this new interface for all time until all alternatives have been eliminated. The patch looks to me like a short-term hack to work around medium-term userspace and kernel inadequacies, and that's a really bad basis upon which to merge it. Expedient hacks do sometimes makes sense, but it's real bad when they appear in the API. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxxx For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>