On Wed, 20 Nov 2013, Luigi Semenzato wrote:

> Yes, I agree that we can't always prevent OOM situations, and in fact
> we tolerate OOM kills, although they have a worse impact on the users
> than controlled freeing does.
>

If the controlled freeing is able to actually free memory in time before
hitting an oom condition, it should work pretty well.  That ability seems
to be highly dependent on sane thresholds for individual applications, and
I'm afraid we can never positively ensure that we wake up and are able to
free memory in time to avoid the oom condition.

> Well OK here it goes.  I hate to be a party-pooper, but the notion of
> a user-level OOM-handler scares me a bit for various reasons.
>
> 1. Our custom notifier sends low-memory warnings well ahead of memory
> depletion.  If we don't have enough time to free memory then, what can
> the last-minute OOM handler do?
>

The userspace oom handler doesn't necessarily guarantee that you can do
memory freeing; our use case wants to do priority-based oom killing that
is different from the kernel oom killer's rss-based selection.  To do
that, you only really need to read certain proc files, and you can do
killing based on uptime, for example.  You can also do a hierarchical
traversal of memcgs based on a priority.

We already have hooks in the kernel oom killer, things like
/proc/sys/vm/oom_kill_allocating_task and /proc/sys/vm/panic_on_oom, that
implement different policies that could now trivially be done in userspace
with memory reserves and a timeout.  The point is that we can't possibly
encode every possible policy into the kernel, and there's no reason why
userspace can't do the kill itself.

> 2. In addition to the time factor, it's not trivial to do anything,
> including freeing memory, without allocating memory first, so we'll
> need a reserve, but how much, and who is allowed to use it?
>

The reserve is configurable in the proposal as a memcg precharge and would
be dependent on the memory needed by the userspace oom handler at wakeup.
Only processes that are waiting on memory.oom_control have access to the
memory reserve.

> 3. How does one select the OOM-handler timeout?  If the freeing paths
> in the code are swapped out, the time needed to bring them in can be
> highly variable.
>

The userspace oom handler itself is mlocked in memory; you'd want to
select a timeout that is large enough to only react in situations where
userspace is known to be unresponsive.  It's only meant as a failsafe to
avoid the memcg sitting around forever not making any forward progress.

> 4. Why wouldn't the OOM-handler also do the killing itself?  (Which is
> essentially what we do.)  Then all we need is a low-memory notifier
> which can predict how quickly we'll run out of memory.
>

It can, but predicting how quickly we'll run out of memory is nearly
impossible for every class of application, and the timeout is required
before the kernel steps in to solve the situation.

> 5. The use case mentioned earlier (the fact that the killing of one
> process can make an entire group of processes useless) can be dealt
> with using OOM priorities and user-level code.
>

It depends on the application being killed.

> I confess I am surprised that the OOM killer works as well as I think
> it does.  Adding a user-level component would bring a whole new level
> of complexity to code that's already hard to fully comprehend, and
> might not really address the fundamental issues.
>

The kernel code that would be added certainly isn't complex, and I believe
it is better than the current functionality, which only allows you to
disable the memcg oom killer entirely to effect any userspace policy.
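
To make the reply to point 1 concrete, here is a rough sketch of what a
priority-then-uptime selection could look like in a userspace handler.  It
is only an illustration under stated assumptions: the Candidate record, the
numeric priority (lower is killed first), and the tie-break of killing the
most recently started process are all hypothetical and not part of the
proposal.  The only kernel interface it relies on is the starttime field
(field 22) of /proc/<pid>/stat.

```python
import os
import signal
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    pid: int
    priority: int     # hypothetical policy input: lower value is killed first
    start_ticks: int  # process start time in clock ticks since boot


def parse_start_ticks(stat_line: str) -> int:
    """Return starttime (field 22) from the contents of /proc/<pid>/stat."""
    # The comm field (field 2) is parenthesised and may contain spaces, so
    # split only what follows the last ')'.  That remainder starts at field 3.
    rest = stat_line.rsplit(")", 1)[1].split()
    return int(rest[22 - 3])


def pick_victim(candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """Lowest priority first; among equals, the most recently started."""
    # Assumption: a larger start_ticks means a shorter uptime, and the
    # illustrative policy prefers to kill the process with the least uptime.
    return min(candidates,
               key=lambda c: (c.priority, -c.start_ticks),
               default=None)


def kill_victim(victim: Candidate) -> None:
    """Send SIGKILL to the chosen process."""
    os.kill(victim.pid, signal.SIGKILL)
```

A real handler would build the candidate list from the tasks of the memcg
that hit its limit and would have to avoid allocating more than its reserve
while doing so; none of that is shown here.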