On Wed, 13 Nov 2013, Johannes Weiner wrote: > > The reclaim will fail, the only reason current has TIF_MEMDIE set is > > because reclaim has completely failed. > > ...for somebody else. > That process is in the same allocating context as current, otherwise current would not have been selected as a victim. The oom killer tries to only kill processes that will lead to future memory freeing where reclaim has failed. > > I don't know of any other "random places" other than when the oom killed > > process is sitting in reclaim before it is selected as the victim. Once > > it returns to the page allocator, it will immediately allocate and then be > > able to handle its pending SIGKILL. The one spot identified where it is > > absolutely pointless to spin is in reclaim since it is virtually > > guaranteed to fail. This patch fixes that issue. > > No, this applies to every other operation that does not immediately > lead to the task exiting or which creates more system load. Readahead > would be another example. They're all pointless and you could do > without all of them at this point, but I'm not okay with putting these > checks in random places that happen to bother you right now. It's not > a proper solution to the problem. > If you have an alternative solution, please feel free to propose it and I'll try it out. This isn't only about the cond_resched() in shrink_slab(), the reclaim is going to fail. There should be no instances where an oom killed process can go and start magically reclaiming memory that would have prevented it from becoming oom in the first place. I have seen the oom killer trigger and the victim stall for several seconds before actually allocating memory and that stall is pointless, especially when we're not touching a hotpath here, we're in direct reclaim already. > Is it a good idea to let ~700 processes simultaneously go into direct > global reclaim? > > The victim aborting reclaim still leaves you with ~699 processes > spinning in reclaim that should instead just retry the allocation as > well. What about them? > Um, no, those processes are going through a repeated loop of direct reclaim, calling the oom killer, iterating the tasklist, finding an existing oom killed process that has yet to exit, and looping. They wouldn't loop for too long if we can reduce the amount of time that it takes for that oom killed process to exit. > The situation your setups seem to get in frequently is bananas, don't > micro optimize this. > Unless you propose an alternative solution, this is the patch that fixes the problem when an oom killed process gets killed and then stalls for seconds before it actually retries allocating memory. Thanks for your thoughts. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>