Michal Hocko wrote:
> I mean we should eventually fail all the allocation types but GFP_NOFS
> is coming from _carefully_ handled code paths which is an easier starting
> point than a random code path in the kernel/drivers. So can we finally
> move at least in this direction?

I agree that all the allocation types can fail unless GFP_NOFAIL is given.
But I also expect that no allocation type should fail unless
order > PAGE_ALLOC_COSTLY_ORDER, GFP_NORETRY is given, or the caller has
been chosen as an OOM victim.

We already saw in Linux 3.19 what happens when !__GFP_FS allocations fail.
out_of_memory() is called by pagefault_out_of_memory() when a 0x2015a
(!__GFP_FS) allocation fails. To me this means that !__GFP_FS allocations
are effectively in OOM killer context. It is not fair to kill the thread
which triggered a page fault, for that thread may not be using much memory
(unfair from a memory usage point of view), or that thread may be the global
init (unfair because it takes down the entire system rather than letting it
survive by killing somebody else). Also, failing GFP_NOFS/GFP_NOIO
allocations which are not triggered by a page fault generally causes more
damage (e.g. taking filesystem error action) than surviving by killing
somebody. Therefore, I think we should not hesitate to invoke the OOM killer
for !__GFP_FS allocations.

> > Likewise, there is possibility that such memory reserve is used by threads
> > which the OOM victim is not waiting for, for malloc() + memset() causes
> > __GFP_FS allocations.
>
> We cannot be certain without complete dependency tracking. This is
> just a heuristic.

Yes, we cannot be certain without complete dependency tracking. And complete
dependency tracking is too expensive to implement. Dave is recommending that
we focus on not triggering the OOM killer rather than on how to handle corner
cases in OOM conditions, isn't he?

I still believe that choosing more OOM victims upon timeout (which is a
heuristic after all) and invoking the OOM killer for !__GFP_FS allocations
are the cheapest and least surprising approaches. This is something like
automatically and periodically pressing SysRq-f on behalf of the system
administrator when the memory allocator cannot recover from a low memory
situation.
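
To make the expectation above concrete, here is a minimal userspace sketch of the policy I am arguing for. It is not the page allocator's code: the MODEL_* names, the flag bit values and model_on_reclaim_stall() are all made up for illustration, and only the rules themselves (never fail with GFP_NOFAIL; may fail for order > PAGE_ALLOC_COSTLY_ORDER, GFP_NORETRY or an OOM victim; otherwise invoke the OOM killer, including for !__GFP_FS) come from this mail.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits for this model only; NOT the kernel's real values. */
#define MODEL_GFP_IO       (1u << 0)
#define MODEL_GFP_FS       (1u << 1)
#define MODEL_GFP_NOFAIL   (1u << 2)
#define MODEL_GFP_NORETRY  (1u << 3)

/* Simplified stand-ins for GFP_KERNEL / GFP_NOFS / GFP_NOIO. */
#define MODEL_GFP_KERNEL   (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS     (MODEL_GFP_IO)
#define MODEL_GFP_NOIO     (0u)

/* Stand-in for PAGE_ALLOC_COSTLY_ORDER, which is 3 in mainline. */
#define MODEL_COSTLY_ORDER 3u

enum model_action {
    MODEL_NEVER_FAIL,        /* GFP_NOFAIL: must not return failure */
    MODEL_MAY_FAIL,          /* caller is expected to cope with NULL */
    MODEL_INVOKE_OOM_KILLER, /* pick a victim instead of failing */
};

/*
 * What the allocator should do, under the proposed policy, once reclaim
 * can no longer make progress for this request.
 */
enum model_action model_on_reclaim_stall(unsigned int gfp,
                                         unsigned int order,
                                         bool is_oom_victim)
{
    if (gfp & MODEL_GFP_NOFAIL)
        return MODEL_NEVER_FAIL;

    if (order > MODEL_COSTLY_ORDER ||
        (gfp & MODEL_GFP_NORETRY) ||
        is_oom_victim)
        return MODEL_MAY_FAIL;

    /*
     * Everything else, deliberately including !__GFP_FS requests
     * (GFP_NOFS / GFP_NOIO): do not fail, invoke the OOM killer.
     */
    return MODEL_INVOKE_OOM_KILLER;
}

/* The cases discussed in this thread, written as assertions. */
void model_examples(void)
{
    assert(model_on_reclaim_stall(MODEL_GFP_NOFS, 0, false)
           == MODEL_INVOKE_OOM_KILLER);
    assert(model_on_reclaim_stall(MODEL_GFP_KERNEL, 4, false)
           == MODEL_MAY_FAIL);
    assert(model_on_reclaim_stall(MODEL_GFP_KERNEL, 0, true)
           == MODEL_MAY_FAIL);
}
```

The point of the sketch is only that the __GFP_FS bit does not appear in any of the conditions: under this policy a GFP_NOFS request is treated the same as a GFP_KERNEL one when deciding between failing and killing.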