Michal Hocko wrote: > On Thu 17-11-16 21:50:04, Tetsuo Handa wrote: > > Filesystem code might request costly __GFP_NOFAIL !__GFP_REPEAT GFP_NOFS > > allocations. But commit 0a0337e0d1d13446 ("mm, oom: rework oom detection") > > overlooked that __GFP_NOFAIL allocation requests need to invoke the OOM > > killer and retry even if order > PAGE_ALLOC_COSTLY_ORDER && !__GFP_REPEAT. > > The caller will crash if such allocation request failed. > > Could you point to such an allocation request please? Costly GFP_NOFAIL > requests are a really high requirement and I am even not sure we should > support them. buffered_rmqueue already warns about order > 1 NOFAIL > allocations. That question is pointless. You are simply lucky that you see only order 0 or order 1. There are many __GFP_NOFAIL allocations where order is determined at runtime. There is no guarantee that order 2 and above never happens. http://lkml.kernel.org/r/56F8F5DA.6040206@xxxxxxxx is a case where an XFS user had trouble due to order 4 and order 5 allocations because XFS uses opencoded endless loop around allocator than __GFP_NOFAIL. There is WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1)); in buffered_rmqueue(), but there is no guarantee that the caller hits this warning. It is theoretically possible that order > PAGE_ALLOC_COSTLY_ORDER && __GFP_NOFAIL && !__GFP_REPEAT allocation requests fail to call buffered_rmqueue() from get_page_from_freelist() due to "continue;" statements around zone watermark checks. It is possible that an order == 2 && __GFP_NOFAIL allocation reuest hit this WARN_ON_ONCE() and subsequent order > PAGE_ALLOC_COSTLY_ORDER && __GFP_NOFAIL && !__GFP_REPEAT allocation requests no longer hits this WARN_ON_ONCE(). So, you want to mess up the definition of __GFP_NOFAIL like below? I think my patch is better than below change. --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -138,9 +138,16 @@ * __GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller * cannot handle allocation failures. New users should be evaluated carefully * (and the flag should be used only when there is no reasonable failure * policy) but it is definitely preferable to use the flag rather than - * opencode endless loop around allocator. + * opencode endless loop around allocator. However, the VM implementation + * is allowed to fail if order > PAGE_ALLOC_COSTLY_ORDER but __GFP_REPEAT + * is not set. That is, all users which specify __GFP_NOFAIL with order + * determined at runtime (e.g. sizeof(struct) * num_elements) had better + * specify __GFP_REPEAT as well, for the VM implementation does not invoke + * the OOM killer unless __GFP_REPEAT is also specified and the caller has + * to be prepared for allocation failures using opencode endless loop + * around allocator. * * __GFP_NORETRY: The VM implementation must not retry indefinitely and will * return NULL when direct reclaim and memory compaction have failed to allow * the allocation to succeed. The OOM killer is not called with the current -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>