Re: [PATCH] mm/page_alloc: Don't fail costly __GFP_NOFAIL allocations.

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Mon, 21 Nov 2016 20:16:40 +0900

Michal Hocko wrote:
> On Thu 17-11-16 21:50:04, Tetsuo Handa wrote:
> > Filesystem code might request costly __GFP_NOFAIL !__GFP_REPEAT GFP_NOFS
> > allocations. But commit 0a0337e0d1d13446 ("mm, oom: rework oom detection")
> > overlooked that __GFP_NOFAIL allocation requests need to invoke the OOM
> > killer and retry even if order > PAGE_ALLOC_COSTLY_ORDER && !__GFP_REPEAT.
> > The caller will crash if such allocation request failed.
>
> Could you point to such an allocation request please? Costly GFP_NOFAIL
> requests are a really high requirement and I am even not sure we should
> support them. buffered_rmqueue already warns about order > 1 NOFAIL
> allocations.

That question is pointless. You are simply lucky that you see only order 0 or
order 1. There are many __GFP_NOFAIL allocations where order is determined at
runtime. There is no guarantee that order 2 and above never happens.
http://lkml.kernel.org/r/56F8F5DA.6040206@xxxxxxxx is a case where an XFS user
had trouble due to order 4 and order 5 allocations because XFS uses opencoded
endless loop around allocator than __GFP_NOFAIL.

There is WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1)); in
buffered_rmqueue(), but there is no guarantee that the caller hits this
warning. It is theoretically possible that order > PAGE_ALLOC_COSTLY_ORDER &&
__GFP_NOFAIL && !__GFP_REPEAT allocation requests fail to call buffered_rmqueue()
 from get_page_from_freelist() due to "continue;" statements around zone
watermark checks. It is possible that an order == 2 && __GFP_NOFAIL allocation
reuest hit this WARN_ON_ONCE() and subsequent order > PAGE_ALLOC_COSTLY_ORDER &&
__GFP_NOFAIL && !__GFP_REPEAT allocation requests no longer hits this WARN_ON_ONCE().

So, you want to mess up the definition of __GFP_NOFAIL like below?
I think my patch is better than below change.

--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -138,9 +138,16 @@
  * __GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller
  *   cannot handle allocation failures. New users should be evaluated carefully
  *   (and the flag should be used only when there is no reasonable failure
  *   policy) but it is definitely preferable to use the flag rather than
- *   opencode endless loop around allocator.
+ *   opencode endless loop around allocator. However, the VM implementation
+ *   is allowed to fail if order > PAGE_ALLOC_COSTLY_ORDER but __GFP_REPEAT
+ *   is not set. That is, all users which specify __GFP_NOFAIL with order
+ *   determined at runtime (e.g. sizeof(struct) * num_elements) had better
+ *   specify __GFP_REPEAT as well, for the VM implementation does not invoke
+ *   the OOM killer unless __GFP_REPEAT is also specified and the caller has
+ *   to be prepared for allocation failures using opencode endless loop
+ *   around allocator.
  *
  * __GFP_NORETRY: The VM implementation must not retry indefinitely and will
  *   return NULL when direct reclaim and memory compaction have failed to allow
  *   the allocation to succeed.  The OOM killer is not called with the current

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>