Re: How to handle TIF_MEMDIE stalls?

Michal Hocko <mhocko@xxxxxxx> · Thu, 19 Feb 2015 13:29:14 +0100

On Thu 19-02-15 06:01:24, Johannes Weiner wrote:
[...]
> Preferrably, we'd get rid of all nofail allocations and replace them
> with preallocated reserves.  But this is not going to happen anytime
> soon, so what other option do we have than resolving this on the OOM
> killer side?

As I've mentioned in other email, we might give GFP_NOFAIL allocator
access to memory reserves (by giving it __GFP_HIGH). This is still not a
100% solution because reserves could get depleted but this risk is there
even with multiple oom victims. I would still argue that this would be a
better approach because selecting more victims might hit pathological
case more easily (other victims might be blocked on the very same lock
e.g.).

Something like the following:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8d52ab18fe0d..4b5cf28a13f4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2599,6 +2599,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	enum migrate_mode migration_mode = MIGRATE_ASYNC;
 	bool deferred_compaction = false;
 	int contended_compaction = COMPACT_CONTENDED_NONE;
+	int oom = 0;
 
 	/*
 	 * In the slowpath, we sanity check order to avoid ever trying to
@@ -2628,6 +2629,15 @@ retry:
 		wake_all_kswapds(order, ac);
 
 	/*
+	 * __GFP_NOFAIL allocations cannot fail but yet the current context
+	 * might be blocking resources needed by the OOM victim to terminate.
+	 * Allow the caller to dive into memory reserves to succeed the
+	 * allocation and break out from a potential deadlock.
+	 */
+	if (oom > 10 && (gfp_mask & __GFP_NOFAIL))
+		gfp_mask |= __GFP_HIGH;
+
+	/*
 	 * OK, we're below the kswapd watermark and have kicked background
 	 * reclaim. Now things get more complex, so set up alloc_flags according
 	 * to how we want to proceed.
@@ -2759,6 +2769,8 @@ retry:
 				goto got_pg;
 			if (!did_some_progress)
 				goto nopage;
+
+			oom++;
 		}
 		/* Wait for some write requests to complete then retry */
 		wait_iff_congested(ac->preferred_zone, BLK_RW_ASYNC, HZ/50);
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>