+ mmpage_alloc-pf_wq_worker-threads-must-sleep-at-should_reclaim_retry.patch added to -mm tree

akpm@xxxxxxxxxxxxxxxxxxxx · Mon, 27 Aug 2018 14:54:02 -0700

The patch titled
     Subject: mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().
has been added to the -mm tree.  Its filename is
     mmpage_alloc-pf_wq_worker-threads-must-sleep-at-should_reclaim_retry.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mmpage_alloc-pf_wq_worker-threads-must-sleep-at-should_reclaim_retry.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mmpage_alloc-pf_wq_worker-threads-must-sleep-at-should_reclaim_retry.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days.

------------------------------------------------------
From: Michal Hocko <mhocko@xxxxxxxx>
Subject: mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

Tetsuo Handa has reported that it is possible to bypass the short sleep
for PF_WQ_WORKER threads which was introduced by commit 373ccbe5927034b5
("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make
any progress"), and that doing so can lock up the system under OOM.

The primary reason is that WQ_MEM_RECLAIM WQs are not guaranteed to run
even when they have a rescuer available.  Those workers might, however, be
essential for reclaim to make forward progress.  If we are unlucky enough,
all the allocation requests can get stuck waiting for a WQ_MEM_RECLAIM
work item, and the system is essentially stuck in an OOM condition without
much hope of moving on.  Tetsuo has seen reclaim stuck on
drain_local_pages_wq or xlog_cil_push_work (xfs).  There might be others.

Since should_reclaim_retry() should be a natural reschedule point, let's
do the short sleep for PF_WQ_WORKER threads unconditionally in order to
guarantee that other pending work items are started.  This works around
the problem and is less fragile than hunting down every path where the
sleep is missed.  For example, we used to have a sleeping point in the OOM
path, but it was removed recently because it caused other issues.  Having
a single sleeping point is more robust.

Link: http://lkml.kernel.org/r/20180827135101.15700-1-mhocko@xxxxxxxxxx
Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
Debugged-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Reported-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Cc: Roman Gushchin <guro@xxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Vladimir Davydov <vdavydov.dev@xxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Tejun Heo <tj@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/page_alloc.c |   34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

--- a/mm/page_alloc.c~mmpage_alloc-pf_wq_worker-threads-must-sleep-at-should_reclaim_retry
+++ a/mm/page_alloc.c
@@ -3923,6 +3923,7 @@ should_reclaim_retry(gfp_t gfp_mask, uns
 {
 	struct zone *zone;
 	struct zoneref *z;
+	bool ret = false;
 
 	/*
 	 * Costly allocations might have made a progress but this doesn't mean
@@ -3986,25 +3987,26 @@ should_reclaim_retry(gfp_t gfp_mask, uns
 				}
 			}
 
-			/*
-			 * Memory allocation/reclaim might be called from a WQ
-			 * context and the current implementation of the WQ
-			 * concurrency control doesn't recognize that
-			 * a particular WQ is congested if the worker thread is
-			 * looping without ever sleeping. Therefore we have to
-			 * do a short sleep here rather than calling
-			 * cond_resched().
-			 */
-			if (current->flags & PF_WQ_WORKER)
-				schedule_timeout_uninterruptible(1);
-			else
-				cond_resched();
-
-			return true;
+			ret = true;
+			goto out;
 		}
 	}
 
-	return false;
+out:
+	/*
+	 * Memory allocation/reclaim might be called from a WQ
+	 * context and the current implementation of the WQ
+	 * concurrency control doesn't recognize that
+	 * a particular WQ is congested if the worker thread is
+	 * looping without ever sleeping. Therefore we have to
+	 * do a short sleep here rather than calling
+	 * cond_resched().
+	 */
+	if (current->flags & PF_WQ_WORKER)
+		schedule_timeout_uninterruptible(1);
+	else
+		cond_resched();
+	return ret;
 }
 
 static inline bool
_

Patches currently in -mm which might be from mhocko@xxxxxxxx are

mmpage_alloc-pf_wq_worker-threads-must-sleep-at-should_reclaim_retry.patch