Re: [PATCH resend] mm: compaction: optimize proactive compaction deferrals

Khalid Aziz <khalid.aziz@xxxxxxxxxx> · Wed, 21 Jul 2021 16:35:05 -0600

On 7/21/21 6:13 AM, Charan Teja Reddy wrote:
Vlastimil Babka figured out that when the fragmentation score doesn't go
down across a proactive compaction run, i.e. when no progress is made,
the next proactive compaction is deferred for 1 <<
COMPACT_MAX_DEFER_SHIFT, i.e. 64, wakeups, each
HPAGE_FRAG_CHECK_INTERVAL_MSEC (=500) apart. On each of these wakeups,
kcompactd just decrements the 'proactive_defer' counter and goes back to
sleep, i.e. it is woken only to decrement a counter. The same deferral
time can be achieved by simply sleeping for
HPAGE_FRAG_CHECK_INTERVAL_MSEC << COMPACT_MAX_DEFER_SHIFT, which avoids
the unnecessary wakeups of the kcompactd thread and removes the need for
the 'proactive_defer' counter.

Link: https://lore.kernel.org/linux-fsdevel/88abfdb6-2c13-b5a6-5b46-742d12d1c910@xxxxxxx/
Signed-off-by: Charan Teja Reddy <charante@xxxxxxxxxxxxxx>


Reviewed-by: Khalid Aziz <khalid.aziz@xxxxxxxxxx>


---
  Changes in V1:
     o Removed the 'proactive_defer' thread counter by optimizing proactive
     o This is a resend as earlier it was clubbed with other changes posted
       at https://lore.kernel.org/patchwork/patch/1448789/	

  mm/compaction.c | 29 +++++++++++++++++++----------
  1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 621508e..db00dbf 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2885,7 +2885,8 @@ static int kcompactd(void *p)
  {
  	pg_data_t *pgdat = (pg_data_t *)p;
  	struct task_struct *tsk = current;
-	unsigned int proactive_defer = 0;
+	long default_timeout = msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC);
+	long timeout = default_timeout;
  
  	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
  
@@ -2902,23 +2903,30 @@ static int kcompactd(void *p)
  
  		trace_mm_compaction_kcompactd_sleep(pgdat->node_id);
  		if (wait_event_freezable_timeout(pgdat->kcompactd_wait,
-			kcompactd_work_requested(pgdat),
-			msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC))) {
+			kcompactd_work_requested(pgdat), timeout)) {
  
  			psi_memstall_enter(&pflags);
  			kcompactd_do_work(pgdat);
  			psi_memstall_leave(&pflags);
+			/*
+			 * Reset the timeout value. The defer timeout from
+			 * proactive compaction can effectively be lost
+			 * here, but that is fine as the condition of the
+			 * zone changed substantially and carrying on
+			 * with the previous defer is not useful.
+			 */
+			timeout = default_timeout;
  			continue;
  		}
  
-		/* kcompactd wait timeout */
+		/*
+		 * Start the proactive work with default timeout. Based
+		 * on the fragmentation score, this timeout is updated.
+		 */
+		timeout = default_timeout;
  		if (should_proactive_compact_node(pgdat)) {
  			unsigned int prev_score, score;
  
-			if (proactive_defer) {
-				proactive_defer--;
-				continue;
-			}
  			prev_score = fragmentation_score_node(pgdat);
  			proactive_compact_node(pgdat);
  			score = fragmentation_score_node(pgdat);
@@ -2926,8 +2934,9 @@ static int kcompactd(void *p)
  			 * Defer proactive compaction if the fragmentation
  			 * score did not go down i.e. no progress made.
  			 */
-			proactive_defer = score < prev_score ?
-					0 : 1 << COMPACT_MAX_DEFER_SHIFT;
+			if (unlikely(score >= prev_score))
+				timeout =
+				   default_timeout << COMPACT_MAX_DEFER_SHIFT;
  		}
  	}