Re: [Bug 204165] New: 100% CPU usage in compact_zone_order

Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> · Wed, 17 Jul 2019 18:53:32 +0100

On Tue, Jul 16, 2019 at 07:15:08PM +0000, howaboutsynergy@xxxxxxxxxxxxxx wrote:
> On Tuesday, July 16, 2019 12:03 PM, Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
> > I tried reproducing this but after 300 attempts with various parameters
> > and adding other workloads in the background, I was unable to reproduce
> > the problem.
> > 
> 
> 
> The third time I ran this command `$ time stress -m 220 --vm-bytes 10000000000 --timeout 10`, got 10+ hung:
> 
>   PID  %CPU COMMAND                                                                            PR  NI    VIRT    RES S USER     
>  3785  94.5 stress                                                                             20   0 9769416      4 R user     
>  3777  87.3 stress                                                                             20   0 9769416      4 R user     
>  3923  85.5 stress                                                                             20   0 9769416      4 R user     
>  3937  85.5 stress                                                                             20   0 9769416      4 R user     
>  3943  81.8 stress                                                                             20   0 9769416      4 R user     
>  3885  80.0 stress                                                                             20   0 9769416      4 R user     
>  3970  80.0 stress                                                                             20   0 9769416      4 R user     
>
> <SNIP>
>
> trace.dat is 1.3G
> -rw-r--r--  1 root root 1326219264 16.07.2019 20:45 trace.dat
> 

Ok, great. From the trace, it was obvious that the scanner is making no
progress. I don't think zswap is involved as such but it *may* be making
it easier to trigger due to altering timing. At least, I see no reason
why zswap would materially affect the termination conditions.

>From the path and your trace, I think what *might* be happening is that
a fatal signal is pending which does not advance the scanner or look like
a proper abort. I think it ends up looping in compaction instead of dying
without either aborting or progressing the scanner.  It might explain why
stress-ng is hitting is as it is probably sending fatal signals on timeout
(I didn't check the source).

Can you try this (compile tested only) patch please? Note that the stress
test might still take time to exit normally if it's stuck in a swap
storm of some sort but I'm hoping the 100% compaction CPU usage goes away
at least.

diff --git a/mm/compaction.c b/mm/compaction.c
index 9e1b9acb116b..952dc2fb24e5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -842,13 +842,15 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 		/*
 		 * Periodically drop the lock (if held) regardless of its
-		 * contention, to give chance to IRQs. Abort async compaction
-		 * if contended.
+		 * contention, to give chance to IRQs. Abort completely if
+		 * a fatal signal is pending.
 		 */
 		if (!(low_pfn % SWAP_CLUSTER_MAX)
 		    && compact_unlock_should_abort(&pgdat->lru_lock,
-					    flags, &locked, cc))
-			break;
+					    flags, &locked, cc)) {
+			low_pfn = 0;
+			goto fatal_pending;
+		}
 
 		if (!pfn_valid_within(low_pfn))
 			goto isolate_fail;
@@ -1060,6 +1062,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
 						nr_scanned, nr_isolated);
 
+fatal_pending:
 	cc->total_migrate_scanned += nr_scanned;
 	if (nr_isolated)
 		count_compact_events(COMPACTISOLATED, nr_isolated);

-- 
Mel Gorman
SUSE Labs