On 1/11/23 15:21, Mel Gorman wrote:
> On Mon, Jan 09, 2023 at 08:43:40PM +0100, Vlastimil Babka wrote:
>> On 7/19/22 10:28, Mel Gorman wrote:
>> > On Fri, Jul 15, 2022 at 11:26:01PM +0800, Chuyi Zhou wrote:
>> >> You are right, we do need some mechanism to ensure the block is marked
>> >> skipped after scanning a !IS_ALIGNED block from fast_find. However, I
>> >> think the following code may not work, because *skip_updated* has been
>> >> reset:
>> >>
>> >> 	if (!skip_updated) {
>> >> 		skip_updated = true;
>> >> 		if (test_and_set_skip(cc, page, low_pfn))
>> >> 			goto isolate_abort;
>> >> 	}
>> >>
>> >> Why not ignore skip_updated after scanning a block, just like this:
>> >>
>> >> diff --git a/mm/compaction.c b/mm/compaction.c
>> >> index 962d05d1e187..1c388c45f127 100644
>> >> --- a/mm/compaction.c
>> >> +++ b/mm/compaction.c
>> >> @@ -1180,7 +1180,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>> >>  	 * rescanned twice in a row.
>> >>  	 */
>> >>  	if (low_pfn == end_pfn && (!nr_isolated || cc->rescan)) {
>> >> -		if (valid_page && !skip_updated)
>> >> +		if (valid_page)
>> >>  			set_pageblock_skip(valid_page);
>> >>  		update_cached_migrate(cc, low_pfn);
>> >>  	}
>> >>
>> >
>> > Because valid_page still needs to be set but ensuring valid_page is set
>> > is sufficient.
>> >
>> > The pageblock will be marked for skip on the first locking of a lruvec
>> >
>> > 	/* Try get exclusive access under lock */
>> > 	if (!skip_updated) {
>> > 		skip_updated = true;
>> > 		if (test_and_set_skip(cc, page, low_pfn))
>> > 			goto isolate_abort;
>> > 	}
>> >
>> > That will happen regardless of alignment.
>>
>> I'm not sure about that, as test_and_set_skip() has "if
>> (!pageblock_aligned(pfn)) return false". So I think it only takes the
>> first pfn in a pageblock not being PageLRU, so that we don't try to get
>> exclusive access for it, and then the following pfns will not be aligned
>> and thus we never set_pageblock_skip() through this path. If we have
>> nr_isolated > 0 and not cc->rescan, we might not set_pageblock_skip()
>> through the hunk above either.
>>
>
> True.
>
>> I've been checking this area because we have some reports [1] of upstream
>> 6.1 causing long loops in khugepaged, pegging 100% CPU in compaction.
>> Tracepoint data suggests we keep (successfully) isolating migratepages
>> over and over through fast_find_migrateblock() (the pfn ranges are not
>> linearly increasing but there's a cycle, probably as we rotate the
>> freelist), but are failing to migrate them (for some reason). Between 6.0
>> and 6.1 this patch went in as commit 7efc3b726103, so I strongly suspect
>> it's the root cause and will be providing a kernel with a revert in the
>> bug to confirm. Consider this an early heads up :)
>>
>> [1] https://bugzilla.suse.com/show_bug.cgi?id=1206848
>
> The trace shows that there is a lot of rescanning between a small subset
> of pageblocks because the conditions for marking the block skip are not
> met. The scan is not reaching the end of the pageblock because enough
> pages were isolated, but none were migrated successfully. Eventually it
> circles back to the same block.
>
> Pageblock skipping is tricky because we are trying to minimise both
> latency and excessive scanning. However, tracking exactly when a block
> is fully scanned requires an excessive amount of state.
>
> One potential mitigating factor is to forcibly scan a pageblock when
> migrations fail, even though that could be for transient reasons such as
> page writeback or dirty pages. This will sometimes migrate too many pages
> (worst case 500) but pageblocks will be marked skip.
>
> This is not tested at all
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index ca1603524bbe..9242fcaeb09c 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2387,6 +2387,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>  			cc->rescan = true;
>  		}
>
> +retry_rescan:
>  		switch (isolate_migratepages(cc)) {
>  		case ISOLATE_ABORT:
>  			ret = COMPACT_CONTENDED;
> @@ -2429,15 +2430,28 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>  				goto out;
>  			}
>  			/*
> -			 * We failed to migrate at least one page in the current
> -			 * order-aligned block, so skip the rest of it.
> +			 * If an ASYNC or SYNC_LIGHT fails to migrate a page
> +			 * within the current order-aligned block, rescan the
> +			 * entire block. Once the whole block is scanned, it'll
> +			 * be marked "skip" to avoid rescanning in the near
> +			 * future. This will isolate more pages than necessary
> +			 * for the request but avoid loops due to
> +			 * fast_find_migrateblock revisiting blocks that were
> +			 * recently partially scanned.
>  			 */
> -			if (cc->direct_compaction &&
> -						(cc->mode == MIGRATE_ASYNC)) {
> -				cc->migrate_pfn = block_end_pfn(
> -						cc->migrate_pfn - 1, cc->order);
> -				/* Draining pcplists is useless in this case */
> -				last_migrated_pfn = 0;
> +			if (cc->direct_compaction && !cc->rescan &&
> +						(cc->mode < MIGRATE_SYNC)) {
> +				cc->rescan = true;
> +
> +				/*
> +				 * Draining pcplists does not help THP if
> +				 * any page failed to migrate. Even after
> +				 * drain, the pageblock will not be free.
> +				 */
> +				if (cc->order == HUGETLB_PAGE_ORDER)
> +					last_migrated_pfn = 0;
> +
> +				goto retry_rescan;

I think it won't work as expected, because we will not be rescanning the
same pageblock; fast_find_migrateblock() will AFAICS pick a different one,
and there's nothing to prevent that. But the possible compaction state is
too complex for me to be confident about a quick fix. I think it will be
safer to revert the patch for 6.2 and 6.1 stable, and redo it more
carefully.

>  			}
>  		}
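
For reference, here is a simplified sketch of the skip-bit path discussed
above, based on test_and_set_skip() in mm/compaction.c around v6.1 (the
ignore_skip_hint/no_set_skip_hint handling here is approximate; the
pageblock_aligned() early return is the part quoted verbatim in the thread):

	static bool test_and_set_skip(struct compact_control *cc, struct page *page,
				      unsigned long pfn)
	{
		bool skip;

		/* Skip hints may be disabled entirely for this compaction run */
		if (cc->ignore_skip_hint)
			return false;

		/*
		 * Only a pageblock-aligned pfn may mark the block here; if the
		 * lruvec lock is first taken at an unaligned pfn, this path
		 * never sets the skip bit for the block.
		 */
		if (!pageblock_aligned(pfn))
			return false;

		skip = get_pageblock_skip(page);
		if (!skip && !cc->no_set_skip_hint)
			set_pageblock_skip(page);

		return skip;
	}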
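
And a heavily simplified sketch of why the proposed "goto retry_rescan" need
not land back in the same pageblock: every call to isolate_migratepages()
(mm/compaction.c, ~v6.1), including one triggered by the retry, first asks
fast_find_migrateblock() for a starting pfn, and that fast search may return
a different block. The scan loop and most checks are omitted here:

	static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
	{
		unsigned long low_pfn;

		/*
		 * Start where the previous scan stopped, unless the fast search
		 * picks a pageblock off the freelists instead.
		 */
		low_pfn = fast_find_migrateblock(cc);

		/* ... scan pageblocks starting at low_pfn, isolating pages ... */

		return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
	}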