On 1/11/23 15:21, Mel Gorman wrote:
> On Mon, Jan 09, 2023 at 08:43:40PM +0100, Vlastimil Babka wrote:
>> On 7/19/22 10:28, Mel Gorman wrote:
>> > On Fri, Jul 15, 2022 at 11:26:01PM +0800, Chuyi Zhou wrote:
>> >> You are right, we do need some mechanism to ensure the block is marked
>> >> skipped after scanning a !IS_ALIGNED block from fast_find. However, I
>> >> think the following code may not work, because *skip_updated* has been
>> >> reset:
>> >>
>> >> 	if (!skip_updated) {
>> >> 		skip_updated = true;
>> >> 		if (test_and_set_skip(cc, page, low_pfn))
>> >> 			goto isolate_abort;
>> >> 	}
>> >>
>> >> Why not ignore skip_updated after scanning a block, just like this:
>> >>
>> >> diff --git a/mm/compaction.c b/mm/compaction.c
>> >> index 962d05d1e187..1c388c45f127 100644
>> >> --- a/mm/compaction.c
>> >> +++ b/mm/compaction.c
>> >> @@ -1180,7 +1180,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>> >>  	 * rescanned twice in a row.
>> >>  	 */
>> >>  	if (low_pfn == end_pfn && (!nr_isolated || cc->rescan)) {
>> >> -		if (valid_page && !skip_updated)
>> >> +		if (valid_page)
>> >>  			set_pageblock_skip(valid_page);
>> >>  		update_cached_migrate(cc, low_pfn);
>> >>  	}
>> >>
>> >
>> > Because valid_page still needs to be set but ensuring valid_page is set
>> > is sufficient.
>> >
>> > The pageblock will be marked for skip on the first locking of a lruvec
>> >
>> > 	/* Try get exclusive access under lock */
>> > 	if (!skip_updated) {
>> > 		skip_updated = true;
>> > 		if (test_and_set_skip(cc, page, low_pfn))
>> > 			goto isolate_abort;
>> > 	}
>> >
>> > That will happen regardless of alignment.
>>
>> I'm not sure about that, as test_and_set_skip() has "if
>> (!pageblock_aligned(pfn)) return false". So I think it only takes the
>> first pfn in a pageblock not being PageLRU, so that we don't try to get
>> exclusive access for it, and then the following pfns will not be aligned
>> and thus we never set_pageblock_skip() through this path. If we have
>> nr_isolated > 0 and not cc->rescan, we might not set_pageblock_skip()
>> through the hunk above either.
>>
>
> True.
>
>> I've been checking this area because we have some reports [1] of upstream
>> 6.1 causing long loops in khugepaged, pegging 100% CPU in compaction.
>> Tracepoint data suggests we keep (successfully) isolating migratepages
>> over and over through fast_find_migrateblock() (the pfn ranges are not
>> linearly increasing but there's a cycle, probably as we rotate the
>> freelist), but are failing to migrate them (for some reason). Between 6.0
>> and 6.1 this patch went in as commit 7efc3b726103, so I strongly suspect
>> it's the root cause and will be providing a kernel with a revert in the
>> bug to confirm. Consider this an early heads up :)
>>
>> [1] https://bugzilla.suse.com/show_bug.cgi?id=1206848
>
> The trace shows that there is a lot of rescanning between a small subset
> of pageblocks because the conditions for marking the block skip are not
> met. The scan is not reaching the end of the pageblock because enough
> pages were isolated, but none were migrated successfully. Eventually it
> circles back to the same block.
>
> Pageblock skipping is tricky because we are trying to minimise both
> latency and excessive scanning. However, tracking exactly when a block
> is fully scanned requires an excessive amount of state.
>
> One potential mitigating factor is to forcibly scan a pageblock when
> migrations fail, even though that could be for transient reasons such as
> page writeback or dirty pages. This will sometimes migrate too many pages
> (worst case 500) but pageblocks will be marked skip.
>
> This is not tested at all
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index ca1603524bbe..9242fcaeb09c 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2387,6 +2387,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>  			cc->rescan = true;
>  		}
>
> +retry_rescan:
>  		switch (isolate_migratepages(cc)) {
>  		case ISOLATE_ABORT:
>  			ret = COMPACT_CONTENDED;
> @@ -2429,15 +2430,28 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>  				goto out;
>  			}
>  			/*
> -			 * We failed to migrate at least one page in the current
> -			 * order-aligned block, so skip the rest of it.
> +			 * If an ASYNC or SYNC_LIGHT fails to migrate a page
> +			 * within the current order-aligned block, rescan the
> +			 * entire block. Once the whole block is scanned, it'll
> +			 * be marked "skip" to avoid rescanning in the near
> +			 * future. This will isolate more pages than necessary
> +			 * for the request but avoid loops due to
> +			 * fast_find_migrateblock revisiting blocks that were
> +			 * recently partially scanned.
>  			 */
> -			if (cc->direct_compaction &&
> -						(cc->mode == MIGRATE_ASYNC)) {
> -				cc->migrate_pfn = block_end_pfn(
> -						cc->migrate_pfn - 1, cc->order);
> -				/* Draining pcplists is useless in this case */
> -				last_migrated_pfn = 0;
> +			if (cc->direct_compaction && !cc->rescan &&
> +						(cc->mode < MIGRATE_SYNC)) {
> +				cc->rescan = true;
> +
> +				/*
> +				 * Draining pcplists does not help THP if
> +				 * any page failed to migrate. Even after
> +				 * drain, the pageblock will not be free.
> +				 */
> +				if (cc->order == HUGETLB_PAGE_ORDER)
> +					last_migrated_pfn = 0;
> +
> +				goto retry_rescan;

I think it won't work as expected, because we will not be rescanning the
same pageblock; fast_find_migrateblock() will AFAICS pick a different one,
and there's nothing to prevent that. But the possible compaction state is
too complex for me to be confident about a quick fix. I think it will be
safer to revert the patch for 6.2 and 6.1 stable, and redo it more
carefully.

>  			}
>  		}
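
For reference, here is a simplified sketch of the skip-bit path discussed
above, based on test_and_set_skip() in mm/compaction.c around v6.1 (the
ignore_skip_hint/no_set_skip_hint handling here is approximate; the
pageblock_aligned() early return is the part quoted verbatim in the thread):

	static bool test_and_set_skip(struct compact_control *cc, struct page *page,
				      unsigned long pfn)
	{
		bool skip;

		/* Skip hints may be disabled entirely for this compaction run */
		if (cc->ignore_skip_hint)
			return false;

		/*
		 * Only a pageblock-aligned pfn may mark the block here; if the
		 * lruvec lock is first taken at an unaligned pfn, this path
		 * never sets the skip bit for the block.
		 */
		if (!pageblock_aligned(pfn))
			return false;

		skip = get_pageblock_skip(page);
		if (!skip && !cc->no_set_skip_hint)
			set_pageblock_skip(page);

		return skip;
	}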
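
And a heavily simplified sketch of why the proposed "goto retry_rescan" need
not land back in the same pageblock: every call to isolate_migratepages()
(mm/compaction.c, ~v6.1), including one triggered by the retry, first asks
fast_find_migrateblock() for a starting pfn, and that fast search may return
a different block. The scan loop and most checks are omitted here:

	static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
	{
		unsigned long low_pfn;

		/*
		 * Start where the previous scan stopped, unless the fast search
		 * picks a pageblock off the freelists instead.
		 */
		low_pfn = fast_find_migrateblock(cc);

		/* ... scan pageblocks starting at low_pfn, isolating pages ... */

		return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
	}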