On 2/7/23 18:42, Vlastimil Babka wrote:
> On 1/25/23 14:44, Mel Gorman wrote:
>> Commit 7efc3b726103 ("mm/compaction: fix set skip in
>> fast_find_migrateblock") addressed an issue where a pageblock selected
>> by fast_find_migrateblock() was ignored. Unfortunately, the same fix
>> resulted in numerous reports of khugepaged or kcompactd stalling for
>> long periods of time or consuming 100% of CPU.
>>
>> Tracing showed that there was a lot of rescanning between a small subset
>> of pageblocks because the conditions for marking the block skip are not
>> met. The scan is not reaching the end of the pageblock because enough
>> pages were isolated but none were migrated successfully. Eventually it
>> circles back to the same block.
>>
>> Pageblock skip tracking tries to minimise both latency and excessive
>> scanning but tracking exactly when a block is fully scanned requires an
>> excessive amount of state. This patch forcibly rescans a pageblock when
>> all isolated pages fail to migrate even though it could be for transient
>> reasons such as page writeback or page dirty. This will sometimes migrate
>> too many pages but pageblocks will be marked skip and forward progress
>> will be made.
>>
>> The "usemem" workload from the mmtests configuration
>> workload-usemem-stress-numa-compact was used to stress compaction.
>> The compaction trace events were recorded using a 6.2-rc5 kernel that
>> includes commit 7efc3b726103 and counts of unique ranges were measured.
>> The top 5 ranges were
>>
>>      3076 range=(0x10ca00-0x10cc00)
>>      3076 range=(0x110a00-0x110c00)
>>      3098 range=(0x13b600-0x13b800)
>>      3104 range=(0x141c00-0x141e00)
>>     11424 range=(0x11b600-0x11b800)
>>
>> While this workload is very different from what the bug reports
>> described, the pattern of the same subset of blocks being repeatedly
>> scanned is observed. At one point, *only* the range
>> range=(0x11b600 ~ 0x11b800) was scanned for 2 seconds. 14 seconds
>> passed between the first migration-related event and the last.
>>
>> With the series applied, including this patch, the top 5 ranges were
>>
>>         1 range=(0x11607e-0x116200)
>>         1 range=(0x116200-0x116278)
>>         1 range=(0x116278-0x116400)
>>         1 range=(0x116400-0x116424)
>>         1 range=(0x116424-0x116600)
>>
>> Only unique ranges were scanned and the time between the first and last
>> migration-related events was 0.11 milliseconds.
>>
>> Fixes: 7efc3b726103 ("mm/compaction: fix set skip in fast_find_migrateblock")
>> Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
>
> While this seems like it will mostly prevent the issue at hand (I think
> kcompactd is still a hazard, see below), I'm not super happy about some of
> the implementation details.

For the record, after some discussion with Mel, my concerns are not a
blocker and the series can proceed from mm-stable to 6.3.

> 1. It modifies code that was meant to quickly skip an order-aligned block
> where a page migration failed during MIGRATE_ASYNC direct compaction, as
> it's very unlikely to successfully form a free page of that order in that
> block. Now instead it will finish the whole pageblock in that case, which
> could be lots of useless work and thus exactly the opposite of what we
> wanted for MIGRATE_ASYNC direct compaction.

There are both advantages and disadvantages to this, so it is not a clear
win or loss either way.

> 2. The conditions "cc->direct_compaction" and "(cc->mode < MIGRATE_SYNC)"
> seem a bit hazardous. I think we can have a compaction where these are not
> true, and yet it uses fast_find_migrateblock() and thus can exhibit the bug
> but won't be forced to rescan?
> AFAICS kcompactd_do_work()
> - is MIGRATE_SYNC_LIGHT
> - has ignore_skip_hint = false
> - has direct_compaction = false
>
> so AFAICS it will use fast_find_migrateblock() and not bail out in one of
> the preconditions. But the cc->direct_compaction condition here won't apply.
This is only a concern once we attempt to un-revert 7efc3b726103 again, so
it only needs to be addressed as part of a future series leading up to the
un-revert.

> So it might be better to leave the current "skip the rest of block" check
> alone, and add a separate check for the finish_pageblock rescan that will
> not miss some cases where it should apply - maybe it could check for a
> complete migration failure specifically as well?

>> ---
>>  mm/compaction.c | 30 ++++++++++++++++++++++--------
>>  1 file changed, 22 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 4b3a0238879c..937ec2f05f2c 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -2394,6 +2394,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>>  		cc->finish_pageblock = true;
>>  	}
>>
>> +rescan:
>>  	switch (isolate_migratepages(cc)) {
>>  	case ISOLATE_ABORT:
>>  		ret = COMPACT_CONTENDED;
>> @@ -2436,15 +2437,28 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>>  			goto out;
>>  		}
>>  		/*
>> -		 * We failed to migrate at least one page in the current
>> -		 * order-aligned block, so skip the rest of it.
>> +		 * If an ASYNC or SYNC_LIGHT fails to migrate a page
>> +		 * within the current order-aligned block, scan the
>> +		 * remainder of the pageblock. This will mark the
>> +		 * pageblock "skip" to avoid rescanning in the near
>> +		 * future. This will isolate more pages than necessary
>> +		 * for the request but avoid loops due to
>> +		 * fast_find_migrateblock revisiting blocks that were
>> +		 * recently partially scanned.
>>  		 */
>> -		if (cc->direct_compaction &&
>> -					(cc->mode == MIGRATE_ASYNC)) {
>> -			cc->migrate_pfn = block_end_pfn(
>> -					cc->migrate_pfn - 1, cc->order);
>> -			/* Draining pcplists is useless in this case */
>> -			last_migrated_pfn = 0;
>> +		if (cc->direct_compaction && !cc->finish_pageblock &&
>> +					(cc->mode < MIGRATE_SYNC)) {
>> +			cc->finish_pageblock = true;
>> +
>> +			/*
>> +			 * Draining pcplists does not help THP if
>> +			 * any page failed to migrate. Even after
>> +			 * drain, the pageblock will not be free.
>> +			 */
>> +			if (cc->order == COMPACTION_HPAGE_ORDER)
>> +				last_migrated_pfn = 0;
>> +
>> +			goto rescan;
>>  		}
>>  	}
>
>
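To make concern 2 above concrete, here is a minimal user-space model of the
new rescan condition. This is not kernel code: `struct cc_model`,
`should_force_rescan()` and the three-value `migrate_mode` enum are
illustrative assumptions that only mirror the mode ordering
(MIGRATE_ASYNC < MIGRATE_SYNC_LIGHT < MIGRATE_SYNC) and the fields tested
in the patch. It shows that a kcompactd-style control (direct_compaction
false, MIGRATE_SYNC_LIGHT) never satisfies the check, which is exactly the
gap being pointed out:

```c
#include <stdbool.h>

/* Simplified model of the migrate modes, preserving the kernel's ordering
 * so that "mode < MIGRATE_SYNC" matches ASYNC and SYNC_LIGHT. */
enum migrate_mode { MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT, MIGRATE_SYNC };

/* Hypothetical stand-in for the few compact_control fields the new
 * condition reads. */
struct cc_model {
	bool direct_compaction;
	bool finish_pageblock;
	enum migrate_mode mode;
};

/* Mirrors the condition added by the patch: force a full-pageblock rescan
 * after a migration failure only for direct compaction, only when the
 * block is not already being finished, and only for modes cheaper than
 * MIGRATE_SYNC. */
static bool should_force_rescan(const struct cc_model *cc)
{
	return cc->direct_compaction && !cc->finish_pageblock &&
	       cc->mode < MIGRATE_SYNC;
}
```

Under this model, direct ASYNC and SYNC_LIGHT compaction both trigger the
rescan, but a kcompactd-style control does not, despite it also using
fast_find_migrateblock() as discussed above.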